Documentation for DistBasedSelector
¶
Overview¶
DistBasedSelector is an abstract class that makes selection decisions based on pre-calculated distance metrics. It leverages existing distance information and applies custom decision logic defined in its subclasses. This helps separate raw distance handling from selection policy.
Motivation¶
The motivation behind DistBasedSelector is to provide a reusable framework for distance-based selection. By using pre-computed distances, it enables flexibility in applying different metrics and normalization techniques, thus promoting a clean and modular design.
Inheritance¶
DistBasedSelector extends AbstractSelector and is intended to be subclassed. Subclasses must implement the _calculate_binary_labels
method to define how binary selection is determined based on processed distance values.
Parameters¶
search_index_info
: Contains configuration for the search index.is_similarity
: Boolean flag indicating if values represent similarity.margin
: Threshold margin value for positive selection.softmin_temperature
: Temperature parameter for softmin calculations.scale_to_one
: Whether to normalize distance values to a [0, 1] range.
Usage¶
Subclasses should implement the _calculate_binary_labels
method to convert normalized distances into binary selection labels.
Example Implementation¶
class CustomSelector(DistBasedSelector):
def _calculate_binary_labels(self, corrected_values):
# Example threshold-based selection
return corrected_values > 0.5
Method Documentation¶
DistBasedSelector.vectors_are_needed
¶
Functionality¶
This property indicates whether the selector needs to access the actual embedding vectors. It always returns False because distance-based selectors use pre-calculated distance values only.
Return Value¶
- bool: Always returns False.
Usage¶
Use this property to check if embedding vectors should be loaded. When it returns False, you can bypass vector retrieval and rely solely on distance metrics.
Example¶
selector = DistBasedSelector(search_index_info,
is_similarity=True,
margin=0.2,
softmin_temperature=1.0,
scale_to_one=False)
if not selector.vectors_are_needed:
# Proceed with using precomputed distances
pass
DistBasedSelector._calculate_binary_labels
¶
Functionality¶
This method converts corrected distance values into binary labels. Each element in the input tensor is classified as selected (1) if it meets a specific threshold, or not selected (0) otherwise. It is meant to be customized in subclass implementations.
Parameters¶
corrected_values
: A torch.Tensor of distance values that have been adjusted by the selection margin. These values determine the binary outcome based on a decision boundary.
Returns¶
- A torch.Tensor containing binary labels, where 1 indicates a selected item and 0 indicates a non-selected item.
Usage¶
Subclasses of DistBasedSelector should implement this method to define their own selection criteria. Below is an example implementation:
Example¶
# Example implementation in a subclass
def _calculate_binary_labels(self, corrected_values: torch.Tensor) -> torch.Tensor:
# Apply simple threshold-based selection
return corrected_values > 0
DistBasedSelector._convert_values
¶
Functionality¶
This method extracts raw distance values from a list of objects and normalizes them into a tensor based on the metric type and similarity mode. It adjusts the values by inverting dot products, transforming cosine similarities, or taking the reciprocal for Euclidean distances.
Parameters¶
categories
: A list of objects with adistance
attribute. Each object encapsulates a distance metric.
Returns¶
- A
torch.Tensor
of normalized distance values.
Usage¶
- Purpose - Transform raw distance metrics to a normalized tensor for further selection logic.
Example¶
Assuming a list of objects with a distance
field:
objects = [obj1, obj2, obj3]
tensor = selector._convert_values(objects)
Ensure the selector is configured with the correct metric type for accurate normalization.
DistBasedSelector.select
¶
Functionality¶
Selects indices of objects from a list based on their normalized distance values, adjust via a margin, and generate binary labels. Indices corresponding to a "1" label are returned.
Parameters¶
categories
: List of objects having adistance
attribute. The distance is normalized and adjusted with a margin.query_vector
: Optional tensor. Not used in the selection process, provided for interface compatibility.
Usage¶
- Purpose: Select indices of objects based on pre-calculated distances.
Example¶
# Example usage:
selector = YourDistBasedSelector(search_index_info, is_similarity=True)
indices = selector.select(categories)