extractor

Documentation for FeaturesExtractor Class

Functionality

The FeaturesExtractor class extracts fine-tuning features from data inputs. It computes ranks for positive and negative examples, calculates event confidences, and aggregates clicks and ranks. The resulting features are used to train fine-tuned models.

Motivation

The class combines several signals (ranks, clicks, and event confidences) into a single feature set, so that fine-tuning can draw on all of them at once.

Inheritance

FeaturesExtractor inherits from pytorch_lightning.LightningModule, which integrates it with the PyTorch Lightning framework and offers a standardized training workflow.


Documentation for FeaturesExtractor._confidences

Functionality

Calculates confidences for a fine-tuning input. The method aggregates ranks and clicks, computes a confidence score with the provided confidence_calculator, and splits the scores into positive and negative confidences according to whether each group participates in events.

Parameters

  • fine_tuning_input: An instance of FineTuningInput containing the items, their ranks, and event information.
  • not_events: A list of IDs representing negative examples used for rank prediction.

Usage

  • Purpose: To compute and return the positive and negative confidence scores for the provided fine-tuning input.

Example

pos_conf, neg_conf = extractor._confidences(fine_tuning_input, not_events)
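
The aggregate-then-split flow described above can be sketched in plain Python. This is a minimal illustration, not the class's actual implementation: the helper name split_confidences, the group_of mapping, averaging ranks per group, and the (rank, clicks) signature of confidence_calculator are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, List, Tuple


def split_confidences(
    ranks: Dict[str, float],
    group_of: Dict[str, str],
    event_ids: List[str],
    not_event_ids: List[str],
    confidence_calculator: Callable[[float, int], float],
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Hypothetical helper: aggregate ranks and clicks per group, then split."""
    clicked = set(event_ids)
    group_ranks: Dict[str, List[float]] = defaultdict(list)
    group_clicks: Dict[str, int] = defaultdict(int)

    # Aggregate ranks and click counts per group.
    for item_id in [*event_ids, *not_event_ids]:
        group = group_of[item_id]
        group_ranks[group].append(ranks[item_id])
        if item_id in clicked:
            group_clicks[group] += 1

    positive: Dict[str, float] = {}
    negative: Dict[str, float] = {}
    for group, values in group_ranks.items():
        clicks = group_clicks[group]
        # Assumption: the calculator takes the mean rank and the click count.
        confidence = confidence_calculator(mean(values), clicks)
        # A group with at least one event counts as positive.
        if clicks > 0:
            positive[group] = confidence
        else:
            negative[group] = confidence
    return positive, negative
```

A group lands in the positive dictionary as soon as one of its items appears in event_ids; all remaining groups are treated as negative.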

Documentation for FeaturesExtractor._downsample_not_events

Functionality

This method downsamples non-event items from the input data. It groups non-event identifiers by their respective group IDs using fine_tuning_input.get_object_id and then randomly selects a subset of groups based on the negative_downsampling_factor. All not-event items from the selected groups are returned.

Parameters

  • fine_tuning_input (FineTuningInput): Input data object that contains non-event identifiers in not_events and provides a method to get group IDs via get_object_id.

Returns

  • List[str]: A list of downsampled non-event identifiers collected from the selected groups.

Usage

  • Purpose: Balance non-event items by reducing their count while maintaining group structure.

Example

downsampled = extractor._downsample_not_events(fine_tuning_input)
# downsampled holds a list of non-event item IDs from randomly selected groups
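
The group-wise downsampling described above might look like the following sketch. It assumes that negative_downsampling_factor is the fraction of groups to keep and that at least one group is always retained; neither detail is confirmed by this page. The standalone function downsample_not_events and its rng parameter are hypothetical and exist only to make the example self-contained.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Optional


def downsample_not_events(
    not_events: List[str],
    get_object_id: Callable[[str], str],
    negative_downsampling_factor: float,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Hypothetical helper: keep all not-events from a random subset of groups."""
    rng = rng or random.Random()

    # Group non-event identifiers by their group (object) ID.
    groups: Dict[str, List[str]] = defaultdict(list)
    for item_id in not_events:
        groups[get_object_id(item_id)].append(item_id)
    if not groups:
        return []

    # Assumption: the factor is the share of groups to keep (at least one).
    n_keep = max(1, int(len(groups) * negative_downsampling_factor))
    n_keep = min(n_keep, len(groups))
    selected = rng.sample(sorted(groups), n_keep)

    # Return every not-event item from the selected groups.
    return [item_id for group in selected for item_id in groups[group]]
```

Sampling whole groups rather than individual items keeps each selected group complete, which matches the stated goal of maintaining group structure.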

Documentation for FeaturesExtractor._get_fine_tuning_features

Functionality

This method computes features for a fine-tuning input. It processes positive and negative events while preserving group boundaries. It downsamples negative examples, computes ranking scores and confidences, and prepares tensors for training.

Parameters

  • fine_tuning_input: A FineTuningInput object containing the query and event IDs.
  • dataset: An ItemsSet with items corresponding to the events.

Returns

  • FineTuningFeatures: An object holding positive and negative ranks along with their confidence values.

Usage

Use this method to generate feature tensors for training fine-tuned models. It aggregates vectors, computes similarity or distance measures, and assigns ranking scores with corresponding confidences.

Example

features = extractor._get_fine_tuning_features(fine_tuning_input, dataset)
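
As a rough illustration of the ranking step, the sketch below scores items against a query vector and separates the scores into positive and negative ranks. Cosine similarity is used here only as one possible similarity measure. The function names, the plain-list vectors, and the item_vectors mapping are assumptions for the example; the real method works on tensors and also handles downsampling and confidences.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_events(
    query_vector: Sequence[float],
    item_vectors: Dict[str, Sequence[float]],
    event_ids: List[str],
    not_event_ids: List[str],
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: similarity-based ranks for events and not-events."""
    positive_ranks = [
        cosine_similarity(query_vector, item_vectors[item_id])
        for item_id in event_ids
    ]
    negative_ranks = [
        cosine_similarity(query_vector, item_vectors[item_id])
        for item_id in not_event_ids
    ]
    return positive_ranks, negative_ranks
```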

Documentation for FeaturesExtractor._get_paired_inputs_features

Functionality

Computes fine-tuning features for a pair of inputs: a not-irrelevant and an irrelevant input. It calculates features for each input, merges them, and applies adjustments for fine-tuning.

Parameters

  • not_irrelevant_input: FineTuningInput with relevant events.
  • irrelevant_input: FineTuningInput representing an irrelevant input.
  • dataset: ItemsSet containing items related to the inputs.

Usage

  • Purpose: Merges features from two inputs for fine-tuning tasks.

Example

Assume input1 (the not-irrelevant input) and input2 (the irrelevant input) are FineTuningInput objects and dataset is an ItemsSet:

features = extractor._get_paired_inputs_features(input1, input2, dataset)
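
The merge step can be pictured as concatenating the per-input features field by field. The sketch below uses a hypothetical FeaturesSketch dataclass as a stand-in for FineTuningFeatures, whose real fields and merge behaviour are not shown on this page. The adjustments applied after merging are not specified here, so the sketch leaves them out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FeaturesSketch:
    """Hypothetical stand-in for FineTuningFeatures, using plain lists."""

    positive_ranks: List[float] = field(default_factory=list)
    negative_ranks: List[float] = field(default_factory=list)
    positive_confidences: List[float] = field(default_factory=list)
    negative_confidences: List[float] = field(default_factory=list)

    def merged_with(self, other: "FeaturesSketch") -> "FeaturesSketch":
        """Return a new object with each field concatenated in order."""
        return FeaturesSketch(
            positive_ranks=self.positive_ranks + other.positive_ranks,
            negative_ranks=self.negative_ranks + other.negative_ranks,
            positive_confidences=(
                self.positive_confidences + other.positive_confidences
            ),
            negative_confidences=(
                self.negative_confidences + other.negative_confidences
            ),
        )
```

Returning a new object instead of mutating either input keeps the per-input features reusable after the merge.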


Documentation for FeaturesExtractor.forward

Functionality

Calculates fine-tuning features for a batch of fine-tuning input pairs. It iterates over not-irrelevant and irrelevant inputs and aggregates the features based on events and confidences. It applies filters and returns a FineTuningFeatures object.

Parameters

  • batch: A list of tuples, each containing two FineTuningInput objects: a not-irrelevant input and an irrelevant input.
  • dataset: ItemsSet of items corresponding to clickstream inputs.

Usage

  • Purpose: Computes features for fine-tuning models by processing input pairs and aggregating results using auxiliary methods.

Example

features = extractor.forward(batch, dataset)
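
The batch loop can be sketched as below, assuming each batch element is a (not-irrelevant, irrelevant) pair and that per-pair features come back as named lists. The function collect_batch_features and the pair_features callable are hypothetical; the latter stands in for _get_paired_inputs_features. The filters that forward applies are not described on this page, so they are not reproduced.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Features = Dict[str, List[float]]


def collect_batch_features(
    batch: Iterable[Tuple[Any, Any]],
    pair_features: Callable[[Any, Any], Features],
) -> Features:
    """Hypothetical helper: accumulate per-pair features over a batch."""
    total: Features = {}
    for not_irrelevant_input, irrelevant_input in batch:
        features = pair_features(not_irrelevant_input, irrelevant_input)
        # Append each named feature list to the running totals.
        for name, values in features.items():
            total.setdefault(name, []).extend(values)
    return total
```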