Documentation for FineTuningInput
Functionality
FineTuningInput is a data model that represents a clickstream session. It validates that results and ranks are consistent with each other and provides methods for extracting non-clicked results, mapping part IDs to object IDs, and removing specified results. It is used as input for feature extraction in fine-tuning tasks.
Parameters
query
: The user query or search term that initiated the session.
events
: List of item IDs that received user interactions.
results
: List of result item IDs shown to the user.
ranks
: Dictionary mapping each result ID to its rank.
event_types
: Optional list of event type indicators.
timestamp
: Optional timestamp of when the session started.
is_irrelevant
: Optional flag indicating whether the session is irrelevant.
part_to_object_dict
: Optional mapping from part IDs to their parent object IDs.
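The sketch below shows the field list and the results/ranks consistency check without any dependencies. It is a hypothetical stand-in (FineTuningInputSketch) that uses a standard-library dataclass in place of pydantic.BaseModel. It assumes that "consistency" means the keys of ranks match the result IDs exactly; the real validator may apply a different rule.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class FineTuningInputSketch:
    """Hypothetical stand-in for FineTuningInput (not the real class)."""

    query: Any
    events: List[str]
    results: List[str]
    ranks: Dict[str, float]
    event_types: Optional[List[Any]] = None
    timestamp: Optional[Any] = None  # exact type is not specified in the docs
    is_irrelevant: Optional[bool] = None
    part_to_object_dict: Optional[Dict[str, str]] = None

    def __post_init__(self) -> None:
        # Assumed rule: every result has a rank and every rank belongs to a result.
        if set(self.ranks) != set(self.results):
            raise ValueError("keys of ranks must match results")


session = FineTuningInputSketch(
    query="example query",
    events=["item1"],
    results=["item1", "item2"],
    ranks={"item1": 1.0, "item2": 2.0},
)
```

With this design, a mismatched ranks dictionary fails at construction time. That is where a Pydantic validator would also reject it.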
Inheritance
FineTuningInput inherits from pydantic.BaseModel, which provides automatic data validation and parsing.
Motivation
This class standardizes clickstream data for feature extraction. It enforces data consistency and simplifies downstream processing of results, which is crucial for effective fine-tuning.
Usage
- Purpose: Structure input data for fine-tuning in machine learning workflows that use clickstream and search result data.
Example
data = {
    "query": "example query",
    "events": ["item1", "item2"],
    "results": ["item1", "item2", "item3"],
    "ranks": {"item1": 1.0, "item2": 2.0, "item3": 3.0},
}
ft_input = FineTuningInput(**data)
print(len(ft_input))
Documentation for FineTuningInput.not_events
Functionality
Returns the list of result IDs that did not receive any user interaction. It iterates through the results list and keeps every ID that does not appear in the events attribute.
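The same logic can be sketched as a standalone function. Passing results and events as arguments is an illustrative choice; the property reads them from the instance. Because it iterates over results, the sketch keeps results in their display order.

```python
from typing import List


def not_events(results: List[str], events: List[str]) -> List[str]:
    """Return result IDs without any user interaction, in result order."""
    event_ids = set(events)  # set gives constant-time membership checks
    return [result_id for result_id in results if result_id not in event_ids]


print(not_events(["item1", "item2", "item3"], ["item1", "item2"]))
# ['item3']
```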
Parameters
This property does not take any parameters.
Usage
- Purpose: Identify results that did not record any user events.
Example
non_events = fine_tuning_input.not_events
Documentation for FineTuningInput.get_object_id
Functionality
Maps a part ID to its parent object ID. If a mapping is defined in part_to_object_dict, the corresponding object ID is returned. Otherwise, the original ID is returned.
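A minimal sketch of this lookup as a standalone function follows. Passing the mapping as an argument is an illustrative choice. Treating a missing (None) mapping as "no parts defined" is an assumption based on part_to_object_dict being optional.

```python
from typing import Dict, Optional


def get_object_id(part_id: str, part_to_object_dict: Optional[Dict[str, str]]) -> str:
    """Map a part ID to its parent object ID, falling back to the ID itself."""
    if not part_to_object_dict:
        # No mapping configured: treat the ID as an object ID already.
        return part_id
    return part_to_object_dict.get(part_id, part_id)


mapping = {"part1": "obj1"}
print(get_object_id("part1", mapping))  # obj1
print(get_object_id("test", mapping))   # test
```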
Parameters
id
: (str) The part ID to look up.
Usage
- Purpose: Resolve a part ID to the ID of its parent object, falling back to the part ID itself when no mapping exists.
Example
Given part_to_object_dict = {'part1': 'obj1'}:
input_obj.get_object_id('part1') # returns 'obj1'
input_obj.get_object_id('test') # returns 'test'
Documentation for FineTuningInput.remove_results
Functionality
Removes the specified result IDs from the input and updates the related structures: results, events, ranks, and the part-to-object mapping (part_to_object_dict).
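The sketch below shows how the four named structures could be updated together. ResultsSketch is a hypothetical container holding only those structures. It assumes that a part_to_object_dict entry is dropped when its part ID is removed; the real method may update the mapping differently. It also leaves other fields such as event_types alone, because their handling is not documented.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class ResultsSketch:
    """Hypothetical container holding only the structures remove_results touches."""

    results: List[str]
    events: List[str]
    ranks: Dict[str, float]
    part_to_object_dict: Dict[str, str] = field(default_factory=dict)

    def remove_results(self, ids: Iterable[str]) -> None:
        to_remove = set(ids)  # accepts a list or a set, as documented
        self.results = [r for r in self.results if r not in to_remove]
        self.events = [e for e in self.events if e not in to_remove]
        self.ranks = {k: v for k, v in self.ranks.items() if k not in to_remove}
        # Assumption: mapping entries are dropped when their part ID is removed.
        self.part_to_object_dict = {
            part: obj
            for part, obj in self.part_to_object_dict.items()
            if part not in to_remove
        }


data = ResultsSketch(
    results=["id1", "id2", "id3"],
    events=["id1", "id2"],
    ranks={"id1": 1.0, "id2": 2.0, "id3": 3.0},
)
data.remove_results(["id2"])
print(data.results, data.events, data.ranks)
# ['id1', 'id3'] ['id1'] {'id1': 1.0, 'id3': 3.0}
```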
Parameters
ids
: A list or set of IDs to remove from the results.
Usage
- Purpose: Remove unwanted result IDs and update all data structures within a FineTuningInput instance.
Example
# Create a sample input
input_data = FineTuningInput(
    query="search term",
    events=["id1", "id2"],
    results=["id1", "id2", "id3"],
    ranks={"id1": 1.0, "id2": 2.0, "id3": 3.0},
)
# Remove a result
input_data.remove_results(["id2"])
Documentation for FineTuningInput.preprocess_ids
Functionality
This method converts a list of input IDs into strings. If an item is a tuple, its first element is converted to a string. If it is a PyTorch tensor, its value is extracted and converted to a string. Any other item is converted directly to a string.
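The conversion rules can be sketched as a plain function rather than a Pydantic validator. The sketch assumes that "its value is extracted" means calling torch.Tensor.item(), which works for single-element tensors. torch is imported optionally so the sketch also runs where PyTorch is not installed.

```python
from typing import Any, List

try:
    import torch  # optional; only needed to recognise tensors
except ImportError:
    torch = None


def preprocess_ids(value: List[Any]) -> List[str]:
    """Convert each item to a string ID using the documented rules."""
    ids = []
    for item in value:
        if isinstance(item, tuple):
            ids.append(str(item[0]))  # use the first element of a tuple
        elif torch is not None and isinstance(item, torch.Tensor):
            ids.append(str(item.item()))  # extract the Python scalar first
        else:
            ids.append(str(item))  # any other type: direct conversion
    return ids


print(preprocess_ids([(1, "data"), 4, "x"]))  # ['1', '4', 'x']
```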
Parameters
cls
: The class being validated (provided automatically by Pydantic).
value
: A list of items to be processed into string IDs.
Usage
This method serves as a Pydantic validator for the results field and ensures that all IDs are stored as strings.
Example
For an input list such as [(1, "data"), tensor(3), 4], this method returns ["1", "3", "4"].