Documentation for Upsertion Methods¶
handle_failed_items¶
Functionality¶
Handles failed items during the upsertion process by logging the error, appending each failed DataItem along with a traceback, and updating the task status accordingly.
Parameters¶
- failed_items:
List[Tuple[DataItem, str]]
-- Tuples of the failed item and its corresponding traceback. - task:
BaseDataHandlingTask
-- The upsertion task object from the database. - exception:
Exception
-- The exception that occurred during the process. - task_crud:
CRUDBase
-- The CRUD handler for persisting task updates.
Usage¶
Purpose: Manage failures in upsertion by recording error details and propagating critical errors based on configuration.
Example¶
try:
# upsert actions
pass
except Exception as e:
handle_failed_items(failed_items, task, e, task_crud)
upsert_batch¶
Functionality¶
Processes a batch of data items by downloading, splitting, running inference, and uploading vectors. It logs each stage and handles errors by marking tasks as failed when necessary.
Parameters¶
- batch:
List of DataItems
to process. - data_loader:
DataLoader
instance for downloading data. - items_splitter:
ItemSplitter
instance for splitting items. - preprocessor:
Preprocessor
instance to format data. - inference_client:
TritonClient
instance for inference. - collection: Target collection to upload vectors.
- batch_index: Index of the current batch.
- task: Task object representing the upsertion process.
- task_crud:
CRUDBase
instance to update task status.
Usage¶
Process a batch of items using upsert_batch
. The function downloads data, splits and preprocesses content, performs inference, and uploads generated vectors. It provides detailed logging and handles errors gracefully.
Example¶
batch = [...]
data_loader = DataLoader()
items_splitter = ItemSplitter()
preprocessor = ItemsDatasetDictPreprocessor()
inference_client = TritonClient()
collection = Collection()
batch_index = 0
task = get_task()
task_crud = get_task_crud()
upsert_batch(batch, data_loader, items_splitter, preprocessor,
inference_client, collection, batch_index, task, task_crud)
process_upsert¶
Functionality¶
Processes an upsertion task in batches by downloading, preprocessing, splitting, inferring, and uploading vectors. Handles errors at each stage and updates the task status.
Parameters¶
- task: Upsertion task containing items and status info.
- collection: Target collection for vector uploads.
- data_loader: Loader to fetch item details.
- items_splitter: Splits downloaded items into parts.
- preprocessor: Prepares items for inference processing.
- inference_client: Client to run inference on item parts.
- task_crud: CRUD handler to update the task in storage.
Usage¶
Purpose: Process and upsert items in a task by batching operations.
Example¶
process_upsert(
task,
collection,
data_loader,
items_splitter,
preprocessor,
inference_client,
task_crud
)