e5

Documentation for TextToTextE5TritonClient and TextToTextE5TritonClientFactory¶

TextToTextE5TritonClient¶

Functionality¶

TextToTextE5TritonClient extends TritonClient for text-to-text inference with the E5 model. It tokenizes text inputs with a Hugging Face transformers tokenizer and packages them as inference requests for the Triton Inference Server.

Parameters¶

url: The URL of the Triton Inference Server.
plugin_name: The name of the plugin/model for inference tasks.
embedding_model_id: Identifier for the deployed model.
tokenizer: A Hugging Face tokenizer (PreTrainedTokenizer or PreTrainedTokenizerFast).
preprocessor: Optional function to preprocess text data.
model_name: Name of the model used for inference (default is "intfloat/multilingual-e5-large").
retry_config: Optional retry policy configuration.

Usage¶

TextToTextE5TritonClient converts text queries into the tokenized tensor inputs that the Triton Inference Server expects. It inherits its core functionality from TritonClient and adds the text-to-text processing steps.

Example¶

A basic usage example:

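from transformers import AutoTokenizer

# Import TextToTextE5TritonClient from your package (module path not shown here).
my_tokenizer = AutoTokenizer.from_pretrained("intfloat/multilingual-e5-large")

client = TextToTextE5TritonClient(
    url="http://localhost:8000",
    plugin_name="text_to_text",
    embedding_model_id="e5-model",
    tokenizer=my_tokenizer,
)
# Returns a list of InferInput objects for the query.
inputs = client._prepare_query("sample query")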

Method: _prepare_query¶

Functionality¶

Prepares a query string for the Triton Inference Server: the tokenizer turns the query into input tensors, which are then converted into a list of InferInput objects for inference.

Parameters¶

query: A string containing the text to be tokenized and sent to the Triton Inference Server.

Usage¶

Purpose: Transform a query text into the Triton-supported input format by tokenizing, padding, and converting data types.

Example¶

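# url, plugin_name, embed_id, and tokenizer are placeholders for the constructor arguments described above.
client = TextToTextE5TritonClient(url, plugin_name, embed_id, tokenizer)
infer_inputs = client._prepare_query("Hello world")
# infer_inputs is a list of InferInput objects ready for processing.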
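The tokenize, pad, and convert steps described above can be illustrated with a toy sketch. It is not the client's implementation: `toy_tokenize` is a hypothetical whitespace tokenizer standing in for the Hugging Face one, `TensorSpec` is a hypothetical stand-in for Triton's InferInput, and the tensor names `input_ids` and `attention_mask` with datatype `INT64` are assumptions about what the deployed model expects.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TensorSpec:
    """Hypothetical stand-in for an InferInput: name, shape, datatype, data."""
    name: str
    shape: List[int]
    datatype: str
    data: List[List[int]]


def toy_tokenize(text: str, vocab: Dict[str, int]) -> List[int]:
    """Whitespace tokenizer stand-in; each new word gets the next free id (0 is padding)."""
    return [vocab.setdefault(word, len(vocab) + 1) for word in text.lower().split()]


def prepare_batch(texts: List[str], pad_id: int = 0) -> List[TensorSpec]:
    """Tokenize, pad every sequence to the longest one, and describe the tensors."""
    vocab: Dict[str, int] = {}
    ids = [toy_tokenize(text, vocab) for text in texts]
    max_len = max(len(seq) for seq in ids)
    # Right-pad token ids and build the matching 1/0 attention mask.
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in ids]
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in ids]
    shape = [len(texts), max_len]
    return [
        TensorSpec("input_ids", shape, "INT64", input_ids),
        TensorSpec("attention_mask", shape, "INT64", attention_mask),
    ]


specs = prepare_batch(["Hello world", "Hello"])
for spec in specs:
    print(spec.name, spec.shape, spec.datatype, spec.data)
```

In the real client, the tokenizer output would be converted to NumPy arrays of the declared datatype before being attached to each InferInput; the sketch only shows the shapes and padding logic.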

Method: _prepare_items¶

Functionality¶

This method tokenizes each text input in a list. If a preprocessor is configured, it is applied to each entry before tokenization. The tokenizer converts the texts into tensors, which are then converted to NumPy arrays and wrapped in Triton InferInput objects for further processing.

Parameters¶

data: A list of text data (strings or dictionaries). Each element is preprocessed (if a preprocessor exists), tokenized, and converted into an InferInput object.

Usage¶

Purpose: To prepare multiple text inputs for inference on the Triton server by converting raw text into numeric tensors.

Example¶

Given an input list such as:

data = ["Hello world", {"text": "Sample input"}]

The method processes each entry and returns a list of InferInput objects ready for inference.
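Because entries may be strings or dictionaries, a preprocessor is a natural place to normalize them into plain text. The sketch below is one possible preprocessor, not the library's own: the function name, the one-entry-in, one-string-out signature, and the `text_key` parameter are assumptions. The `"query: "` prefix follows the E5 model card's guidance that input texts start with `"query: "` or `"passage: "`.

```python
from typing import Any, Dict, Union

Item = Union[str, Dict[str, Any]]


def e5_query_preprocessor(item: Item, text_key: str = "text") -> str:
    """Turn a str-or-dict entry into a single 'query: '-prefixed string."""
    # Pull the raw text out of a dict entry, or use the string as-is.
    text = item if isinstance(item, str) else item[text_key]
    # Collapse runs of whitespace so the tokenizer sees clean input.
    text = " ".join(text.split())
    # Add the E5 prefix only if it is not already present.
    return text if text.startswith("query: ") else f"query: {text}"


data = ["Hello world", {"text": "Sample input"}]
prepared = [e5_query_preprocessor(item) for item in data]
print(prepared)
```

A function like this could be passed as the `preprocessor` argument, assuming the client calls it once per entry before tokenization.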

TextToTextE5TritonClientFactory¶

Functionality¶

This class serves as a factory to create and configure instances of TextToTextE5TritonClient for text-to-text inference tasks using an E5 model. It centralizes parameters like URL, plugin name, preprocessor function, model name, and retry configuration.

Motivation¶

The design aims to avoid redundancy by offering a unified interface for client creation. This ensures consistent setups across deployments and simplifies client instantiation.

Inheritance¶

TextToTextE5TritonClientFactory inherits from TritonClientFactory, leveraging common configuration functionalities while providing a specialized setup for E5 text-to-text inference.

Method: get_client¶

Functionality¶

Creates an instance of TextToTextE5TritonClient with common configuration. It sets the Triton server URL, plugin name, preprocessor, tokenizer, and retry settings. This method simplifies client creation for text-to-text inference tasks.

Parameters¶

embedding_model_id: The deployed model ID as a string.
**kwargs: Additional keyword arguments to pass to the client constructor.

Usage¶

Purpose: To instantiate a client for text-to-text inference using Triton.

Example¶

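# factory is a configured TextToTextE5TritonClientFactory; custom_param=value stands for any extra constructor keyword argument.
client = factory.get_client("model_id_123", custom_param=value)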
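The factory idea can be sketched with stand-in classes: shared settings live on the factory, and `get_client` adds only the per-model ID and any extra keyword arguments. `StubClient` and `StubClientFactory` are hypothetical and do not reproduce the real constructor signatures; in particular, collecting extra keyword arguments in an `extra` dict is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class StubClient:
    """Hypothetical stand-in for TextToTextE5TritonClient."""
    url: str
    plugin_name: str
    embedding_model_id: str
    preprocessor: Optional[Callable[[Any], str]] = None
    model_name: str = "intfloat/multilingual-e5-large"
    extra: Dict[str, Any] = field(default_factory=dict)


@dataclass
class StubClientFactory:
    """Holds the configuration shared by every client it creates."""
    url: str
    plugin_name: str
    preprocessor: Optional[Callable[[Any], str]] = None
    model_name: str = "intfloat/multilingual-e5-large"

    def get_client(self, embedding_model_id: str, **kwargs: Any) -> StubClient:
        # Combine the shared settings with the per-call model ID and extras.
        return StubClient(
            url=self.url,
            plugin_name=self.plugin_name,
            embedding_model_id=embedding_model_id,
            preprocessor=self.preprocessor,
            model_name=self.model_name,
            extra=kwargs,
        )


factory = StubClientFactory(url="http://localhost:8000", plugin_name="text_to_text")
client_a = factory.get_client("model_id_123")
client_b = factory.get_client("model_id_456", timeout=5)
print(client_a.embedding_model_id, client_b.embedding_model_id, client_b.extra)
```

Both clients share the same URL and plugin name, so a deployment only has to state those once.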