Skip to content

Documentation for text_transforms

Functionality

The text_transforms function applies a transformation to each text entry in a Dataset column. It updates the Dataset with a new column containing the transformed text, aiding in preprocessing for embedding models.

Parameters

  • examples: A Dataset containing the input text.
  • transform: A callable to process the text; defaults to no change.
  • raw_text_field_name: The original column name (default "item").
  • text_values_name: The name for the new transformed text column (default "text").

Usage

Preprocess text data by applying a transformation before model training.

Example

from datasets import load_dataset
from embedding_studio.embeddings.data.transforms.text.transforms import text_transforms

data = load_dataset("sample_dataset")
transformed_data = text_transforms(data, transform=str.lower)