gcp_text_loader
Documentation for GCPTextLoader¶
Functionality¶
GCPTextLoader loads and decodes text files stored in Google Cloud Platform's Cloud Storage. It reads byte streams and converts them into strings using a specified encoding. The _get_item method reads the raw bytes from a given io.BytesIO file object and decodes them with the loader's configured encoding, returning the file's text content as a string.
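The decode step described above can be illustrated without any GCP dependency. The following is a minimal sketch using only the standard library, not the library's actual implementation: the function name decode_text is hypothetical, and rewinding the stream before reading is an assumption of this sketch.

```python
from io import BytesIO


def decode_text(file: BytesIO, encoding: str = "utf-8") -> str:
    """Read all bytes from an in-memory file and decode them to a string.

    Hypothetical helper that mirrors the behaviour described for _get_item.
    """
    file.seek(0)  # assumption: rewind in case the stream was partially read
    return file.read().decode(encoding)


# Bytes as they might arrive from a storage download
text = decode_text(BytesIO("café".encode("utf-8")))
print(text)
```

The encoding argument plays the same role as the loader's encoding parameter: it must match the encoding the file was written with.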
Inheritance¶
This class inherits from GCPDataLoader, which handles GCP credentials, connection, and retry logic. It builds on GCPDataLoader's functionality to specialize in text processing.
Motivation¶
The main purpose of GCPTextLoader is to abstract the complexity of accessing and reading text files from GCP Cloud Storage. It integrates retry strategies and custom decoding to simplify text extraction.
Parameters¶
- retry_config: Optional retry strategy configuration.
- features: Optional schema defining the expected data.
- encoding: Character encoding for the text (default: "utf-8").
- kwargs: Additional arguments for GCP credentials.
- file: An io.BytesIO object containing the text data to be decoded.
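To show why the encoding parameter matters, here is a small standard-library sketch. It does not call GCPTextLoader; it only demonstrates what happens when bytes are decoded with a matching versus a mismatched encoding. The sample string and variable names are illustrative assumptions.

```python
from io import BytesIO

# Bytes written with a non-UTF-8 encoding (illustrative sample data)
raw = "naïve".encode("latin-1")

# Decoding with the matching encoding recovers the original text.
matching = BytesIO(raw).read().decode("latin-1")

# Decoding the same bytes as UTF-8 fails, because the byte for "ï"
# is not a valid UTF-8 sequence in this position.
try:
    BytesIO(raw).read().decode("utf-8")
    mismatch_failed = False
except UnicodeDecodeError:
    mismatch_failed = True

print(matching, mismatch_failed)
```

If the files in a bucket are not UTF-8, passing the matching value for encoding when constructing the loader should avoid this kind of error.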
Usage¶
- Purpose: Load text files stored in Google Cloud Platform's Cloud Storage and decode them into strings.
Example¶
from io import BytesIO
from embedding_studio.data_storage.loaders.cloud_storage.gcp.gcp_text_loader import GCPTextLoader
# Initialize loader with optional retry configuration
loader = GCPTextLoader(retry_config=None, encoding="utf-8")
# An in-memory BytesIO stands in here for a file object fetched from GCP Storage
data = loader._get_item(BytesIO(b"example text"))
print(data)