Documentation for PgsqlJSONBLoader

Class Overview

The PgsqlJSONBLoader class loads and processes JSONB data from a PostgreSQL database. It extracts the JSONB content of each row as a Python dict and, when fields_to_keep is set, keeps only the listed fields.

Parameters

  • connection_string: PostgreSQL connection string.
  • query_generator: Class to generate PostgreSQL queries.
  • jsonb_column: The column name where JSONB data is stored (default is 'jsonb_data').
  • fields_to_keep: Optional list or set of fields to extract from the JSONB data.
  • retry_config: Optional configuration for retrying database operations.
  • features: Expected features for the dataset.
  • **kwargs: Additional keyword arguments.

Usage

  • Purpose: To provide a flexible and reliable way to load and filter JSONB data retrieved from PostgreSQL.
  • Motivation: Facilitate the conversion of JSONB data into a standard Python dict for further processing and analysis.
  • Inheritance: Inherits from PgsqlDataLoader, reusing its connection handling and query generating functionality.

Example

Below is an example of how to instantiate and use the loader:

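# YourQueryGenerator and query_parameters are placeholders for your own
# query-generator class and the parameters it expects.
loader = PgsqlJSONBLoader(
    connection_string="postgresql://user:pass@host/db",
    query_generator=YourQueryGenerator,
    jsonb_column="jsonb_data",
    fields_to_keep=["field1", "field2"],
)

# Run the generated query and return the loaded JSONB data.
data = loader.load_data(query_parameters)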

Method: PgsqlJSONBLoader._get_item

Functionality

The _get_item method extracts and filters JSONB data from a given dictionary. If fields_to_keep is defined, only the specified keys are returned.

Parameters

  • data: A dictionary containing a single row's data. It must include the JSONB data under the key named by the loader's jsonb_column (by default 'jsonb_data').

Returns

  • A dictionary with the parsed JSONB data, filtered if fields_to_keep is provided.

Raises

  • ValueError: Raised if JSONB data is missing or not a dictionary.
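
The behavior described for _get_item can be sketched as a standalone function. This is a minimal illustration, not the library's actual implementation: the name extract_jsonb, its signature, and the choice to silently skip requested fields that are absent from the payload are all assumptions made for the example.

```python
from typing import Any, Dict, Iterable, Optional


def extract_jsonb(
    row: Dict[str, Any],
    jsonb_column: str = "jsonb_data",
    fields_to_keep: Optional[Iterable[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical stand-in for the behavior documented for _get_item."""
    payload = row.get(jsonb_column)

    # A missing or non-dict payload is rejected, mirroring the documented ValueError.
    if not isinstance(payload, dict):
        raise ValueError(
            f"Column {jsonb_column!r} is missing or does not contain a dictionary"
        )

    # Without a filter, return a shallow copy of the whole payload.
    if fields_to_keep is None:
        return dict(payload)

    # Accept a list or a set; requested keys that are absent are skipped (assumption).
    wanted = set(fields_to_keep)
    return {key: value for key, value in payload.items() if key in wanted}


row = {"jsonb_data": {"id": 1, "name": "example", "age": 30}}
filtered = extract_jsonb(row, fields_to_keep=["id", "name"])
print(filtered)  # {'id': 1, 'name': 'example'}
```

Converting fields_to_keep to a set lets the sketch accept either a list or a set, as the class parameters allow, and keeps each membership check cheap.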

Usage

  • Purpose: To extract JSONB data from a database row and optionally filter it based on specified fields.

Example

# connection_string and query_generator are assumed to be defined as in the
# class example above.
data = {
    "jsonb_data": {"id": 1, "name": "example", "age": 30}
}
loader = PgsqlJSONBLoader(
    connection_string,
    query_generator,
    fields_to_keep=["id", "name"],
)
item = loader._get_item(data)
print(item)  # Output: {'id': 1, 'name': 'example'}