Merged Documentation¶
Functionality Overview¶
Functionality¶
convert_vectors
: Converts a string of vector data into a numpy array. It strips outer characters, splits the inner JSON arrays, and applies json.loads to each component.DbObjectBase
: An abstract base class for vector database object tables using SQLAlchemy's declarative base. It provides standard columns for object identification, payload data, and storage metadata, creating a consistent and extensible framework for database models.
Parameters¶
vectors_data
: A string representation of vectors from the database, expected to have extra enclosing characters.- No parameters are applicable for
DbObjectBase
attributes.
Returns¶
- A numpy array containing the parsed vectors from the input.
Usage¶
- Purpose: Convert vector strings from the database into numpy arrays for vector operations. The various attributes in
DbObjectBase
provide foundational database structures, such as user identifiers, JSON payloads, and metadata about how these objects should be stored.
Example Usage¶
-
convert_vectors:
Returns:vectors_data = "[[1.0, 2.0, 3.0],[4.0, 5.0, 6.0]]" numpy_array = convert_vectors(vectors_data)
[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
-
DbObjectBase:
class MyObject(DbObjectBase): additional_field = mapped_column(String(256))
Documentation for DbObjectBase
Attributes¶
object_id¶
Defines the primary key column for vector database objects. This attribute is a SQLAlchemy mapped column definition using a String type with a maximum length of 128, marked as the primary key to ensure unique identification of each object in the database. It automatically includes this attribute in subclasses' schemas.
payload¶
Stores arbitrary payload data as a JSONB column in the database, enabling storage of additional unstructured data as metadata.
storage_meta¶
Defines a JSONB column to store metadata for object storage, allowing for flexible storage of key-value pairs without a fixed schema.
original_id¶
Stores the original ID of an object if it is derived, facilitating tracking of provenance for derived objects.
user_id¶
Stores the unique user identifier associated with the object, linking the object to its respective user account.
session_id¶
Defines a SQLAlchemy column that stores the session ID associated with a database object, forming part of the schema for vector database objects.
Documentation for DbObjectPartBase
Attributes¶
part_id¶
Defines the primary key column for object parts in the vector database, mapping a string identifier as the primary key of each object part.
object_id¶
Returns an SQLAlchemy column that serves as a foreign key to the parent object, defined as a 128-character string and indexed for faster lookup.
vector¶
Defines a SQLAlchemy column for storing the embedding vector in a database object part, based on the pgvector extension.
is_average¶
Stores a boolean flag indicating if the vector is an average, auto-generated to mark aggregate vectors.
user_id¶
Defines a SQLAlchemy column that tracks the user ID associated with an object part, enabling monitoring of user responsibility for part creation or modifications.
Documentation for DbObjectImpl
Class¶
Functionality¶
DbObjectImpl
is a mixin class that provides common methods for operating on database object tables. It includes utility methods for generating SQL statements for creating tables, inserting objects, and upserting objects, as well as converting object instances to dictionaries.
Main Functionality¶
- create_table: Sets up the database table for storing object records.
- insert_objects_statement: Generates a SQL statement to insert objects into the database.
- upsert_objects_statement: Generates a SQL upsert statement to insert or update objects.
- db_object_to_dict: Converts a
DbObject
instance into a dictionary. - db_objects_to_dicts: Converts a list of
DbObject
instances to a list of dictionaries.
Documentation for DbObjectPartImpl
Class¶
Functionality¶
DbObjectPartImpl
is an implementation mixin that provides common methods for handling database object part tables. It offers helper functions for SQL statement generation, vector distance calculations, and validation of vector dimensions to streamline operations on vector data in the database.
Main Functions¶
- initialize: Initializes the mixin with search index and database object class configurations.
- create_table: Creates the database table for object parts if it does not yet exist.
- validate_dimensions: Verifies input vectors against expected dimensions, raising exceptions for mismatches.
- distance_expression: Generates SQLAlchemy expressions for vector distance calculations.
- similarity_search_statement: Constructs SQL queries for performing vector similarity searches.
Additional Functionalities¶
get_dbo_table_name¶
Generates dynamic table and index names for a collection based on the provided CollectionInfo
object.
make_db_model¶
Creates and registers dynamic database model classes for a given collection, defining ORM models with proper table names and relationships.
DbObjectPart.hnsw_index¶
Creates a PostgreSQL index using the HNSW algorithm for the vector column, optimizing vector search performance.
DbObjectPart¶
The DbObjectPart
class stores and represents distinct parts of an object in the vector database, supporting complex object modeling through partitioned components.
Final Remarks¶
This comprehensive overview of methods and classes outlines how to efficiently manage and interact with vector databases, from storing data to searching and manipulating complex embeddings. Each class and method serves to enhance the performance and usability of data operations in vector databases.