Skip to content

Understanding Embedding Studio's Suggestion System

Embedding Studio includes a powerful suggestion system that provides intelligent, context-aware autocomplete functionality. This tutorial explores how the suggesting system works, its core concepts, and the underlying algorithms that make it effective.

Core Concepts

The suggestion system in Embedding Studio is designed around a few fundamental concepts:

1. Tokens and Chunks

At the heart of the suggesting system is tokenization - breaking text into meaningful units:

  • Tokenization: The process of splitting input text into smaller units (tokens)
  • Chunks: The tokens that make up phrases stored in the suggestion database
  • Spans: Position information about where tokens appear in the original text

The tokenizer identifies words, numbers, and individual symbols as separate tokens:

class SuggestingTokenizer:
    def __init__(self):
        # Match words, digits, or any single non-whitespace symbol
        self._pattern = re.compile(r"\w+|\d+|[^\w\s]")

2. Phrase Management

Phrases are the suggested content shown to users. The system:

  • Stores phrases with probability scores (indicating relevance)
  • Attaches labels and domains for categorization and filtering
  • Indexes phrases for efficient search and retrieval

3. Query Construction

When a user types, the system constructs a query based on:

  • Found chunks: Tokens the user has already typed
  • Next chunk: Partial token the user is currently typing
  • Domain: Optional context to filter suggestions (e.g., "shopping", "search")

4. Matching Strategies

The system employs multiple matching strategies:

  • Exact matching: Finds phrases with exact token matches
  • Prefix matching: Matches beginning portions of tokens
  • Fuzzy matching: Allows for small variations in spelling

How Suggesting Works: The Algorithm

Here's how the suggestion system works when a user is typing:

Step 1: Input Analysis

When a user types "how to make pas" and pauses, the system:

  1. Tokenizes the input: ["how", "to", "make", "pas"]
  2. Identifies complete tokens (["how", "to", "make"]) as "found chunks"
  3. Identifies the last partial token ("pas") as the "next chunk"

Step 2: Query Generation

The system creates a search query based on the user's input context:

// Simplified example of a generated query
(@chunk_0:how @chunk_1:to @chunk_2:make @chunk_3:pas*) | 
(@chunk_0:to @chunk_1:make @chunk_2:pas*) | 
(@chunk_0:make @chunk_1:pas*) | 
(@chunk_0:pas*)

This query looks for phrases: - Starting with "how to make" and having a 4th token starting with "pas" - Starting with "to make" and having a 3rd token starting with "pas" - Starting with "make" and having a 2nd token starting with "pas" - Starting with a token beginning with "pas"

The system runs parallel searches:

  1. Strict search: Using exact and prefix matching
  2. Soft search: Using fuzzy matching for more forgiving results
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    # Schedule both queries
    f_strict = executor.submit(self._search_docs, strict_text_query, top_k)
    f_soft = executor.submit(self._search_docs, soft_text_query, top_k)

Step 4: Result Processing

  1. The system prioritizes strict matches over fuzzy matches
  2. It deduplicates results based on phrase content
  3. It applies domain filtering if a domain is specified
  4. It calculates where each suggestion should be split (where user input ends and suggestion begins)

Step 5: Response Formatting

For each suggestion, the system:

  1. Extracts the matching point between user input and suggestion
  2. Splits the suggestion into a "prefix" (what the user already typed) and a "postfix" (the suggestion part)
  3. Returns structured suggestions with metadata (probability, labels, match type)

Behind the Scenes: Data Storage

Embedding Studio supports two backends for the suggesting system:

MongoDB Storage

The MongoDB implementation: - Stores each token in a document with numbered fields (chunk_0, chunk_1, etc.) - Creates search prefixes for each token to enable fast prefix queries - Uses MongoDB aggregation pipelines for complex matching logic

Redis Storage

The Redis implementation: - Leverages RediSearch's specialized text search capabilities - Optimizes for high-performance autocompletion - Implements specialized indexing for prefix matching

Advanced Features

Probability-Based Ranking

Suggestions are ranked by probability score, allowing the system to: - Show most relevant suggestions first - Learn from user interactions to improve suggestions over time - Adjust suggestion rankings through API feedback

Domain-Based Filtering

The system supports domain-specific suggestions: - Phrases can belong to multiple domains - Domains are stored as tags in the database - Queries can be filtered to a specific domain context

Labels for Categorization

Labels provide additional metadata: - Group related suggestions - Enable filtering beyond domains - Support content organization

Customization and Extension

Embedding Studio's suggestion system is designed to be customizable: - Replace the tokenizer for language-specific needs - Implement custom pipeline generators for specialized matching logic - Extend the SuggestionPhraseManager for different storage backends

Performance Considerations

The system employs several strategies for high performance: - Parallel search for responsive suggestions - Efficient prefix indexing for fast text completion - Query optimization based on input context - Pipelined database operations for batch processing