Understanding Embedding Studio's Suggestion System¶
Embedding Studio includes a powerful suggestion system that provides intelligent, context-aware autocomplete functionality. This tutorial explores how the suggesting system works, its core concepts, and the underlying algorithms that make it effective.
Core Concepts¶
The suggestion system in Embedding Studio is designed around a few fundamental concepts:
1. Tokens and Chunks¶
At the heart of the suggesting system is tokenization - breaking text into meaningful units:
- Tokenization: The process of splitting input text into smaller units (tokens)
- Chunks: The tokens that make up phrases stored in the suggestion database
- Spans: Position information about where tokens appear in the original text
The tokenizer identifies words, numbers, and individual symbols as separate tokens:
class SuggestingTokenizer:
def __init__(self):
# Match words, digits, or any single non-whitespace symbol
self._pattern = re.compile(r"\w+|\d+|[^\w\s]")
2. Phrase Management¶
Phrases are the suggested content shown to users. The system:
- Stores phrases with probability scores (indicating relevance)
- Attaches labels and domains for categorization and filtering
- Indexes phrases for efficient search and retrieval
3. Query Construction¶
When a user types, the system constructs a query based on:
- Found chunks: Tokens the user has already typed
- Next chunk: Partial token the user is currently typing
- Domain: Optional context to filter suggestions (e.g., "shopping", "search")
4. Matching Strategies¶
The system employs multiple matching strategies:
- Exact matching: Finds phrases with exact token matches
- Prefix matching: Matches beginning portions of tokens
- Fuzzy matching: Allows for small variations in spelling
How Suggesting Works: The Algorithm¶
Here's how the suggestion system works when a user is typing:
Step 1: Input Analysis¶
When a user types "how to make pas" and pauses, the system:
- Tokenizes the input:
["how", "to", "make", "pas"]
- Identifies complete tokens (
["how", "to", "make"]
) as "found chunks" - Identifies the last partial token (
"pas"
) as the "next chunk"
Step 2: Query Generation¶
The system creates a search query based on the user's input context:
// Simplified example of a generated query
(@chunk_0:how @chunk_1:to @chunk_2:make @chunk_3:pas*) |
(@chunk_0:to @chunk_1:make @chunk_2:pas*) |
(@chunk_0:make @chunk_1:pas*) |
(@chunk_0:pas*)
This query looks for phrases: - Starting with "how to make" and having a 4th token starting with "pas" - Starting with "to make" and having a 3rd token starting with "pas" - Starting with "make" and having a 2nd token starting with "pas" - Starting with a token beginning with "pas"
Step 3: Parallel Search¶
The system runs parallel searches:
- Strict search: Using exact and prefix matching
- Soft search: Using fuzzy matching for more forgiving results
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
# Schedule both queries
f_strict = executor.submit(self._search_docs, strict_text_query, top_k)
f_soft = executor.submit(self._search_docs, soft_text_query, top_k)
Step 4: Result Processing¶
- The system prioritizes strict matches over fuzzy matches
- It deduplicates results based on phrase content
- It applies domain filtering if a domain is specified
- It calculates where each suggestion should be split (where user input ends and suggestion begins)
Step 5: Response Formatting¶
For each suggestion, the system:
- Extracts the matching point between user input and suggestion
- Splits the suggestion into a "prefix" (what the user already typed) and a "postfix" (the suggestion part)
- Returns structured suggestions with metadata (probability, labels, match type)
Behind the Scenes: Data Storage¶
Embedding Studio supports two backends for the suggesting system:
MongoDB Storage¶
The MongoDB implementation:
- Stores each token in a document with numbered fields (chunk_0
, chunk_1
, etc.)
- Creates search prefixes for each token to enable fast prefix queries
- Uses MongoDB aggregation pipelines for complex matching logic
Redis Storage¶
The Redis implementation: - Leverages RediSearch's specialized text search capabilities - Optimizes for high-performance autocompletion - Implements specialized indexing for prefix matching
Advanced Features¶
Probability-Based Ranking¶
Suggestions are ranked by probability score, allowing the system to: - Show most relevant suggestions first - Learn from user interactions to improve suggestions over time - Adjust suggestion rankings through API feedback
Domain-Based Filtering¶
The system supports domain-specific suggestions: - Phrases can belong to multiple domains - Domains are stored as tags in the database - Queries can be filtered to a specific domain context
Labels for Categorization¶
Labels provide additional metadata: - Group related suggestions - Enable filtering beyond domains - Support content organization
Customization and Extension¶
Embedding Studio's suggestion system is designed to be customizable:
- Replace the tokenizer for language-specific needs
- Implement custom pipeline generators for specialized matching logic
- Extend the SuggestionPhraseManager
for different storage backends
Performance Considerations¶
The system employs several strategies for high performance: - Parallel search for responsive suggestions - Efficient prefix indexing for fast text completion - Query optimization based on input context - Pipelined database operations for batch processing