Understanding the Embedding Model Lifecycle in Embedding Studio¶
This tutorial provides a comprehensive walkthrough of the embedding model lifecycle within Embedding Studio, from initial fine-tuning to deployment and continuous improvement.
Overview¶
Embedding Studio manages the full lifecycle of embedding models:
- Fine-tuning: Improving embedding models using feedback data
- Deployment: Making models available for inference
- Upsertion: Adding or updating vectors in the database
- Improvement: Adjusting vectors based on user feedback
- Reindexing: Migrating data between models
Let's explore each phase in detail.
Fine-Tuning Pipeline¶
Fine-tuning takes an existing embedding model and improves it using clickstream data (user interactions with search results).
Key Components¶
- Fine-Tuning Tasks: Managed via the /fine-tuning/task endpoint (see the request sketch below)
- Clickstream Data: User sessions and interactions used for training
- MLflow Tracking: Records experiments, parameters, and model metrics
Fine-Tuning Process¶
                    ┌──────────────────┐
                    │ Clickstream Data │
                    └─────────┬────────┘
                              │
                              ▼
┌───────────────┐    ┌─────────────────┐    ┌────────────────┐
│ Initial Model │───▶│ Fine-Tuning Job │───▶│ Improved Model │
└───────────────┘    └────────┬────────┘    └────────────────┘
                              │
                              ▼
                     ┌─────────────────┐
                     │ MLflow Tracking │
                     └─────────────────┘
Implementation Details¶
1. Preparing Data:
   - User search sessions are collected via the clickstream API
   - Sessions are converted to training examples with positive/negative pairs
   - Data is split into training and evaluation sets
2. Hyperparameter Optimization:
   - Multiple parameter configurations are tested
   - Performance is evaluated using metrics like relevance improvement
   - The best-performing model is selected
3. Model Storage:
   - Trained models are stored in MLflow
   - Models include both query and item encoders
   - Metadata tracks lineage and performance metrics
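To kick off this pipeline programmatically, a task is created through the API. The sketch below is illustrative: the /fine-tuning/task path comes from this guide, but the host, any version prefix, and the payload and response fields are assumptions rather than the exact schema.

```python
import requests

# Illustrative sketch: create a fine-tuning task and poll its status.
# BASE_URL, the payload fields, and the response shape are assumptions;
# your deployment may also mount the path under a prefix like /api/v1.
BASE_URL = "http://localhost:5000"

payload = {
    "fine_tuning_method": "Default Fine Tuning Method",  # assumed plugin name
    "metadata": {"trigger": "weekly-feedback-batch"},    # assumed optional field
}

response = requests.post(f"{BASE_URL}/fine-tuning/task", json=payload)
response.raise_for_status()
task = response.json()
print("Created task:", task.get("id"), "status:", task.get("status"))

# Poll until the fine-tuning worker reports a terminal status.
status = requests.get(f"{BASE_URL}/fine-tuning/task/{task['id']}").json()
```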
Deployment Pipeline¶
Once a model is fine-tuned, it must be deployed to the inference service before it can be used for vector creation.
Key Components¶
- Triton Inference Server: Handles efficient model serving
- Deployment Worker: Manages the deployment process
- Blue-Green Deployment: Enables zero-downtime updates
Deployment Process¶
┌──────────────┐    ┌─────────────────┐    ┌──────────────┐
│ MLflow Model │───▶│ Model Converter │───▶│ Triton Model │
└──────────────┘    └─────────────────┘    └──────┬───────┘
                                                  │
                                                  ▼
                                        ┌───────────────────┐
                                        │ Inference Service │
                                        └───────────────────┘
Implementation Details¶
1. Model Retrieval:
   - The model is downloaded from MLflow storage
   - Both query and item models are extracted
2. Conversion for Triton:
   - Models are traced using PyTorch's JIT compiler (see the sketch after this list)
   - Configuration files are generated for Triton
   - Models are organized in the model repository
3. Deployment Strategy:
   - Models are versioned in the repository
   - Triton handles model loading and GPU allocation
   - Blue-green deployment ensures zero-downtime updates
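The tracing step looks roughly like the following. This is a minimal sketch, not the converter's actual code: the encoder stand-in, input shape, tensor names, and repository path are assumptions, though the model.pt filename and config.pbtxt layout follow Triton's PyTorch-backend conventions.

```python
import pathlib
import torch

# Minimal sketch of tracing an encoder and laying it out for Triton.
# The Linear stand-in and 384-dim input are assumptions for illustration.
encoder = torch.nn.Linear(384, 384).eval()
example_input = torch.randn(1, 384)
traced = torch.jit.trace(encoder, example_input)

version_dir = pathlib.Path("model_repository/query_encoder/1")
version_dir.mkdir(parents=True, exist_ok=True)
traced.save(str(version_dir / "model.pt"))  # Triton's PyTorch backend loads model.pt

# config.pbtxt sits next to the numbered version directories.
config = """
name: "query_encoder"
platform: "pytorch_libtorch"
max_batch_size: 32
input [{ name: "INPUT__0" data_type: TYPE_FP32 dims: [ 384 ] }]
output [{ name: "OUTPUT__0" data_type: TYPE_FP32 dims: [ 384 ] }]
"""
(version_dir.parent / "config.pbtxt").write_text(config.strip())
```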
Vector Management¶
Once models are deployed, Embedding Studio manages vector creation, storage, and querying.
Key Components¶
- Upsertion Worker: Handles adding or updating vectors
- Deletion Worker: Removes vectors from the database
- Vector Database: Stores and indexes vectors (based on pgvector)
Upsertion Process¶
┌──────────────┐    ┌───────────────┐    ┌───────────┐
│ Content Data │───▶│ Item Splitter │───▶│ Inference │
└──────────────┘    └───────────────┘    └─────┬─────┘
                                               │
                                               ▼
                                      ┌─────────────────┐
                                      │ Vector Database │
                                      └─────────────────┘
Implementation Details¶
1. Data Processing:
   - Content is loaded using data loaders
   - Items are split into manageable chunks
   - Each chunk is processed through the preprocessing pipeline
2. Vector Creation:
   - Chunks are sent to the inference service
   - Resulting vectors are assembled
   - Average vectors may be created for a consolidated representation (see the sketch after this list)
3. Storage Management:
   - Vectors are stored with metadata and payload
   - Indexes are maintained for efficient similarity search
   - User-specific vectors can be stored for personalized results
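The averaging step can be pictured as follows. The embed_chunks helper is a hypothetical stand-in for a call to the inference service, and the 384-dimension size and normalization step are assumptions for illustration.

```python
import numpy as np

# Hypothetical stand-in for the inference service: one vector per chunk.
def embed_chunks(chunks: list[str]) -> np.ndarray:
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(chunks), 384))  # pretend 384-dim embeddings

chunks = ["first chunk of the item", "second chunk", "third chunk"]
vectors = embed_chunks(chunks)

# Per-chunk vectors are stored individually; the mean acts as an
# item-level summary that can be indexed alongside them.
average_vector = vectors.mean(axis=0)
average_vector /= np.linalg.norm(average_vector)  # normalize for cosine search
print(vectors.shape, average_vector.shape)        # (3, 384) (384,)
```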
Continuous Improvement¶
Embedding Studio enables continuous improvement through user feedback and incremental model updates.
Key Components¶
- Clickstream Collection: Captures user interactions (see the sketch below)
- Improvement Worker: Adjusts vectors based on feedback
- Reindexing Worker: Migrates data between model versions
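Feeding the clickstream is a matter of registering sessions and events from your search frontend. The sketch below is modeled on the flow this guide describes; the paths, field names, and event types are assumptions, not the exact API schema.

```python
import requests

# Illustrative sketch of clickstream collection; paths and fields are assumed.
BASE_URL = "http://localhost:5000"

session = {
    "session_id": "session-123",
    "search_query": "red running shoes",
    "search_results": [{"object_id": "item-1"}, {"object_id": "item-2"}],
}
requests.post(f"{BASE_URL}/clickstream/session", json=session)

# Later, when the user clicks one of the results:
event = {
    "session_id": "session-123",
    "object_id": "item-1",
    "event_type": "click",
}
requests.post(f"{BASE_URL}/clickstream/session/events", json=event)
```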
Improvement Process¶
┌───────────────┐    ┌─────────────────┐    ┌──────────────────┐
│ User Sessions │───▶│ Vector Adjuster │───▶│ Improved Vectors │
└───────┬───────┘    └─────────────────┘    └─────────┬────────┘
        │                                             │
        ▼                                             ▼
 ┌─────────────┐                             ┌─────────────────┐
 │ Fine-Tuning │                             │ Personalization │
 └─────────────┘                             └─────────────────┘
Implementation Details¶
1. Feedback Collection:
   - User clicks and interactions are recorded
   - Sessions are analyzed for relevance patterns
   - Irrelevant sessions can be marked and excluded
2. Vector Adjustment:
   - Clicked items' vectors are pulled closer to query vectors
   - Non-clicked items' vectors are pushed away
   - User-specific vector adjustments enable personalization (see the sketch after this list)
3. Model Evolution:
   - New models are fine-tuned based on collected feedback
   - Data is migrated between model versions via reindexing
   - Blue-green deployment ensures smooth transitions
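The pull/push intuition behind vector adjustment can be captured in a few lines. This is an illustrative sketch of the idea, not the improvement worker's actual algorithm; the step size and the simple linear update are assumptions.

```python
import numpy as np

# Illustrative pull/push update; not the worker's exact algorithm.
def adjust_vectors(query, clicked, non_clicked, step=0.1):
    pulled = clicked + step * (query - clicked)          # move toward the query
    pushed = non_clicked - step * (query - non_clicked)  # move away from it
    return pulled, pushed

rng = np.random.default_rng(42)
query = rng.normal(size=384)             # query vector
clicked = rng.normal(size=(2, 384))      # vectors of clicked items
non_clicked = rng.normal(size=(5, 384))  # vectors of shown-but-skipped items
pulled, pushed = adjust_vectors(query, clicked, non_clicked)
```

Applied per user and stored separately, the same kind of update yields the personalization described above.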
Reindexing Between Models¶
When a new model version is created, data needs to be migrated from the old model to the new one.
Key Components¶
- Reindex Worker: Manages the overall reindexing process
- Reindex Subtasks: Process batches of data in parallel
- Blue Collection Switch: Changes which model serves production traffic
Reindexing Process¶
┌──────────────┐    ┌────────────────┐    ┌───────────────────┐
│ Source Model │───▶│ Reindex Worker │───▶│ Destination Model │
└──────────────┘    └────────────────┘    └─────────┬─────────┘
                                                    │
                                                    ▼
                                          ┌───────────────────┐
                                          │ Blue-Green Switch │
                                          └───────────────────┘
Implementation Details¶
1. Task Creation:
   - The reindexing task specifies source and destination models (see the request sketch after this list)
   - Locking prevents concurrent operations on the same models
   - Configuration controls batch size and concurrency
2. Parallel Processing:
   - Data is processed in batches for efficiency
   - Multiple subtasks run concurrently
   - Progress is tracked and failures are recorded
3. Deployment Coordination:
   - Optional model deployment if needed
   - The blue collection switch changes the active model
   - Source model cleanup can be performed after successful migration
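Creating such a task might look like the following. Everything here beyond the source/destination idea is an assumption: the endpoint path, payload fields, and flags are illustrative, so check the API reference for the real schema.

```python
import requests

# Illustrative reindex request; the path and every field name are assumptions.
BASE_URL = "http://localhost:5000"

payload = {
    "source": {"embedding_model_id": "model-v1"},
    "dest": {"embedding_model_id": "model-v2"},
    "deploy_as_blue": True,     # switch traffic once migration succeeds
    "wait_on_conflict": False,  # fail fast if the models are locked
}

response = requests.post(f"{BASE_URL}/embeddings/reindex", json=payload)
response.raise_for_status()
print("Reindex task:", response.json())
```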
Complete Workflow¶
Here's a step-by-step workflow of the entire embedding model lifecycle in Embedding Studio:
1. Initial Model Deploy and Collection Creation
   - Upload an initial model or use an existing one
   - Deploy the model to the Triton Inference Server
   - Create a vector collection in the database for this model
   - Set it as the "blue" (active) collection for serving traffic
2. Upsertion
   - Send content items to the upsertion endpoint
   - Content is split into chunks
   - Chunks are transformed into vectors via the inference service
   - Vectors are stored in the database with metadata
3. Search and Clickstream Collection
   - Users perform searches via similarity search endpoints
   - Search queries are vectorized and compared against stored vectors
   - User interactions with results are captured via the clickstream API
   - Sessions track queries, results, and user actions
4. Vector Improvement via Feedback
   - Clickstream data is analyzed for feedback signals
   - The improvement worker processes feedback sessions
   - Vectors are adjusted based on user interactions
   - Personalized vectors maintain user-specific adjustments
5. Fine-Tuning via Feedback
   - Sufficient feedback triggers a fine-tuning job (via the API)
   - Clickstream data is converted to training examples
   - The model undergoes hyperparameter optimization
   - A new model version is created and evaluated
6. New Model Deployment
   - If the quality improvement is sufficient, the new model is deployed
   - A new vector collection is created for the improved model
   - The new collection initially doesn't serve production traffic
7. Reindexing
   - Data is migrated from the old model to the new one
   - The process runs in batches with parallel workers
   - New items and updates go directly to the new model during migration
   - Personalized vectors are removed or recreated
8. Switch Active Model
   - The new model and collection are set as "blue" (active)
   - All new search traffic uses the improved model
   - The switch happens with zero downtime
9. Cleanup
   - The previous collection is deleted after a successful switch
   - The old model is removed from the inference service
   - The system is ready for the next improvement cycle
This cycle continues iteratively, with each round potentially delivering better search quality based on real user feedback.
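Condensed into code, one iteration of the cycle reads as below. Every helper is a hypothetical placeholder for the API calls sketched throughout this tutorial; they are no-op stubs here so the control flow itself runs.

```python
# Hypothetical placeholders for the APIs sketched above; stubbed so it runs.
def fine_tune(model_id: str) -> str:               # POST /fine-tuning/task
    return model_id + "-next"

def quality_improved(old: str, new: str) -> bool:  # compare MLflow metrics
    return True

def deploy(model_id: str) -> None: ...             # deployment worker
def reindex(source: str, dest: str) -> None: ...   # reindex worker + subtasks
def switch_blue(model_id: str) -> None: ...        # blue collection switch
def cleanup(model_id: str) -> None: ...            # drop old collection/model

def run_cycle(current: str) -> str:
    candidate = fine_tune(current)                 # steps 4-5: feedback -> model
    if not quality_improved(current, candidate):
        return current                             # keep serving the old model
    deploy(candidate)                              # step 6
    reindex(source=current, dest=candidate)        # step 7
    switch_blue(candidate)                         # step 8
    cleanup(current)                               # step 9
    return candidate

print(run_cycle("model-v1"))                       # -> "model-v1-next"
```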