Understanding the Embedding Model Lifecycle in Embedding Studio

This tutorial provides a comprehensive walkthrough of the embedding model lifecycle within Embedding Studio, from initial fine-tuning to deployment and continuous improvement.

Overview

Embedding Studio manages the full lifecycle of embedding models:

  1. Fine-tuning: Improving embedding models using feedback data
  2. Deployment: Making models available for inference
  3. Upsertion: Adding or updating vectors in the database
  4. Improvement: Adjusting vectors based on user feedback
  5. Reindexing: Migrating data between models

Let's explore each phase in detail.

Fine-Tuning Pipeline

Fine-tuning takes an existing embedding model and improves it using clickstream data (user interactions with search results).

Key Components

  • Fine-Tuning Tasks: Managed via the /fine-tuning/task endpoint (see the sketch after this list)
  • Clickstream Data: User sessions and interactions used for training
  • MLflow Tracking: Records experiments, parameters, and model metrics
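
For example, a client can start a fine-tuning task through the /fine-tuning/task endpoint. Here is a minimal sketch assuming a local deployment; the base URL and payload fields are illustrative assumptions, not the documented schema:

    # Start a fine-tuning task via the REST API.
    # The /fine-tuning/task path is from the docs above; the payload
    # fields below are hypothetical and depend on your deployment.
    import requests

    BASE_URL = "http://localhost:5000"  # assumed Embedding Studio API address

    response = requests.post(
        f"{BASE_URL}/fine-tuning/task",
        json={
            "fine_tuning_method": "Default Fine Tuning Method",  # hypothetical field
            "metadata": {"initiator": "lifecycle-tutorial"},      # hypothetical field
        },
        timeout=30,
    )
    response.raise_for_status()
    print(response.json())  # typically includes a task ID and status to poll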

Fine-Tuning Process

                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                    β”‚ Clickstream Data  β”‚
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚
                              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Initial Model │───▢│ Fine-Tuning Job │───▢│ Improved Model  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚
                              β–Ό
                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                    β”‚ MLflow Tracking   β”‚
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Implementation Details

  1. Preparing Data:
     • User search sessions are collected via the clickstream API
     • Sessions are converted to training examples with positive/negative pairs (see the sketch after this list)
     • Data is split into training and evaluation sets

  2. Hyperparameter Optimization:
     • Multiple parameter configurations are tested
     • Performance is evaluated using metrics like relevance improvement
     • The best-performing model is selected

  3. Model Storage:
     • Trained models are stored in MLflow
     • Models include both query and item encoders
     • Metadata tracks lineage and performance metrics
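
To make the data-preparation step concrete, here is a simplified sketch of turning recorded sessions into labeled query-item pairs. The session structure shown (query, shown results, clicked IDs) is an assumption; the real pipeline does considerably more work.

    # A simplified sketch: clickstream sessions -> (query, item, label) rows.
    # Label 1 = clicked (positive), 0 = shown but not clicked (negative).
    from typing import Iterable

    def sessions_to_pairs(sessions: Iterable[dict]) -> list[tuple[str, str, int]]:
        pairs = []
        for session in sessions:
            clicked = set(session["clicked_ids"])
            for item_id in session["result_ids"]:
                pairs.append((session["query"], item_id, 1 if item_id in clicked else 0))
        return pairs

    sessions = [
        {"query": "red sneakers", "result_ids": ["a1", "b2", "c3"], "clicked_ids": ["b2"]},
    ]
    print(sessions_to_pairs(sessions))
    # [('red sneakers', 'a1', 0), ('red sneakers', 'b2', 1), ('red sneakers', 'c3', 0)]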

Deployment Pipeline

Once a model is fine-tuned, it must be deployed to the inference service before it can be used for vector creation.

Key Components

  • Triton Inference Server: Handles efficient model serving
  • Deployment Worker: Manages the deployment process
  • Blue-Green Deployment: Enables zero-downtime updates

Deployment Process

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ MLflow Model  │───▢│ Model Converter │───▢│ Triton Model    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚
                              β–Ό
                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                    β”‚ Inference Service β”‚
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Implementation Details

  1. Model Retrieval:
     • The model is downloaded from MLflow storage
     • Both query and item models are extracted

  2. Conversion for Triton:
     • Models are traced using PyTorch's JIT compiler (see the sketch after this list)
     • Configuration files are generated for Triton
     • Models are organized in the model repository

  3. Deployment Strategy:
     • Models are versioned in the repository
     • Triton handles model loading and GPU allocation
     • Blue-green deployment ensures zero-downtime updates
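
To picture the conversion step, here is a condensed sketch that traces a stand-in encoder with TorchScript and lays it out as a Triton model repository. The model, tensor shapes, names, and paths are assumptions; the real converter generates these from the fine-tuned encoders.

    # Trace a stand-in query encoder and write a minimal Triton layout.
    import pathlib
    import torch

    model = torch.nn.Linear(768, 768)  # placeholder for the fine-tuned query encoder
    model.eval()

    traced = torch.jit.trace(model, torch.randn(1, 768))

    version_dir = pathlib.Path("model_repository/query_encoder/1")
    version_dir.mkdir(parents=True, exist_ok=True)
    traced.save(str(version_dir / "model.pt"))

    # Minimal config.pbtxt for Triton's PyTorch (LibTorch) backend.
    (version_dir.parent / "config.pbtxt").write_text(
        'name: "query_encoder"\n'
        'platform: "pytorch_libtorch"\n'
        "max_batch_size: 16\n"
        'input [{ name: "INPUT__0", data_type: TYPE_FP32, dims: [768] }]\n'
        'output [{ name: "OUTPUT__0", data_type: TYPE_FP32, dims: [768] }]\n'
    )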

Vector Management

Once models are deployed, Embedding Studio manages vector creation, storage, and querying.

Key Components

  • Upsertion Worker: Handles adding or updating vectors
  • Deletion Worker: Removes vectors from the database
  • Vector Database: Stores and indexes vectors using pgvector (see the sketch after this list)
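
As an illustration of what pgvector-backed storage involves, here is a minimal sketch of a table with a vector column and an HNSW index. The schema, vector dimension, and connection string are assumptions; Embedding Studio manages its own collections internally.

    # Minimal pgvector setup: a table with an embedding column and an
    # HNSW index for approximate cosine-distance search. All names here
    # are illustrative, not Embedding Studio's actual schema.
    import psycopg

    with psycopg.connect("postgresql://user:pass@localhost:5432/embeddings") as conn:
        conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
        conn.execute(
            """
            CREATE TABLE IF NOT EXISTS object_parts (
                id bigserial PRIMARY KEY,
                object_id text NOT NULL,
                payload jsonb,
                embedding vector(768)
            )
            """
        )
        conn.execute(
            "CREATE INDEX IF NOT EXISTS object_parts_embedding_idx "
            "ON object_parts USING hnsw (embedding vector_cosine_ops)"
        )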

Upsertion Process

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Content Data  │───▢│ Item Splitter   │───▢│ Inference       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                                     β”‚
                                                     β–Ό
                                            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                                            β”‚ Vector Database β”‚
                                            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Implementation Details

  1. Data Processing:
     • Content is loaded using data loaders
     • Items are split into manageable chunks
     • Each chunk is processed through a preprocessing pipeline

  2. Vector Creation (see the sketch after this list):
     • Chunks are sent to the inference service
     • Resulting vectors are assembled
     • Average vectors may be created as a consolidated representation

  3. Storage Management:
     • Vectors are stored with metadata and payload
     • Indexes are maintained for efficient similarity search
     • User-specific vectors can be stored for personalized results
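
The chunk-embed-assemble flow can be sketched as follows, with a deliberately naive word-based splitter and a placeholder embed_batch() standing in for a call to the inference service:

    # Split an item into chunks, embed each chunk, and keep an average
    # vector as a consolidated per-item representation.
    import numpy as np

    def split_into_chunks(text: str, max_words: int = 100) -> list[str]:
        words = text.split()
        return [" ".join(words[i : i + max_words]) for i in range(0, len(words), max_words)]

    def embed_batch(chunks: list[str]) -> np.ndarray:
        # Placeholder: the real system sends chunks to Triton for inference.
        rng = np.random.default_rng(0)
        return rng.normal(size=(len(chunks), 768)).astype(np.float32)

    chunks = split_into_chunks("some long document text " * 200)
    vectors = embed_batch(chunks)
    average_vector = vectors.mean(axis=0)  # one consolidated vector per item
    print(len(chunks), vectors.shape, average_vector.shape)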

Continuous Improvement

Embedding Studio enables continuous improvement through user feedback and incremental model updates.

Key Components

  • Clickstream Collection: Captures user interactions
  • Improvement Worker: Adjusts vectors based on feedback
  • Reindexing Worker: Migrates data between model versions

Improvement Process

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ User Sessions │───▢│ Vector Adjuster │───▢│ Improved Vectorsβ”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚                                             β”‚
       β”‚                                             β”‚
       β–Ό                                             β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Fine-Tuning   β”‚                         β”‚ Personalization β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Implementation Details

  1. Feedback Collection:
     • User clicks and interactions are recorded
     • Sessions are analyzed for relevance patterns
     • Irrelevant sessions can be marked and excluded

  2. Vector Adjustment (see the sketch after this list):
     • Clicked items' vectors are pulled closer to query vectors
     • Non-clicked items' vectors are pushed away
     • User-specific vector adjustments enable personalization

  3. Model Evolution:
     • New models are fine-tuned based on collected feedback
     • Data is migrated between model versions via reindexing
     • Blue-green deployment ensures smooth transitions
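
The pull/push idea behind vector adjustment can be illustrated with a toy update rule. The step size and exact formulation here are assumptions, not the improvement worker's actual algorithm:

    # Move clicked items toward the query vector and non-clicked items away.
    import numpy as np

    def adjust_vectors(query, item_vectors, clicked, lr=0.05):
        """Return adjusted copies of item vectors given a click mask."""
        adjusted = item_vectors.copy()
        for i, was_clicked in enumerate(clicked):
            direction = query - adjusted[i]
            # Pull clicked items toward the query; push the rest away.
            adjusted[i] += lr * direction if was_clicked else -lr * direction
            adjusted[i] /= np.linalg.norm(adjusted[i])  # keep vectors unit length
        return adjusted

    query = np.random.default_rng(1).normal(size=768)
    query /= np.linalg.norm(query)
    items = np.random.default_rng(2).normal(size=(3, 768))
    items /= np.linalg.norm(items, axis=1, keepdims=True)
    print(adjust_vectors(query, items, clicked=[True, False, False]).shape)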

Reindexing Between Models

When a new model version is created, data needs to be migrated from the old model to the new one.

Key Components

  • Reindex Worker: Manages the overall reindexing process
  • Reindex Subtasks: Process batches of data in parallel
  • Blue Collection Switch: Changes which model serves production traffic

Reindexing Process

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Source Model  │───▢│ Reindex Worker  │───▢│ Destination Model β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚
                              β–Ό
                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                    β”‚ Blue-Green Switch β”‚
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Implementation Details

  1. Task Creation:
     • The reindexing task specifies source and destination models
     • Locking prevents concurrent operations on the same models
     • Configuration controls batch size and concurrency

  2. Parallel Processing (see the sketch after this list):
     • Data is processed in batches for efficiency
     • Multiple subtasks run concurrently
     • Progress is tracked and failures are recorded

  3. Deployment Coordination:
     • The destination model is deployed first if it is not already serving
     • The blue collection switch changes the active model
     • Source model cleanup can be performed after a successful migration
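
The batched, parallel structure of reindexing might look like the following sketch, where fetch_batch(), reembed(), and store() are stubs standing in for the worker's real data-access, inference, and storage calls:

    # Batched, parallel migration from a source to a destination collection.
    from concurrent.futures import ThreadPoolExecutor

    BATCH_SIZE = 500
    MAX_WORKERS = 4
    TOTAL_ITEMS = 2_000

    def fetch_batch(offset: int, size: int) -> list[dict]:
        """Stub: read a page of items from the source collection."""
        return [{"id": i} for i in range(offset, min(offset + size, TOTAL_ITEMS))]

    def reembed(items: list[dict]) -> list[list[float]]:
        """Stub: run the destination model on each item's content."""
        return [[0.0] * 768 for _ in items]

    def store(items: list[dict], vectors: list[list[float]]) -> None:
        """Stub: write vectors into the destination collection."""

    def reindex_batch(offset: int) -> int:
        items = fetch_batch(offset, BATCH_SIZE)
        store(items, reembed(items))
        return len(items)

    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        done = sum(pool.map(reindex_batch, range(0, TOTAL_ITEMS, BATCH_SIZE)))
    print(f"reindexed {done} items")  # progress/failure tracking omitted here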

Complete Workflow

Here's a step-by-step workflow of the entire embedding model lifecycle in Embedding Studio:

  1. Initial Model Deployment and Collection Creation:
     • Upload an initial model or use an existing one
     • Deploy the model to the Triton Inference Server
     • Create a vector collection in the database for this model
     • Set it as the "blue" (active) collection for serving traffic

  2. Upsertion:
     • Send content items to the upsertion endpoint (see the sketch below)
     • Content is split into chunks
     • Chunks are transformed into vectors via the inference service
     • Vectors are stored in the database with metadata
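
A client-side upsertion call might look like this sketch; the endpoint path and payload schema are illustrative assumptions, not the documented API:

    # Send content items for upsertion (path and fields are hypothetical).
    import requests

    BASE_URL = "http://localhost:5000"  # assumed Embedding Studio API address

    response = requests.post(
        f"{BASE_URL}/embeddings/upsert",  # hypothetical path
        json={
            "items": [
                {
                    "object_id": "doc-42",
                    "payload": {"title": "Red sneakers", "category": "shoes"},
                }
            ]
        },
        timeout=30,
    )
    response.raise_for_status()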

  3. Search and Clickstream Collection:
     • Users perform searches via similarity search endpoints
     • Search queries are vectorized and compared against stored vectors
     • User interactions with results are captured via the clickstream API (see the sketch below)
     • Sessions track queries, results, and user actions
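
Recording a session and a click might look like the following sketch; the paths and field names are illustrative assumptions:

    # Register a search session, then record a click event against it.
    import requests

    BASE_URL = "http://localhost:5000"  # assumed Embedding Studio API address

    # Register the session with the query and the results that were shown.
    requests.post(
        f"{BASE_URL}/clickstream/session",  # hypothetical path
        json={
            "session_id": "sess-123",
            "search_query": "red sneakers",
            "search_results": [{"object_id": "doc-42"}, {"object_id": "doc-43"}],
        },
        timeout=30,
    ).raise_for_status()

    # Record that the user clicked one of the results.
    requests.post(
        f"{BASE_URL}/clickstream/session/events",  # hypothetical path
        json={"session_id": "sess-123", "events": [{"object_id": "doc-42"}]},
        timeout=30,
    ).raise_for_status()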

  4. Vector Improvement via Feedback:
     • Clickstream data is analyzed for feedback signals
     • The improvement worker processes feedback sessions
     • Vectors are adjusted based on user interactions
     • Personalized vectors maintain user-specific adjustments

  5. Fine-Tuning via Feedback:
     • Sufficient feedback triggers a fine-tuning job (via the API)
     • Clickstream data is converted to training examples
     • The model undergoes hyperparameter optimization
     • A new model version is created and evaluated

  6. New Model Deployment:
     • If the quality improvement is sufficient, the new model is deployed
     • A new vector collection is created for the improved model
     • The new collection does not initially serve production traffic

  7. Reindexing:
     • Data is migrated from the old model to the new model
     • The process runs in batches with parallel workers
     • New items and updates go directly to the new model during migration
     • Personalized vectors are removed or recreated

  8. Switch Active Model:
     • The new model and collection are set as "blue" (active)
     • All new search traffic uses the improved model
     • The switch happens with zero downtime

  9. Cleanup:
     • The previous collection is deleted after a successful switch
     • The old model is removed from the inference service
     • The system is ready for the next improvement cycle

This cycle continues iteratively, with each round potentially delivering better search quality based on real user feedback.