mlflow_with_gcp_backend

Documentation for ExperimentsManagerWithGCPBackend

Functionality

This class wraps the mlflow package to manage fine-tuning experiments with a Google Cloud backend. It integrates MLflow tracking with GCP services, stores artifacts in Google Cloud Storage, and applies retry strategies on API errors.

Parameters

  • tracking_uri: URL of the MLflow server.
  • gcp_credentials: GCP credentials (Pydantic model with project and bucket details).
  • main_metric: Primary metric name for evaluating experiments.
  • plugin_name: Name of the fine-tuning method used.
  • accumulators: List of metric accumulators for logging.
  • is_loss: Boolean indicating if the main metric represents a loss.
  • n_top_runs: Number of top parameter groups to consider.
  • requirements: Extra requirements for logging models to MLflow.
  • retry_config: Configuration for retry policies.
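How main_metric, is_loss, and n_top_runs interact can be illustrated with a minimal sketch. It assumes that "top" means best by the main metric, with lower values winning when is_loss is True. The function select_top_runs and the sample run ids are hypothetical and not part of ExperimentsManagerWithGCPBackend.

```python
from typing import Dict, List


def select_top_runs(
    metric_by_run: Dict[str, float], is_loss: bool, n_top_runs: int
) -> List[str]:
    """Return the ids of the best runs according to a single metric.

    Hypothetical helper for illustration; not part of the documented class.
    """
    # For a loss, lower is better, so sort ascending; otherwise sort descending.
    ranked = sorted(metric_by_run, key=metric_by_run.get, reverse=not is_loss)
    return ranked[:n_top_runs]


# Made-up metric values keyed by made-up run ids.
metric_by_run = {"run_a": 0.81, "run_b": 0.93, "run_c": 0.77}

# Treating the values as accuracy: higher is better.
best_as_accuracy = select_top_runs(metric_by_run, is_loss=False, n_top_runs=2)

# Treating the same values as a loss: lower is better.
best_as_loss = select_top_runs(metric_by_run, is_loss=True, n_top_runs=2)
```

The same values produce a different ranking depending on is_loss, which is why the flag has to match the metric named in main_metric.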

Usage

Purpose: Manage MLflow experiments with artifacts stored in Google Cloud Storage, extending the core experiment functionality.

Inheritance: Inherits from ExperimentsManager, adding GCP backend integration.

Example

# ExperimentsManagerWithGCPBackend and GCPCredentials must be imported first.
# All values below are placeholders.
manager = ExperimentsManagerWithGCPBackend(
    tracking_uri="http://mlflow.server",
    gcp_credentials=GCPCredentials(
        project_id="your_project_id",
        bucket_name="your_bucket_name",
        client_email="your_email",
        private_key="your_private_key",
        private_key_id="your_key_id",
    ),
    main_metric="accuracy",
    plugin_name="fine_tuning",
    accumulators=[],
    is_loss=False,
    n_top_runs=10,
    requirements=None,
    retry_config=None,
)

Documentation for ExperimentsManagerWithGCPBackend.is_retryable_error

Functionality

This method checks if an exception encountered from Google Cloud Storage is retryable. Specifically, if the exception is a GoogleAPIError with HTTP status code 500 or above, it logs an error message and returns True. Otherwise, it returns False.
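The check described above can be sketched as follows. To keep the sketch runnable without google-api-core installed, it uses a stand-in exception class, StandInAPIError, in place of GoogleAPIError. The class, the log message, and the use of a code attribute are assumptions for illustration, not the method's actual implementation.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


class StandInAPIError(Exception):
    """Stand-in for a GCP API error carrying an HTTP status code (assumption)."""

    def __init__(self, message: str, code: Optional[int] = None) -> None:
        super().__init__(message)
        self.code = code


def is_retryable_error(e: Exception) -> bool:
    """Return True only for API errors with an HTTP status of 500 or above."""
    code = getattr(e, "code", None)
    if isinstance(e, StandInAPIError) and isinstance(code, int) and code >= 500:
        # Server-side failures are logged and reported as retryable.
        logger.error("Retryable storage error (HTTP %s): %s", code, e)
        return True
    # Client errors (4xx) and unrelated exceptions are not retried.
    return False
```

Restricting retries to 5xx responses avoids repeating requests that fail for reasons a retry cannot fix, such as a missing object or insufficient permissions.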

Parameters

  • e: Exception instance to check. Usually a GoogleAPIError object.

Usage

  • Purpose: Determine whether a GCP error is retryable, for use in retry logic.

Example

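from google.api_core.exceptions import InternalServerError

# InternalServerError is a GoogleAPIError subclass whose code is HTTP 500.
# GoogleAPIError itself does not accept a code argument.
error = InternalServerError("Server error")

# manager is an ExperimentsManagerWithGCPBackend instance.
if manager.is_retryable_error(error):
    # implement retry mechanism
    pass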
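The retry mechanism left as a placeholder in the example above could look like the following minimal sketch. The structure of retry_config is not described here, so the parameters max_attempts and wait_seconds, along with the helper call_with_retries, are hypothetical. The predicate passed as is_retryable plays the role of is_retryable_error.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_attempts: int = 3,
    wait_seconds: float = 0.0,
) -> T:
    """Call func, retrying while is_retryable(error) holds (hypothetical helper)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as e:
            # Re-raise on the last attempt or when the error is not retryable.
            if attempt == max_attempts or not is_retryable(e):
                raise
            time.sleep(wait_seconds)
    raise RuntimeError("max_attempts must be at least 1")


# Demonstration with an operation that fails twice before succeeding.
attempts: List[int] = []


def flaky_upload() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("temporary failure")
    return "uploaded"


result = call_with_retries(flaky_upload, lambda e: isinstance(e, ConnectionError))
```

In real use, wait_seconds would normally be greater than zero, and often increasing between attempts, so a struggling service is not hit again immediately.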

Documentation for ExperimentsManagerWithGCPBackend._delete_model

Functionality

This method deletes model artifact files stored in Google Cloud Storage. It uses the MLflow run information to obtain the artifact URI, extracts the object path using the provided GCP bucket name, and deletes the file from the bucket using the GCP client.
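The path-extraction step can be sketched as below. The sketch assumes the artifact URI has the form gs://<bucket_name>/<object_path>. The helper object_path_from_artifact_uri and the sample URI are hypothetical, and the method's real parsing logic may differ.

```python
def object_path_from_artifact_uri(artifact_uri: str, bucket_name: str) -> str:
    """Strip the gs://<bucket_name>/ prefix from an artifact URI.

    Hypothetical helper; assumes URIs of the form gs://<bucket>/<object path>.
    """
    prefix = f"gs://{bucket_name}/"
    if not artifact_uri.startswith(prefix):
        raise ValueError(
            f"Artifact URI {artifact_uri!r} is not in bucket {bucket_name!r}"
        )
    # Everything after the bucket prefix is the object path inside the bucket.
    return artifact_uri[len(prefix):]


# Made-up URI for illustration.
object_path = object_path_from_artifact_uri(
    "gs://your_bucket_name/1/run_123/artifacts/model", "your_bucket_name"
)
```

With the google-cloud-storage client, the resulting path could then be passed to bucket.blob(object_path).delete() to remove the object.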

Parameters

  • run_id: Identifier for the experiment run used to obtain model info.
  • experiment_id: Identifier for the experiment. Not currently used; it may be used for logging or future extensions.

Usage

  • Purpose: Remove a stored model's files after experiment cleanup.

Example

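# tracking_uri, gcp_credentials, main_metric, plugin_name, and accumulators
# are assumed to be defined as in the constructor example.
manager = ExperimentsManagerWithGCPBackend(
    tracking_uri,
    gcp_credentials,
    main_metric,
    plugin_name,
    accumulators,
)
success = manager._delete_model(run_id="run_123", experiment_id="exp_456")
if success:
    print("Model deleted successfully.")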