astrodata.tracking package
Submodules
astrodata.tracking.CodeTracking module
astrodata.tracking.DataTracking module
astrodata.tracking.MLFlowTracker module
- class astrodata.tracking.MLFlowTracker.MlflowBaseTracker(run_name=None, experiment_name=None, extra_tags=None, tracking_uri=None, tracking_username=None, tracking_password=None)
Bases:
ModelTracker
Base tracker class for MLflow experiment tracking.
Handles MLflow configuration and provides base methods for registering tracked models.
- register_best_model(metric, registered_model_name=None, split_name='train', stage='Production')
Register the best model in MLflow Model Registry based on a metric.
- Parameters:
metric (BaseMetric) – Metric used to select the best run.
model_artifact_path (str, optional) – Path to the model artifact in the MLflow run. This parameter does not appear in the signature shown above.
registered_model_name (str, optional) – Name for the registered model. Defaults to the experiment name.
split_name (str, optional) – Which split’s metric to use (‘train’, ‘val’, or ‘test’). Defaults to ‘train’.
stage (str, optional) – Model stage to assign (e.g., ‘Production’, ‘Staging’). Defaults to ‘Production’.
- Returns:
The result of the registration.
- Return type:
mlflow.entities.model_registry.ModelVersion
- Raises:
ValueError – If the experiment or suitable run is not found.
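The selection step described above can be illustrated without an MLflow server. The following is a minimal sketch, not the library's implementation: the function name `select_best_run`, the plain-dict run records, the `"<metric>_<split>"` key layout, and the `greater_is_better` flag are all assumptions made for illustration. It keeps only runs tagged `is_final`, picks the one with the best metric value for the requested split, and raises `ValueError` when no suitable run exists.

```python
from typing import Any, Dict, List


def select_best_run(
    runs: List[Dict[str, Any]],
    metric_name: str,
    split_name: str = "train",
    greater_is_better: bool = True,
) -> Dict[str, Any]:
    """Return the run record with the best value of the chosen split metric."""
    # Hypothetical key layout: metrics are stored as "<metric>_<split>".
    key = f"{metric_name}_{split_name}"
    candidates = [
        run
        for run in runs
        # Only runs tagged as final and carrying the metric are eligible.
        if run.get("tags", {}).get("is_final") and key in run.get("metrics", {})
    ]
    if not candidates:
        raise ValueError(f"No suitable run found for metric '{key}'.")
    pick = max if greater_is_better else min
    return pick(candidates, key=lambda run: run["metrics"][key])


# Hypothetical run records standing in for MLflow runs.
runs = [
    {"run_id": "a", "tags": {"is_final": True}, "metrics": {"accuracy_val": 0.91}},
    {"run_id": "b", "tags": {"is_final": False}, "metrics": {"accuracy_val": 0.97}},
    {"run_id": "c", "tags": {"is_final": True}, "metrics": {"accuracy_val": 0.94}},
]
best = select_best_run(runs, "accuracy", split_name="val")
print(best["run_id"])  # c
```

Run "b" has the highest score but is skipped because it is not tagged as final, so "c" is selected. In a real setup, the chosen run's model would then be registered and assigned the requested stage.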
- wrap_fit(obj)
Placeholder for tracker-specific model wrapping.
To be implemented in subclass.
- Return type:
BaseMlModel
- class astrodata.tracking.MLFlowTracker.SklearnMLflowTracker(*args, **kwargs)
Bases:
MlflowBaseTracker
Tracker for scikit-learn models with MLflow integration.
Provides run lifecycle, parameter logging, metric logging, and optional model logging.
- wrap_fit(model, X_test=None, y_test=None, X_val=None, y_val=None, metrics=None, log_model=False, tags={}, manual_metrics=None)
Wrap a BaseMlModel’s fit method to perform MLflow logging.
- Parameters:
model (BaseMlModel) – The model to wrap.
X_test (array-like, optional) – Test data for metric logging.
y_test (array-like, optional) – Test labels for metric logging.
X_val (array-like, optional) – Validation data for metric logging.
y_val (array-like, optional) – Validation labels for metric logging.
metrics (list of BaseMetric, optional) – Metrics to log. If missing, a default loss metric is added.
log_model (bool, optional) – If True, log the fitted model as an MLflow artifact.
tags (Dict[str, Any], default {}) – Additional tags to attach to the model. By default, the tag “is_final” is set equal to log_model, so any logged model is treated as a candidate for production by register_best_model unless the tag is overridden (e.g., by model selectors during intermediate steps).
- Returns:
A new instance of the model with an MLflow-logging fit method.
- Return type:
- astrodata.tracking.MLFlowTracker.log_metrics_and_loss(X_split, y_split, model, metrics, split_name)
Log metrics and loss curves for a data split.
- Parameters:
X_split (array-like) – Features.
y_split (array-like) – Labels.
split_name (str) – Name of the split (‘train’, ‘val’, ‘test’).
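As a rough illustration of per-split metric logging, the sketch below computes each metric on one split and labels the result with the split name. It is an assumption-laden stand-in rather than the module's code: `compute_split_metrics`, the `predict` callable, the `"<metric>_<split>"` key layout, and the toy `accuracy` function are hypothetical, and loss curves are not covered.

```python
from typing import Callable, Dict, Sequence

MetricFn = Callable[[Sequence[float], Sequence[float]], float]


def compute_split_metrics(
    X_split: Sequence[Sequence[float]],
    y_split: Sequence[float],
    predict: Callable[[Sequence[Sequence[float]]], Sequence[float]],
    metrics: Dict[str, MetricFn],
    split_name: str,
) -> Dict[str, float]:
    """Evaluate every metric on one split and key the results by split name."""
    y_pred = predict(X_split)
    # Hypothetical key layout: "<metric>_<split>", e.g. "accuracy_val".
    return {f"{name}_{split_name}": fn(y_split, y_pred) for name, fn in metrics.items()}


def accuracy(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Fraction of predictions that match the labels exactly."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


# Toy classifier: predicts 1 when the single feature is positive.
def predict(X: Sequence[Sequence[float]]) -> Sequence[float]:
    return [1 if row[0] > 0 else 0 for row in X]


X_val = [[-1.0], [2.0], [3.0], [-4.0]]
y_val = [0, 1, 0, 0]
logged = compute_split_metrics(X_val, y_val, predict, {"accuracy": accuracy}, "val")
print(logged)  # {'accuracy_val': 0.75}
```

A dictionary of this shape could be passed to `mlflow.log_metrics` inside an active run; whether the library names its metric keys this way is not stated in the source.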
- astrodata.tracking.MLFlowTracker.log_metrics_manual(metrics, split_name)
astrodata.tracking.ModelTracker module
- class astrodata.tracking.ModelTracker.ModelTracker
Bases:
ABC
Abstract base class for tracking model fitting processes.
- abstractmethod wrap_fit(obj)
Wrap the fit method of an object to add tracking or logging.
- Parameters:
obj (Any) – The object whose fit method will be wrapped.
- Returns:
The wrapped object.
- Return type:
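The wrapping pattern described for `wrap_fit` can be sketched as follows. This is an illustrative example under stated assumptions, not the package's code: `ModelTrackerSketch`, `EventTracker`, and `MeanModel` are hypothetical names, and events are collected in a list instead of being sent to a tracking backend.

```python
import copy
from abc import ABC, abstractmethod
from typing import Any, List


class ModelTrackerSketch(ABC):
    """Hypothetical stand-in for an abstract tracker with a wrap_fit hook."""

    @abstractmethod
    def wrap_fit(self, obj: Any) -> Any:
        """Return an object whose fit method also performs tracking."""


class EventTracker(ModelTrackerSketch):
    """Records when fit starts and finishes (a list replaces a real backend)."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def wrap_fit(self, obj: Any) -> Any:
        wrapped = copy.copy(obj)  # shallow copy so the original stays untouched
        original_fit = wrapped.fit  # bound method of the copy

        def fit_with_tracking(*args: Any, **kwargs: Any) -> Any:
            self.events.append("fit started")
            result = original_fit(*args, **kwargs)
            self.events.append("fit finished")
            return result

        # The instance attribute shadows the class-level fit method.
        wrapped.fit = fit_with_tracking
        return wrapped


class MeanModel:
    """Toy model that predicts the mean of the training labels."""

    def fit(self, X: Any, y: List[float]) -> "MeanModel":
        self.mean_ = sum(y) / len(y)
        return self


tracker = EventTracker()
model = MeanModel()
tracked = tracker.wrap_fit(model)
tracked.fit([[1], [2], [3]], [2.0, 4.0, 6.0])
print(tracker.events)  # ['fit started', 'fit finished']
```

Returning a wrapped copy rather than patching the object in place means the caller's original model keeps its plain `fit`, which matches the "new instance" wording used for `SklearnMLflowTracker.wrap_fit`.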
astrodata.tracking.Tracker module
- class astrodata.tracking.Tracker.Tracker(config_path)
Bases:
object
Orchestrates code and data tracking for a project using Git and DVC.
This class manages both code and data versioning, providing methods to track, commit, and push changes to remote repositories for reproducible research.
- Parameters:
config_path (str) – Path to the configuration file.
- track(commit_message=None)
Orchestrate the tracking of data and code, pushing data and committing code.
This method aligns the code repository with the remote, tracks data and code changes, pushes data to the DVC remote, and commits and pushes code changes to the Git remote.
- Parameters:
commit_message (str, optional) – Commit message for the code changes.
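The order of operations described for `track` can be sketched as a plan of Git and DVC commands. This is only a guess at the kind of sequence involved: the helper `plan_tracking_commands`, its `data_paths` argument, and the default commit message are hypothetical, and the source does not state which exact commands or flags the class runs. The sketch builds the command list without executing anything.

```python
import shlex
from typing import List, Optional, Sequence


def plan_tracking_commands(
    data_paths: Sequence[str],
    commit_message: Optional[str] = None,
) -> List[List[str]]:
    """Build the Git and DVC command sequence for one tracking pass."""
    # Hypothetical fallback message when the caller does not supply one.
    message = commit_message or "Update tracked code and data"
    commands: List[List[str]] = [["git", "pull", "--rebase"]]  # align with the remote
    commands += [["dvc", "add", path] for path in data_paths]  # track data files
    commands.append(["dvc", "push"])  # upload data to the DVC remote
    commands.append(["git", "add", "--all"])  # stage code and .dvc pointer files
    commands.append(["git", "commit", "-m", message])
    commands.append(["git", "push"])  # publish code to the Git remote
    return commands


plan = plan_tracking_commands(["data/raw"], "Add raw catalogue")
for cmd in plan:
    print(shlex.join(cmd))
```

Each entry could be handed to `subprocess.run(cmd, check=True)` to execute the plan. Pushing data before committing code means the `.dvc` pointer files in the commit refer to data that already exists on the DVC remote.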