Model Tracking¶
The astrodata.tracking
module provides capabilities for experiment tracking, primarily through integration with MLflow. This allows users to log model parameters, metrics, artifacts, and manage different versions of their machine learning models.
Abstract Classes¶
ModelTracker
is the abstract base class that all model trackers should inherit from. It defines the fundamental methods that a model tracker object must implement:
wrap_fit(BaseMlModel)
: Takes as input a model and returns the same model with itsfit
method now able to log the model to the required service.
How to Use¶
Initializing a tracker¶
How a tracker is initialized depends on the tracker implementation. Once a tracker is initialized, all that’s needed is to call its wrap_fit()
method with an already initialized model as argument to get the same model with the wrapped fit()
method as output. Once this is done you can proceed by using the model as usual.
from astrodata.tracking.ModelTracker import ModelTracker
from astrodata.ml.models.BaseMlModel import BaseMlModel
model = BaseMlModel( ... )
tracker = ModelTracker( ... )
tracked_model = tracker.wrap_fit(model)
tracked_model.fit(X_train, y_train)
Whenever you call the .fit
method of the model after wrapping it, the tracker will handle logging it to its own service; when using an MlFlowTracker
for example, the logs are either stored locally or in an external server whenever the model is fitted, each run represent a call of the .fit
method and it containes the information of the logged metrics, the model’s parameters and any artifact (the model files) that may be stored. Different services store different things, but as the most used open source ml-ops application, astrodata
implements MlFlow by default.
MLFlowTracker
¶
As previously stated, an MlFlowTracker
represent an MlFlow implementation of a ModelTracker
, the class is a base on top of which framework specific trackers should be build (such as SklearnMLflowTracker
) as it does not feature a wrap_fit()
method, but it implements all the required methods to store the relevant informations that are required to connect to and use an MlFlow server
Parameters¶
run_name : str, optional
Name for MLflow run.
experiment_name : str, optional
Name of the MLflow experiment.
extra_tags : dict, optional
Extra tags to log with the run.
tracking_uri : str, optional
MLflow tracking server URI.
tracking_username : str, optional
Username for authentication (if needed).
tracking_password : str, optional
Password for authentication (if needed).
SklearnMLflowTracker
¶
This class extends MlFlowTracker
to give it a proper wrap_fit()
method that is suitable to track scikit-learn
compatible models.
Attention
XgBoostModel
is a scikit-learn
compatible model so it uses the same tracker!
SklearnMLflowTracker.wrap_fit()
parameters¶
model : BaseMlModel
The model to wrap.
X_test : array-like, optional
Test data for metric logging.
y_test : array-like, optional
Test labels for metric logging.
X_val : array-like, optional
Validation data for metric logging.
y_val : array-like, optional
Validation labels for metric logging.
metrics : list of BaseMetric, optional
Metrics to log. If missing, a default loss metric is added.
log_model : bool, optional
If True, log the fitted model as an MLflow artifact.
tags: Dict[str, Any] default {}
Any additional tags that should be added to the model.
Example wrap_fit()
usage¶
When calling the wrap_fit()
method on an existing model, some parameters are passed to allow for correct metrics computation such as a metrics
array (which logs the model loss by default), data to test on (either X_val/y_val
or X_test/y_test
), and any additional tag that we want to add while logging the model.
from sklearn.ensemble import GradientBoostingClassifier
from astrodata.ml.models.SklearnModel import SklearnModel
from astrodata.tracking.MLFlowTracker import SklearnMLflowTracker
gradientboost = SklearnModel(model_class=GradientBoostingClassifier)
tracker = SklearnMLflowTracker(
run_name="MlFlowSimpleRun",
experiment_name="simple_example",
extra_tags={"stage": "testing"},
)
tracked_gradientboost = tracker.wrap_fit(
gradientboost, X_test=X_test, y_test=y_test, metrics=metrics, log_model=True
)
tracked_gradientboost.fit(X_train, y_train)