Tracking experiments in code
Initial setup
Before you can track experiments, you need to create a Managed Folder in the project. The managed folder will be used to store artefacts. Take note of the managed folder id (8 alphanum characters, visible in the URL).
Quick start sample
import dataiku
project = dataiku.api_client().get_default_project()
managed_folder = project.get_managed_folder('A_MANAGED_FOLDER_ID')
with project.setup_mlflow(managed_folder=managed_folder) as mlflow_handle:
# Note: if you don't call this (i.e. when no experiment is specified), the default one is used
mlflow_handle.set_experiment("My first experiment")
with mlflow_handle.start_run(run_name="my_run"):
# ...your MLflow code...
mlflow_handle.log_param("a", 1)
mlflow_handle.log_metric("b", 2)
# This uses the regular MLflow APIs
Tracking API
DSS uses the MLflow Tracking API. Please refer to the MLflow Tracking documentation.
Autologging
MLflow Tracking comes with a very useful feature: autologging, which automatically logs metrics, parameters, and models for common machine-learning packages without the need for explicit log statements.
Leveraging MLflow autologging requires no additional configuration of the DSS integration. Some machine learning packages, such as PyTorch, may however require additional packages .
In the following sample, we activate MLflow autologging for a SKlearn model. Metrics and artifacts are automatically logged.
import dataiku
import mlflow
import sklearn.linear_model.ElasticNet
project = dataiku.api_client().get_default_project()
managed_folder = project.get_managed_folder('A_MANAGED_FOLDER_ID')
with project.setup_mlflow(managed_folder=managed_folder) as mlflow_handle:
mlflow_handle.set_experiment("Let's autolog")
# activate Mflow autologging
mlflow_handle.sklearn.autolog()
with mlflow_handle.start_run(run_name="my_run"):
lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
lr.fit(train_x, train_y)
Other topics
Logging into another project
You can log experiments into another project than the current one by using:
project = dataiku.api_client().get_project("MYOTHERPROJECT")
Experiment tracking outside DSS
MLflow Tracking integration is configured through the
dataikuapi
package. See
Using the APIs outside of DSS
for how to use it from outside of DSS
Usage without context manager
While the usage of the context manager (“with” statement) is recommended, it is not mandatory. You can use this instead:
import dataiku
import mlflow
project = dataiku.api_client().get_default_project()
managed_folder = project.get_managed_folder('A_MANAGED_FOLDER_ID')
mlflow_handle = project.setup_mlflow(managed_folder=managed_folder)
mlflow.set_experiment("My first experiment")
with mlflow.start_run(run_name="my_run"):
# ...your MLflow code...
mlflow.log_param("a", 1)
mlflow.log_metric("b", 2)
mlflow_handle.clear()
Cautions
If you do not set up the integration before using the MLflow client, or use the client after clearing the integration, it may fall back to its default mode: writing experiment data as the current user, on the filesystem of the host of the DSS server.
Supported versions
See Limitations and supported versions for supported MLflow versions.