MLOps lifecycle #

Building and deploying machine learning (ML) models is a cornerstone of most data science projects, and Dataiku provides a comprehensive set of features to ease and speed up these operations. While the platform offers a wide range of visual capabilities, it also exposes numerous programmatic elements for anyone who wants to handle their model’s lifecycle using code.

Training #

The first step of the machine learning process is to fit a model using training data. This is an experimental phase during which you can test various combinations of pre-processing, algorithms, and parameters. The process of running such trials and logging their results is called experiment tracking; it is implemented natively in Dataiku, so you can use a variety of ML frameworks to train models and log their performance and characteristics.

Note

Under the hood, Dataiku uses MLflow models as a standardized format to package models.
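To make the idea concrete, here is a minimal, self-contained sketch of what experiment tracking amounts to: each training trial records its parameters and metrics so runs can be compared afterwards. The `RunLogger` class is purely illustrative (it is not Dataiku's or MLflow's API); in practice the tracking backend handles this bookkeeping for you.

```python
import time

class RunLogger:
    """Toy stand-in for an experiment-tracking client: each run
    records its parameters and metrics for later comparison."""

    def __init__(self):
        self.runs = []

    def log_run(self, params, metrics):
        # One entry per trial: what was tried, and how it scored.
        self.runs.append({
            "timestamp": time.time(),
            "params": params,
            "metrics": metrics,
        })

    def best_run(self, metric, maximize=True):
        # Return the run with the best value for the given metric.
        key = lambda r: r["metrics"][metric]
        return max(self.runs, key=key) if maximize else min(self.runs, key=key)

tracker = RunLogger()
# Two hypothetical trials with different hyper-parameters.
tracker.log_run({"algo": "random_forest", "max_depth": 5}, {"auc": 0.81})
tracker.log_run({"algo": "random_forest", "max_depth": 10}, {"auc": 0.86})

best = tracker.best_run("auc")
print(best["params"])  # the winning parameter combination
```

The same pattern underlies real tracking backends: because every trial is logged with its full configuration, the best run can be retrieved and reproduced later.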

Import #

In some cases, training a model from scratch can require significant time and computing resources, so you may prefer to bring an existing pre-trained model into the Dataiku platform and perform subsequent operations from there.

Several features can help speed up this process. You can either:

  • Retrieve and cache pre-trained models and embeddings provided by your ML framework of choice using code environment resources

  • Bring in model artifacts inside your Flow and store them in managed folders

You can fine-tune your models using experiment tracking or continue with evaluation and deployment.
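As a rough sketch of the artifact route, the snippet below serializes an already-trained "model" to a file and reloads it later for scoring. The toy model and the temporary path are illustrative only; in Dataiku, the artifact file would live in a managed folder in your Flow.

```python
import os
import pickle
import tempfile

# A toy pre-trained "model": coefficients of a linear scorer.
pretrained = {"weights": [0.4, 0.6], "bias": -0.1}

def predict(model, features):
    # Dot product plus bias, standing in for a real framework's predict().
    return sum(w * x for w, x in zip(model["weights"], features)) + model["bias"]

# Store the artifact, as you would inside a managed folder.
artifact_dir = tempfile.mkdtemp()
artifact_path = os.path.join(artifact_dir, "model.pkl")
with open(artifact_path, "wb") as f:
    pickle.dump(pretrained, f)

# Later (e.g. in a scoring recipe), reload the artifact and use it.
with open(artifact_path, "rb") as f:
    model = pickle.load(f)

print(predict(model, [1.0, 1.0]))
```

Real frameworks ship their own serialization formats (e.g. MLflow's model directories), but the principle is the same: the artifact is just files that can be stored, versioned, and reloaded where scoring happens.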

Evaluation #

Evaluating a model involves computing a set of metrics to reflect how well it performs against a specific evaluation dataset.

In Dataiku, these metrics encompass the predictive power of the model, its explainability, and drift indicators. The values of those metrics are computed in a buildable Flow item called the “evaluation store” and are accessible either in their raw form using the public API or visually through a set of rich visualizations embedded in the Dataiku web interface.
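The kinds of metrics such a store holds can be sketched in a few lines: an accuracy score for predictive power, and a population stability index (PSI) as a simple drift indicator comparing the score distribution at training time with the one seen in production. The function names and the two-bin split below are illustrative, not Dataiku's implementation.

```python
import math

def accuracy(y_true, y_pred):
    """Fraction of predictions matching the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def psi(expected, actual, bins=((0.0, 0.5), (0.5, 1.0))):
    """Population stability index between two score distributions.
    PSI near 0 means little drift; above ~0.2 is often read as significant."""
    def frac(values, lo, hi):
        n = sum(lo <= v < hi for v in values)
        return max(n / len(values), 1e-6)  # avoid log(0) for empty bins
    total = 0.0
    for lo, hi in bins:
        e, a = frac(expected, lo, hi), frac(actual, lo, hi)
        total += (a - e) * math.log(a / e)
    return total

# Predictive power on a labeled evaluation dataset.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75

# Drift: production scores have shifted upward versus training scores.
train_scores = [0.1, 0.2, 0.4, 0.7, 0.8]
prod_scores = [0.6, 0.7, 0.7, 0.8, 0.9]
print(psi(train_scores, prod_scores) > 0.2)  # drift flagged
```

An evaluation store records values like these for every evaluation run, which is what makes degradation visible over time.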

Deployment and scoring #

The final step to make a model operational is to deploy it on a production infrastructure where it will be used to score incoming data. Depending on how the input data is expected to reach the model, Dataiku offers several deployment patterns:

  • If the model is meant to be queried via HTTP, Dataiku can package it as a REST API endpoint and take advantage of cloud-native infrastructures such as Kubernetes to ensure scalability and high availability.

  • For cases where larger batches of data need to be processed and scored, Dataiku allows the deployment of entire projects to production-ready instance types called Automation nodes.
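For the REST pattern, any HTTP client can score records against the deployed endpoint. The sketch below builds (without sending) a JSON scoring request using only the standard library; the host, path, and feature names are placeholders, and the exact URL and payload contract are defined by your deployed endpoint, so check its documentation before adapting this.

```python
import json
import urllib.request

def build_score_request(endpoint_url, features):
    """Build (but do not send) a POST request scoring one record."""
    payload = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        url=endpoint_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_score_request(
    "https://api-node.example.com/fraud/predict",  # placeholder URL
    {"amount": 120.5, "country": "FR"},
)
print(req.method, req.get_header("Content-type"))
# To actually send it: urllib.request.urlopen(req) against a live endpoint.
```

Because the endpoint speaks plain HTTP and JSON, the same request can come from any language or system, which is what makes the REST pattern suitable for real-time integration.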

Dataiku also offers flexible choices for piloting the deployment process, which can be executed using the platform’s native “Deployer” features or delegated to an external Continuous Integration/Continuous Delivery (CI/CD) pipeline.

Note

For specific cases where models need to be exported outside of Dataiku, you can generate standalone Python or Java artifacts. For more information, see the related documentation page.
