# Machine learning

Through the public API, the Python client lets you automate every aspect of the machine learning model lifecycle:

* Creating a visual analysis and ML task

* Tuning settings

* Training models

* Inspecting model details and results

* Deploying saved models to Flow and retraining them

* Concepts
* Usage samples
  + The whole cycle
  + Obtaining a handle to an existing ML Task
  + Tuning feature preprocessing
    - Enabling and disabling features
    - Changing advanced parameters for a feature
  + Tuning algorithms
    - Global parameters for hyperparameter search
    - Algorithm specific hyperparameter search
  + Exporting a model documentation
* API Reference
  + Interaction with a ML Task
  + Manipulation of settings
  + Exploration of results
  + Saved models
  + MLflow models
* Algorithm details
  + LOGISTIC\_REGRESSION
  + RANDOM\_FOREST\_CLASSIFICATION
  + RANDOM\_FOREST\_REGRESSION
  + EXTRA\_TREES
  + RIDGE\_REGRESSION
  + LASSO\_REGRESSION
  + LEASTSQUARE\_REGRESSION
  + SVC\_CLASSIFICATION
  + SVM\_REGRESSION
  + SGD\_CLASSIFICATION
  + SGD\_REGRESSION
  + GBT\_CLASSIFICATION
  + GBT\_REGRESSION
  + DECISION\_TREE\_CLASSIFICATION
  + DECISION\_TREE\_REGRESSION
  + LIGHTGBM\_CLASSIFICATION
  + LIGHTGBM\_REGRESSION
  + XGBOOST\_CLASSIFICATION
  + XGBOOST\_REGRESSION
  + NEURAL\_NETWORK
  + DEEP\_NEURAL\_NETWORK\_REGRESSION
  + DEEP\_NEURAL\_NETWORK\_CLASSIFICATION
  + KNN
  + LARS
  + MLLIB\_LOGISTIC\_REGRESSION
  + MLLIB\_DECISION\_TREE
  + MLLIB\_RANDOM\_FOREST
  + MLLIB\_GBT
  + MLLIB\_LINEAR\_REGRESSION
  + MLLIB\_NAIVE\_BAYES
  + Other

## Concepts

In DSS, you train models as part of a *visual analysis*. A visual analysis is made of a preparation script, and one or several *ML Tasks*.

A ML Task is an individual section in which you train models. A ML Task is either a prediction of a single target variable, or a clustering.

The ML API allows you to manipulate ML Tasks, and use them to train models, inspect their details, and deploy them to the Flow.

Once deployed to the Flow, the *Saved model* can be retrained by the usual build mechanism of DSS.

A ML Task has settings, which control:

* Which features are active

* The preprocessing settings for each feature

* Which algorithms are active

* The hyperparameter settings (including grid searched hyperparameters) for each algorithm

* The settings of the grid search

* Train/Test splitting settings

* Feature selection and generation settings

## Usage samples

### The whole cycle

This example creates a prediction task, enables an algorithm, trains it, inspects the resulting models, and deploys one of them to the Flow.

§ # client is a DSS API client

§ p = client.get\_project("MYPROJECT")

§ # Create a new ML Task to predict the variable "target" from "trainset"

§ mltask = p.create\_prediction\_ml\_task(

§     input\_dataset="trainset",

§     target\_variable="target",

§     ml\_backend\_type='PY\_MEMORY',  # ML backend to use

§     guess\_policy='DEFAULT'  # Template to use for setting default parameters

§ )

§ # Wait for the ML task to be ready

§ mltask.wait\_guess\_complete()

§ # Obtain settings, enable GBT, save settings

§ settings = mltask.get\_settings()

§ settings.set\_algorithm\_enabled("GBT\_CLASSIFICATION", True)

§ settings.save()

§ # Start train and wait for it to be complete

§ mltask.start\_train()

§ mltask.wait\_train\_complete()

§ # Get the identifiers of the trained models

§ # There will be 3 of them because Logistic regression and Random forest were default enabled

§ ids = mltask.get\_trained\_models\_ids()

§ for model\_id in ids:

§     details = mltask.get\_trained\_model\_details(model\_id)

§     algorithm = details.get\_modeling\_settings()["algorithm"]

§     auc = details.get\_performance\_metrics()["auc"]

§     print("Algorithm=%s AUC=%s" % (algorithm, auc))

§ # Let's deploy the first model

§ model\_to\_deploy = ids[0]

§ ret = mltask.deploy\_to\_flow(model\_to\_deploy, "my\_model", "trainset")

§ print("Deployed to saved model id = %s train recipe = %s" % (ret["savedModelId"], ret["trainRecipeName"]))

The methods for creating prediction and clustering ML tasks are defined at `dataikuapi.dss.project.DSSProject.create\_prediction\_ml\_task()` and `dataikuapi.dss.project.DSSProject.create\_clustering\_ml\_task()`.
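Deploying `ids[0]` takes whichever model happens to be listed first. As a minimal sketch, the helper below picks the model with the highest AUC instead. It only uses calls shown in the sample above; the name `pick_best_model` is hypothetical, and the helper assumes that a higher metric value is better, which holds for AUC but not for loss-type metrics.

```python
def pick_best_model(mltask, metric="auc"):
    """Return (model_id, score) for the trained model with the highest metric.

    `mltask` is assumed to be a DSSMLTask whose training is complete.
    Hypothetical helper, not part of the dataikuapi package.
    """
    best_id, best_score = None, None
    for model_id in mltask.get_trained_models_ids():
        details = mltask.get_trained_model_details(model_id)
        score = details.get_performance_metrics().get(metric)
        if score is None:
            continue  # this model does not report the requested metric
        if best_score is None or score > best_score:
            best_id, best_score = model_id, score
    return best_id, best_score
```

The returned identifier can then be passed to `deploy_to_flow()` in place of `ids[0]`.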

### Obtaining a handle to an existing ML Task

When you create these ML tasks, the returned `dataikuapi.dss.ml.DSSMLTask` object contains two fields, `analysis\_id` and `mltask\_id`, that can later be used to retrieve the same `DSSMLTask` object:

§ # client is a DSS API client

§ p = client.get\_project("MYPROJECT")

§ mltask = p.get\_ml\_task(analysis\_id, mltask\_id)
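Since the two identifiers are all that is needed to get the handle back, one option is to store them between script runs. The sketch below writes them to a JSON file and reads them back; the helper names and the file layout are hypothetical, and `project` is assumed to be a `DSSProject` handle.

```python
import json

def save_mltask_handle(mltask, path):
    """Write the two identifiers of an ML Task to a JSON file."""
    with open(path, "w") as f:
        json.dump({"analysis_id": mltask.analysis_id,
                   "mltask_id": mltask.mltask_id}, f)

def load_mltask_handle(project, path):
    """Read the identifiers back and return the matching ML Task handle."""
    with open(path) as f:
        ids = json.load(f)
    return project.get_ml_task(ids["analysis_id"], ids["mltask_id"])
```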

### Tuning feature preprocessing

#### Enabling and disabling features

§ # mltask is a DSSMLTask object

§ settings = mltask.get\_settings()

§ settings.reject\_feature("not\_useful")

§ settings.use\_feature("useful")

§ settings.save()

#### Changing advanced parameters for a feature

§ # mltask is a DSSMLTask object

§ settings = mltask.get\_settings()

§ # Use impact coding rather than dummy-coding

§ fs = settings.get\_feature\_preprocessing("mycategory")

§ fs["category\_handling"] = "IMPACT"

§ # Impute missing with most frequent value

§ fs["missing\_handling"] = "IMPUTE"

§ fs["missing\_impute\_with"] = "MODE"

§ settings.save()
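To apply the same change to many features at once, `foreach_feature()` (see the API reference) takes a function that receives the feature name and its parameters and returns the modified parameters. The sketch below reuses the keys from the sample above; the function names are hypothetical.

```python
def impact_code_and_impute(feature_name, feature_params):
    """Callback for foreach_feature: returns the modified parameters."""
    feature_params["category_handling"] = "IMPACT"
    feature_params["missing_handling"] = "IMPUTE"
    feature_params["missing_impute_with"] = "MODE"
    return feature_params

def tune_categorical_features(mltask):
    """Apply the callback to every categorical feature, then save."""
    settings = mltask.get_settings()
    settings.foreach_feature(impact_code_and_impute, only_of_type="CATEGORY")
    settings.save()
```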

### Tuning algorithms

#### Global parameters for hyperparameter search

This sample shows how to modify the parameters of the search to be performed on the hyperparameters.

§ # mltask is a DSSMLTask object

§ settings = mltask.get\_settings()

§ hp\_search\_settings = settings.get\_hyperparameter\_search\_settings()

§ # Set the search strategy either to "GRID", "RANDOM" or "BAYESIAN"

§ hp\_search\_settings.strategy = "RANDOM"

§ # Alternatively use a setter, either set\_grid\_search

§ # set\_random\_search or set\_bayesian\_search

§ hp\_search\_settings.set\_random\_search(seed=1234)

§ # Set the validation mode either to "KFOLD", "SHUFFLE" (or accordingly their

§ # "TIME\_SERIES"-prefixed counterpart) or "CUSTOM"

§ hp\_search\_settings.validation\_mode = "KFOLD"

§ # Alternatively use a setter, either set\_kfold\_validation, set\_single\_split\_validation

§ # or set\_custom\_validation

§ hp\_search\_settings.set\_kfold\_validation(n\_folds=5, stratified=True)

§ # Save the settings

§ settings.save()

#### Algorithm specific hyperparameter search

This sample shows how to modify the settings of the Random Forest Classification algorithm, where two kinds of hyperparameters (multi-valued numerical and single-valued) are introduced.

§ # mltask is a DSSMLTask object

§ settings = mltask.get\_settings()

§ rf\_settings = settings.get\_algorithm\_settings("RANDOM\_FOREST\_CLASSIFICATION")

§ # rf\_settings is an object representing the settings for this algorithm.

§ # The 'enabled' attribute indicates whether this algorithm will be trained.

§ # Other attributes are the various hyperparameters of the algorithm.

§ # The precise hyperparameters for each algorithm are not all documented, so let's

§ # print the dictionary keys to see available hyperparameters.

§ # Alternatively, tab completion will provide relevant hints to available hyperparameters.

§ print(rf\_settings.keys())

§ # Let's first have a look at rf\_settings.n\_estimators which is a multi-valued hyperparameter

§ # represented as a NumericalHyperparameterSettings object

§ print(rf\_settings.n\_estimators)

§ # Set multiple explicit values for "n\_estimators" to be explored during the search

§ rf\_settings.n\_estimators.definition\_mode = "EXPLICIT"

§ rf\_settings.n\_estimators.values = [100, 200]

§ # Alternatively use the set\_values setter

§ rf\_settings.n\_estimators.set\_values([100, 200])

§ # Set a range of values for "n\_estimators" to be explored during the search

§ rf\_settings.n\_estimators.definition\_mode = "RANGE"

§ rf\_settings.n\_estimators.range.min = 10

§ rf\_settings.n\_estimators.range.max = 100

§ rf\_settings.n\_estimators.range.nb\_values = 5  # Only relevant for grid-search

§ # Alternatively, use the set\_range setter

§ rf\_settings.n\_estimators.set\_range(min=10, max=100, nb\_values=5)

§ # Let's now have a look at rf\_settings.selection\_mode which is a single-valued hyperparameter

§ # represented as a SingleCategoryHyperparameterSettings object.

§ # The object stores the valid options for this hyperparameter.

§ print(rf\_settings.selection\_mode)

§ # Features selection mode is not multi-valued so it's not actually searched during the

§ # hyperparameter search

§ rf\_settings.selection\_mode = "sqrt"

§ # Save the settings

§ settings.save()

The next sample shows how to modify the settings of the Logistic Regression classification algorithm, where a new kind of hyperparameter (multi-valued categorical) is introduced.

§ # mltask is a DSSMLTask object

§ settings = mltask.get\_settings()

§ logit\_settings = settings.get\_algorithm\_settings("LOGISTIC\_REGRESSION")

§ # Let's have a look at logit\_settings.penalty which is a multi-valued categorical

§ # hyperparameter represented as a CategoricalHyperparameterSettings object

§ print(logit\_settings.penalty)

§ # List currently enabled values

§ print(logit\_settings.penalty.get\_values())

§ # List all possible values

§ print(logit\_settings.penalty.get\_all\_possible\_values())

§ # Set the values for the "penalty" hyperparameter to be explored during the search

§ logit\_settings.penalty = ["l1", "l2"]

§ # Alternatively use the set\_values setter

§ logit\_settings.penalty.set\_values(["l1", "l2"])

§ # Save the settings

§ settings.save()

### Exporting a model documentation

This sample shows how to generate and download a model documentation from a template.

See Model Document Generator for more information.

§ # mltask is a DSSMLTask object and id is the identifier of a trained model

§ details = mltask.get\_trained\_model\_details(id)

§ # Launch the model document generation by either

§ # using the default template for this model by calling without argument

§ # or specifying a managed folder id and the path to the template to use in that folder

§ future = details.generate\_documentation(FOLDER\_ID, "path/my\_template.docx")

§ # Alternatively, use a custom uploaded template file

§ with open("my\_template.docx", "rb") as f:

§     future = details.generate\_documentation\_from\_custom\_template(f)

§ # Wait for the generation to finish, retrieve the result and download the generated

§ # model documentation to the specified file

§ result = future.wait\_for\_result()

§ export\_id = result["exportId"]

§ details.download\_documentation\_to\_file(export\_id, "path/my\_model\_documentation.docx")

## API Reference

### Interaction with a ML Task

*class* `dataikuapi.dss.ml.DSSMLTask`(*client*, *project\_key*, *analysis\_id*, *mltask\_id*)

*static* `from_full_model_id`(*client*, *fmi*, *project\_key=None*)

`delete`()

Delete the present ML task

`wait_guess_complete`()

Waits for guess to be complete. This should be called immediately after the creation of a new ML Task (if the ML Task was created with wait\_guess\_complete=False), before calling `get\_settings` or `train`

`get_status`()

Gets the status of this ML Task

* Returns: a dict

`get_settings`()

Gets the settings of this ML Task

* Returns: a DSSMLTaskSettings object to interact with the settings

* Return type: `dataikuapi.dss.ml.DSSMLTaskSettings`

`train`(*session\_name=None*, *session\_description=None*, *run\_queue=False*)

Trains models for this ML Task

* Parameters:
  * **session\_name** (*str*) – name for the session
  * **session\_description** (*str*) – description for the session

This method waits for train to complete. If you want to train asynchronously, use `start\_train()` and `wait\_train\_complete()`

This method returns the list of trained model identifiers. It returns models that have been trained for this train session, not all trained models for this ML task. To get all identifiers for all models trained across all training sessions, use `get\_trained\_models\_ids()`

These identifiers can be used for `get\_trained\_model\_snippet()`, `get\_trained\_model\_details()` and `deploy\_to\_flow()`

* Returns: A list of model identifiers

* Return type: list of strings
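The difference between the two lists of identifiers can be sketched as follows. The helper name `train_session` is hypothetical; it only relies on `train()` and `get_trained_models_ids()` as described above.

```python
def train_session(mltask, name, description=None):
    """Train one session synchronously and report what it produced.

    Returns a pair: the identifiers trained in this session, and the
    identifiers of all models trained so far for the ML Task.
    """
    session_ids = mltask.train(session_name=name,
                               session_description=description)
    all_ids = mltask.get_trained_models_ids()
    return session_ids, all_ids
```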

`ensemble`(*model\_ids=None*, *method=None*)

Create an ensemble model of a set of models

* Parameters:
  * **model\_ids** (*list*) – A list of model identifiers (defaults to [])
  * **method** (*str*) – the ensembling method. One of: AVERAGE, PROBA\_AVERAGE, MEDIAN, VOTE, LINEAR\_MODEL, LOGISTIC\_MODEL

This method waits for the ensemble train to complete. If you want to train asynchronously, use `start\_ensembling()` and `wait\_train\_complete()`

This method returns the identifier of the trained ensemble. To get all identifiers for all models trained across all training sessions, use `get\_trained\_models\_ids()`

This identifier can be used for `get\_trained\_model\_snippet()`, `get\_trained\_model\_details()` and `deploy\_to\_flow()`

* Returns: A model identifier

* Return type: string
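A minimal sketch of a wrapper that checks the method name against the list above before calling `ensemble()`. The wrapper name is hypothetical, and the requirement of at least two models is an assumption made for this example, not a documented constraint.

```python
ENSEMBLE_METHODS = {"AVERAGE", "PROBA_AVERAGE", "MEDIAN", "VOTE",
                    "LINEAR_MODEL", "LOGISTIC_MODEL"}

def build_ensemble(mltask, model_ids, method="AVERAGE"):
    """Validate the arguments, then train an ensemble synchronously."""
    if method not in ENSEMBLE_METHODS:
        raise ValueError("unknown ensembling method: %s" % method)
    # Assumption: ensembling fewer than two models is not meaningful
    if len(model_ids) < 2:
        raise ValueError("an ensemble needs at least two models")
    return mltask.ensemble(model_ids=list(model_ids), method=method)
```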

`start_train`(*session\_name=None*, *session\_description=None*, *run\_queue=False*)

Asynchronously starts a new training session for this ML Task.

* Parameters:
  * **session\_name** (*str*) – name for the session
  * **session\_description** (*str*) – description for the session

This returns immediately, before train is complete. To wait for train to complete, use `wait\_train\_complete()`

`start_ensembling`(*model\_ids=None*, *method=None*)

Asynchronously creates a new ensemble model from a set of models.

* Parameters:
  * **model\_ids** (*list*) – A list of model identifiers (defaults to [])
  * **method** (*str*) – the ensembling method (AVERAGE, PROBA\_AVERAGE, MEDIAN, VOTE, LINEAR\_MODEL, LOGISTIC\_MODEL)

This returns immediately, before train is complete. To wait for train to complete, use `wait\_train\_complete()`

* Returns: the model identifier of the ensemble

* Return type: string

`wait_train_complete`()

Waits for train to be complete (if started with `start\_train()`)

`get_trained_models_ids`(*session\_id=None*, *algorithm=None*)

Gets the list of trained model identifiers for this ML task.

These identifiers can be used for `get\_trained\_model\_snippet()` and `deploy\_to\_flow()`

* Returns: A list of model identifiers

* Return type: list of strings

`get_trained_model_snippet`(*id=None*, *ids=None*)

Gets a quick summary of a trained model, as a dict. For complete information and a structured object, use `get\_trained\_model\_details()`

* Parameters:
  * **id** (*str*) – a model id
  * **ids** (*list*) – a list of model ids

* Return type: dict

`get_trained_model_details`(*id*)

Gets details for a trained model

* Parameters: **id** (*str*) – Identifier of the trained model, as returned by `get\_trained\_models\_ids()`

* Returns: A `DSSTrainedPredictionModelDetails` or `DSSTrainedClusteringModelDetails` representing the details of this trained model id

* Return type: `DSSTrainedPredictionModelDetails` or `DSSTrainedClusteringModelDetails`

`delete_trained_model`(*model\_id*)

Deletes a trained model

* Parameters: **model\_id** (*str*) – Model identifier, as returned by `get\_trained\_models\_ids()`

`train_queue`()

Trains this MLTask’s queue

* Returns: A dict including the next sessionID to be trained in the queue

* Return type: dict

`deploy_to_flow`(*model\_id*, *model\_name*, *train\_dataset*, *test\_dataset=None*, *redo\_optimization=True*)

Deploys a trained model from this ML Task to a saved model + train recipe in the Flow.

* Parameters:
  * **model\_id** (*str*) – Model identifier, as returned by `get\_trained\_models\_ids()`
  * **model\_name** (*str*) – Name of the saved model to deploy in the Flow
  * **train\_dataset** (*str*) – Name of the dataset to use as train set. May either be a short name or a PROJECT.name long name (when using a shared dataset)
  * **test\_dataset** (*str*) – Name of the dataset to use as test set. If null, split will be applied to the train set. May either be a short name or a PROJECT.name long name (when using a shared dataset). Only for PREDICTION tasks
  * **redo\_optimization** (*bool*) – Whether the hyperparameter optimization phase should be redone. Defaults to True. Only for PREDICTION tasks

* Returns: A dict containing: “savedModelId” and “trainRecipeName” - Both can be used to obtain further handles

* Return type: dict

`redeploy_to_flow`(*model\_id*, *recipe\_name=None*, *saved\_model\_id=None*, *activate=True*)

Redeploys a trained model from this ML Task to a saved model + train recipe in the Flow. Either recipe\_name or saved\_model\_id must be specified.

* Parameters:
  * **model\_id** (*str*) – Model identifier, as returned by `get\_trained\_models\_ids()`
  * **recipe\_name** (*str*) – Name of the training recipe to update
  * **saved\_model\_id** (*str*) – Name of the saved model to update
  * **activate** (*bool*) – Should the deployed model version become the active version

* Returns: A dict containing: “impactsDownstream” - whether the active version changed and downstream recipes are impacted

* Return type: dict
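The two deployment methods can be combined into a single step that deploys a model the first time and redeploys it afterwards. This is a sketch under the signatures documented above; the helper name `deploy_or_update` and the way the saved model id is tracked are hypothetical.

```python
def deploy_or_update(mltask, model_id, model_name, train_dataset,
                     saved_model_id=None):
    """Deploy a model on first use, redeploy it to the same saved model later.

    Returns the saved model id so the caller can keep it for the next call.
    """
    if saved_model_id is None:
        ret = mltask.deploy_to_flow(model_id, model_name, train_dataset)
        return ret["savedModelId"]
    ret = mltask.redeploy_to_flow(model_id, saved_model_id=saved_model_id,
                                  activate=True)
    if ret.get("impactsDownstream"):
        print("Active version changed; downstream recipes are impacted")
    return saved_model_id
```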

`remove_unused_splits`()

Deletes all stored splits data that are not anymore in use for this ML Task.

It is generally not needed to call this method

`remove_all_splits`()

Deletes all stored splits data for this ML Task. This operation saves disk space.

After performing this operation, it will not be possible anymore to:

* Ensemble already trained models

* View the “predicted data” or “charts” for already trained models

* Resume training of models for which optimization had been previously interrupted

Training new models remains possible

`guess`(*prediction\_type=None*, *reguess\_level=None*, *target\_variable=None*, *timeseries\_identifiers=None*, *time\_variable=None*, *full\_reguess=None*)

Reguesses all the settings of the ML task when no optional parameter is given. For prediction ML tasks only, sets a new value for a core parameter of the task (target variable or prediction type) and subsequently reguesses the impacted settings.

* Parameters:
  * **prediction\_type** (*string*) – Only valid for prediction tasks of either BINARY\_CLASSIFICATION, MULTICLASS or REGRESSION type, ignored otherwise. The prediction type to set. Cannot be set if target\_variable, time\_variable, or timeseries\_identifiers is also specified.
  * **target\_variable** (*string*) – Only valid for prediction tasks, ignored for clustering. The target variable to set. Cannot be set if prediction\_type, time\_variable, or timeseries\_identifiers is also specified.
  * **timeseries\_identifiers** (*list*) – Only valid for time series forecasting tasks. List of columns to be used as time series identifiers. Cannot be set if prediction\_type, target\_variable, or time\_variable is also specified.
  * **time\_variable** (*string*) – Only valid for time series forecasting tasks. Column to be used as time variable. Cannot be set if prediction\_type, target\_variable, or timeseries\_identifiers is also specified.
  * **full\_reguess** (*bool*) – Only valid for prediction tasks, ignored for clustering. Scope of the reguess process: whether it should reguess all the settings after changing a core parameter, or only reguess impacted settings (e.g. target remapping when changing the target, metrics when changing the prediction type…). Ignored if no core parameter is given. Defaults to true.
  * **reguess\_level** (*string*) – Deprecated, use full\_reguess instead. Only valid for prediction tasks. Can be one of the following values:
    * TARGET\_CHANGE: Change the target if target\_variable is specified, reguess the target remapping, and clear the model’s assertions if any. Equivalent to `full\_reguess`=False (recommended usage)
    * FULL\_REGUESS: All the settings of the ML task are reguessed. Equivalent to `full\_reguess`=True (recommended usage)
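Because the core parameters are mutually exclusive, a thin wrapper can reject invalid combinations before calling `guess()`. The sketch below is one way to do it; the wrapper name `reguess` is hypothetical, and it simply forwards whatever `guess()` returns.

```python
CORE_PARAMETERS = ("prediction_type", "target_variable",
                   "timeseries_identifiers", "time_variable")

def reguess(mltask, full_reguess=True, **core_parameter):
    """Call guess() with at most one core parameter."""
    unknown = set(core_parameter) - set(CORE_PARAMETERS)
    if unknown:
        raise ValueError("not a core parameter: %s" % sorted(unknown))
    if len(core_parameter) > 1:
        raise ValueError("only one core parameter can be changed per call")
    if not core_parameter:
        # No core parameter given: all the settings are reguessed
        return mltask.guess()
    return mltask.guess(full_reguess=full_reguess, **core_parameter)
```

For example, `reguess(mltask, full_reguess=False, target_variable="churn")` would change the target and reguess only the impacted settings (the column name `churn` is a placeholder).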

### Manipulation of settings

*class* `dataikuapi.dss.ml.DSSMLTaskSettings`(*client*, *project\_key*, *analysis\_id*, *mltask\_id*, *mltask\_settings*)

Object to read and modify the settings of a ML task.

Do not create this object directly, use `DSSMLTask.get\_settings()` instead

`get_raw`()

Gets the raw settings of this ML Task. This returns a reference to the raw settings, not a copy, so changes made to the returned object will be reflected when saving.

* Return type: dict

`get_feature_preprocessing`(*feature\_name*)

Gets the feature preprocessing params for a particular feature. This returns a reference to the feature’s settings, not a copy, so changes made to the returned object will be reflected when saving

* Returns: A dict of the preprocessing settings for a feature

* Return type: dict

`foreach_feature`(*fn*, *only\_of\_type=None*)

Applies a function to all features (except target)

* Parameters:
  * **fn** (*function*) – Function that takes 2 parameters: feature\_name and feature\_params and returns modified feature\_params
  * **only\_of\_type** (*str*) – if not None, only applies to feature of the given type. Can be one of `CATEGORY`, `NUMERIC`, `TEXT` or `VECTOR`

`reject_feature`(*feature\_name*)

Marks a feature as rejected and not used for training

* Parameters: **feature\_name** (*str*) – Name of the feature to reject

`use_feature`(*feature\_name*)

Marks a feature as input for training

* Parameters: **feature\_name** (*str*) – Name of the feature to use

`get_algorithm_settings`(*algorithm\_name*)

`get_diagnostics_settings`()

Gets the diagnostics settings for a mltask. This returns a reference to the diagnostics’ settings, not a copy, so changes made to the returned object will be reflected when saving.

This method returns a dictionary of the settings with:

* ‘enabled’: indicates if the diagnostics are enabled globally, if False, all diagnostics will be disabled

* ‘settings’: a list of dict comprised of:

  * ‘type’: the diagnostic type

  * ‘enabled’: indicates if the diagnostic type is enabled, if False, all diagnostics of that type will be disabled

Please refer to the documentation for details on available diagnostics.

* Returns: A dict of diagnostics settings

* Return type: dict
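The structure described above can be read with plain dictionary access. As a sketch, the hypothetical helper below lists the diagnostic types that will not run, taking the global switch into account.

```python
def disabled_diagnostic_types(diagnostics):
    """List the diagnostic types that are effectively disabled.

    `diagnostics` is assumed to be the dict returned by
    get_diagnostics_settings().
    """
    if not diagnostics["enabled"]:
        # Globally disabled: every diagnostic type is off
        return [d["type"] for d in diagnostics["settings"]]
    return [d["type"] for d in diagnostics["settings"] if not d["enabled"]]
```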

`set_diagnostics_enabled`(*enabled*)

Globally enables or disables all diagnostics.

* Parameters: **enabled** (*bool*) – if the diagnostics should be enabled or not

`set_diagnostic_type_enabled`(*diagnostic\_type*, *enabled*)

Enables or disables a diagnostic based on its type.

Please refer to the documentation for details on available diagnostics.

* Parameters:
  * **diagnostic\_type** (*str*) – Name (in capitals) of the diagnostic type.
  * **enabled** (*bool*) – if the diagnostic should be enabled or not

`set_algorithm_enabled`(*algorithm\_name*, *enabled*)

Enables or disables an algorithm based on its name.

Please refer to the documentation for details on available algorithms.

* Parameters:
  * **algorithm\_name** (*str*) – Name (in capitals) of the algorithm.
  * **enabled** (*bool*) – if the algorithm should be enabled or not

`disable_all_algorithms`()

Disables all algorithms

`get_all_possible_algorithm_names`()

Returns the list of possible algorithm names, i.e. the list of valid identifiers for `set\_algorithm\_enabled()` and `get\_algorithm\_settings()`

This includes all possible algorithms, regardless of the prediction kind (regression/classification) or engine, so some algorithms may be irrelevant

* Returns: the list of algorithm names as a list of strings

* Return type: list of string

`get_enabled_algorithm_names`()

* Returns: the list of enabled algorithm names as a list of strings

* Return type: list of string

`get_enabled_algorithm_settings`()

* Returns: the map of enabled algorithm names with their settings

* Return type: dict
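The algorithm-related methods above can be combined to train a specific set of algorithms only. A minimal sketch, with a hypothetical helper name, that validates the requested names first:

```python
def enable_only(settings, wanted):
    """Enable exactly the algorithms in `wanted`, save, and return
    the names that are enabled afterwards."""
    possible = set(settings.get_all_possible_algorithm_names())
    unknown = [name for name in wanted if name not in possible]
    if unknown:
        raise ValueError("unknown algorithm names: %s" % unknown)
    settings.disable_all_algorithms()
    for name in wanted:
        settings.set_algorithm_enabled(name, True)
    settings.save()
    return settings.get_enabled_algorithm_names()
```

Since the list of possible names ignores the prediction type and engine, a name passing this check may still be irrelevant for a given ML Task.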

`set_metric`(*metric=None*, *custom\_metric=None*, *custom\_metric\_greater\_is\_better=True*, *custom\_metric\_use\_probas=False*)

Sets the score metric to optimize for a prediction ML Task

* Parameters:
  * **metric** (*str*) – metric to use. Leave empty to use a custom metric. You need to set the `custom\_metric` value in that case
  * **custom\_metric** (*str*) – code of the custom metric
  * **custom\_metric\_greater\_is\_better** (*bool*) – whether the custom metric is a score or a loss
  * **custom\_metric\_use\_probas** (*bool*) – whether to use the classes’ probas or the predicted value (for classification)

`add_custom_python_model`(*name='Custom Python Model'*, *code=''*)

Adds a new custom python model

* Parameters:
  * **name** (*str*) – name of the custom model
  * **code** (*str*) – code of the custom model

`add_custom_mllib_model`(*name='Custom MLlib Model'*, *code=''*)

Adds a new custom MLlib model

* Parameters:
  * **name** (*str*) – name of the custom model
  * **code** (*str*) – code of the custom model

`save`()

Saves back these settings to the ML Task

*class* `dataikuapi.dss.ml.DSSPredictionMLTaskSettings`(*client*, *project\_key*, *analysis\_id*, *mltask\_id*, *mltask\_settings*)

*class* `PredictionTypes`

* `BINARY` *= 'BINARY\_CLASSIFICATION'*

* `REGRESSION` *= 'REGRESSION'*

* `MULTICLASS` *= 'MULTICLASS'*

`get_all_possible_algorithm_names`()

Returns the list of possible algorithm names, i.e. the list of valid identifiers for `set\_algorithm\_enabled()` and `get\_algorithm\_settings()`

This includes all possible algorithms, regardless of the prediction kind (regression/classification) or engine, so some algorithms may be irrelevant

* Returns: the list of algorithm names as a list of strings

* Return type: list of string

`get_enabled_algorithm_names`()

* Returns: the list of enabled algorithm names as a list of strings

* Return type: list of string

`get_algorithm_settings`(*algorithm\_name*)

Gets the training settings for a particular algorithm. This returns a reference to the algorithm’s settings, not a copy, so changes made to the returned object will be reflected when saving.

This method returns the settings for this algorithm as a PredictionAlgorithmSettings (extended dict). All algorithm dicts have at least an “enabled” property/key in the settings. The “enabled” property/key indicates whether this algorithm will be trained.

Other settings are algorithm-dependent and are the various hyperparameters of the algorithm. The precise properties/keys for each algorithm are not all documented. You can print the returned AlgorithmSettings to learn more about the settings of each particular algorithm.

Please refer to the documentation for details on available algorithms.

* Parameters: **algorithm\_name** (*str*) – Name (in capitals) of the algorithm.

* Returns: A PredictionAlgorithmSettings (extended dict) for one of the built-in prediction algorithms

* Return type: PredictionAlgorithmSettings

`split_ordered_by`(*feature\_name*, *ascending=True*)

Deprecated. Use split\_params.set\_time\_ordering()

`remove_ordered_split`()

Deprecated. Use split\_params.unset\_time\_ordering()

`use_sample_weighting`(*feature\_name*)

Deprecated. Use set\_weighting()

`set_weighting`(*method*, *feature\_name=None*)

Sets the method to weight samples.

If there was a WEIGHT feature declared previously, it will be set back as an INPUT feature first.

* Parameters:
  * **method** (*str*) – Method to use. One of NO\_WEIGHTING, SAMPLE\_WEIGHT (must give a feature name), CLASS\_WEIGHT or CLASS\_AND\_SAMPLE\_WEIGHT (must give a feature name)
  * **feature\_name** (*str*) – Name of the feature to use as sample weight
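The constraint that two of the methods require a feature name can be checked before calling `set_weighting()`. A minimal sketch, with a hypothetical wrapper name:

```python
NEEDS_FEATURE = {"SAMPLE_WEIGHT", "CLASS_AND_SAMPLE_WEIGHT"}
WEIGHTING_METHODS = NEEDS_FEATURE | {"NO_WEIGHTING", "CLASS_WEIGHT"}

def apply_weighting(settings, method, feature_name=None):
    """Validate the weighting arguments, apply them, and save."""
    if method not in WEIGHTING_METHODS:
        raise ValueError("unknown weighting method: %s" % method)
    if method in NEEDS_FEATURE and feature_name is None:
        raise ValueError("%s requires a feature name" % method)
    settings.set_weighting(method, feature_name=feature_name)
    settings.save()
```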

`remove_sample_weighting`()

Deprecated. Use set\_weighting(method="NO\_WEIGHTING") instead

`get_assertions_params`()

Retrieves the assertions parameters for this ml task

* Return type: `DSSMLAssertionsParams`

*class* `dataikuapi.dss.ml.DSSClusteringMLTaskSettings`(*client*, *project\_key*, *analysis\_id*, *mltask\_id*, *mltask\_settings*)

`get_algorithm_settings`(*algorithm\_name*)

Gets the training settings for a particular algorithm. This returns a reference to the algorithm’s settings, not a copy, so changes made to the returned object will be reflected when saving.

This method returns a dictionary of the settings for this algorithm. All algorithm dicts have at least an “enabled” key in the dictionary. The ‘enabled’ key indicates whether this algorithm will be trained

Other settings are algorithm-dependent and are the various hyperparameters of the algorithm. The precise keys for each algorithm are not all documented. You can print the returned dictionary to learn more about the settings of each particular algorithm

Please refer to the documentation for details on available algorithms.

* Parameters: **algorithm\_name** (*str*) – Name (in capitals) of the algorithm.

* Returns: A dict of the settings for an algorithm

* Return type: dict

*class* `dataikuapi.dss.ml.DSSTimeseriesForecastingMLTaskSettings`(*client*, *project\_key*, *analysis\_id*, *mltask\_id*, *mltask\_settings*)

*class* `PredictionTypes`

* `TIMESERIES_FORECAST` *= 'TIMESERIES\_FORECAST'*

`get_time_step_params`()

Gets the time step parameters for the time series forecasting task. This returns a reference to the time step parameters, not a copy, so changes made to the returned object will be reflected when saving

* Returns: A dict of the time step parameters

* Return type: dict

`set_time_step`(*time\_unit=None*, *n\_time\_units=None*, *end\_of\_week\_day=None*, *reguess=True*, *update\_algorithm\_settings=True*)

Sets the time step parameters for the time series forecasting task.

* Parameters:
  * **time\_unit** (*str*) – time unit for forecasting step. Valid values are: MILLISECOND, SECOND, MINUTE, HOUR, DAY, BUSINESS\_DAY, WEEK, MONTH, QUARTER, HALF\_YEAR, YEAR
  * **n\_time\_units** (*int*) – number of time units within a time step
  * **end\_of\_week\_day** (*int*) – only useful for the WEEK time unit. Valid values are: 1 (Sunday), 2 (Monday), …, 7 (Saturday)
  * **reguess** (*bool*) – Defaults to true. Whether to reguess the ML task settings after changing the time step params
  * **update\_algorithm\_settings** (*bool*) – Defaults to true. Whether the algorithm settings should be reguessed after changing time step parameters.

`get_resampling_params`()

Gets the time series resampling parameters for the time series forecasting task. This returns a reference to the time series resampling parameters, not a copy, so changes made to the returned object will be reflected when saving

* Returns: A dict of the resampling parameters

* Return type: dict

`set_numerical_interpolation`(*method=None*, *constant=None*)

Sets the time series resampling numerical interpolation parameters

* Parameters: * **method** (*str*) – Interpolation method. Valid values are: NEAREST, PREVIOUS, NEXT, LINEAR, QUADRATIC, CUBIC, CONSTANT
* **constant** (*float*) – Value for the CONSTANT interpolation method

`set_numerical_extrapolation`(*method=None*, *constant=None*)

Sets the time series resampling numerical extrapolation parameters

* Parameters: * **method** (*str*) – Extrapolation method. Valid values are: PREVIOUS\_NEXT, NO\_EXTRAPOLATION, CONSTANT, LINEAR, QUADRATIC, CUBIC
* **constant** (*float*) – Value for the CONSTANT extrapolation method

`set_categorical_imputation`(*method=None*, *constant=None*)

Sets the time series resampling categorical imputation parameters

* Parameters: * **method** (*str*) – Imputation method. Valid values are: MOST\_COMMON, NULL, CONSTANT, PREVIOUS\_NEXT, PREVIOUS, NEXT
* **constant** (*str*) – Value for the CONSTANT imputation method

`set_duplicate_timestamp_handling`(*method*)

Sets the time series duplicate timestamp handling

* Parameters: **method** (*str*) – Duplicate timestamp handling method. Valid values are: FAIL\_IF\_CONFLICTING, DROP\_IF\_CONFLICTING, MEAN\_MODE
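
The four resampling setters each accept a different list of method names. A small lookup table, sketched below, can catch a mismatched name before it is sent to DSS. `RESAMPLING_METHODS` and `check_resampling_method` are hypothetical helpers, and requiring a `constant` whenever the method is CONSTANT is an assumption drawn from the parameter descriptions.

```python
# Valid method names, copied from the setter descriptions in this section.
RESAMPLING_METHODS = {
    "set_numerical_interpolation": {
        "NEAREST", "PREVIOUS", "NEXT", "LINEAR", "QUADRATIC", "CUBIC", "CONSTANT"},
    "set_numerical_extrapolation": {
        "PREVIOUS_NEXT", "NO_EXTRAPOLATION", "CONSTANT", "LINEAR", "QUADRATIC", "CUBIC"},
    "set_categorical_imputation": {
        "MOST_COMMON", "NULL", "CONSTANT", "PREVIOUS_NEXT", "PREVIOUS", "NEXT"},
    "set_duplicate_timestamp_handling": {
        "FAIL_IF_CONFLICTING", "DROP_IF_CONFLICTING", "MEAN_MODE"},
}


def check_resampling_method(setter_name, method, constant=None):
    """Return kwargs for the named setter, or raise ValueError."""
    allowed = RESAMPLING_METHODS[setter_name]
    if method not in allowed:
        raise ValueError("%s does not accept %r; valid values: %s"
                         % (setter_name, method, ", ".join(sorted(allowed))))
    kwargs = {"method": method}
    if method == "CONSTANT":
        if constant is None:  # assumption: CONSTANT needs an explicit value
            raise ValueError("the CONSTANT method needs a constant value")
        kwargs["constant"] = constant
    return kwargs


interp_kwargs = check_resampling_method(
    "set_numerical_interpolation", "CONSTANT", constant=0.0)
print(interp_kwargs)
# With a real task settings object (here called `settings`):
# settings.set_numerical_interpolation(**interp_kwargs)
```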

*property* `forecast_horizon`

* Returns: Number of time steps to be forecast

* Return type: int

`set_forecast_horizon`(*forecast\_horizon*, *reguess=True*, *update\_algorithm\_settings=True*)

* Parameters: * **forecast\_horizon** (*int*) – Number of time steps to be forecast
* **reguess** (*bool*) – Defaults to true. Whether to reguess the ML task settings after changing the forecast horizon
* **update\_algorithm\_settings** (*bool*) – Defaults to true. Whether the algorithm settings should be reguessed after changing the forecast horizon.

*property* `evaluation_gap`

* Returns: Number of skipped time steps for evaluation

* Return type: int

*property* `time_variable`

* Returns: Feature used as time variable (read-only)

* Return type: str

*property* `timeseries_identifiers`

* Returns: Features used as time series identifiers (read-only copy)

* Return type: list

*property* `quantiles_to_forecast`

* Returns: List of quantiles to forecast

* Return type: list

*class* `dataikuapi.dss.ml.``PredictionSplitParamsHandler`(*mltask\_settings*)

Object to modify the train/test splitting params.

* `SPLIT_PARAMS_KEY` *= 'splitParams'*

`get_raw`()

Gets the raw settings of the prediction split configuration. This returns a reference to the raw settings, not a copy, so changes made to the returned object will be reflected when saving.

* Return type: dict

`set_split_random`(*train\_ratio=0.8*, *selection=None*, *dataset\_name=None*)

Sets the train/test split to random splitting of an extract of a single dataset

* Parameters: * **train\_ratio** (*float*) – Ratio of rows to use for train set. Must be between 0 and 1
* **selection** (*object*) – A `DSSDatasetSelectionBuilder` to build the settings of the extract of the dataset. May be None (won’t be changed)
* **dataset\_name** (*str*) – Name of dataset to split. If None, the main dataset used to create the visual analysis will be used.

`set_split_kfold`(*n\_folds=5*, *selection=None*, *dataset\_name=None*)

Sets the train/test split to k-fold splitting of an extract of a single dataset

* Parameters: * **n\_folds** (*int*) – number of folds. Must be greater than 0
* **selection** (*object*) – A `DSSDatasetSelectionBuilder` to build the settings of the extract of the dataset. May be None (won’t be changed)
* **dataset\_name** (*str*) – Name of dataset to split. If None, the main dataset used to create the visual analysis will be used.
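
To make the meaning of `n_folds` concrete, here is a conceptual sketch of k-fold splitting in plain Python. It is not DSS's implementation (the split is performed by DSS and may, for example, shuffle rows); `kfold_indices` is a hypothetical helper that only illustrates how each row ends up in the test set of exactly one fold.

```python
def kfold_indices(n_rows, n_folds=5):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    if n_folds < 1:
        raise ValueError("n_folds must be greater than 0")
    indices = list(range(n_rows))
    # Fold sizes differ by at most one row.
    base, extra = divmod(n_rows, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(kfold_indices(10, n_folds=5))
for train, test in folds:
    print("train:", train, "test:", test)
```

With 10 rows and 5 folds, every fold holds out 2 rows for testing and trains on the remaining 8.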

`set_split_explicit`(*train\_selection*, *test\_selection*, *dataset\_name=None*, *test\_dataset\_name=None*, *train\_filter=None*, *test\_filter=None*)

Sets the train/test split to explicit extract of one or two dataset(s)

* Parameters: * **train\_selection** (*object*) – A `DSSDatasetSelectionBuilder` to build the settings of the extract of the train dataset. May be None (won’t be changed)
* **test\_selection** (*object*) – A `DSSDatasetSelectionBuilder` to build the settings of the extract of the test dataset. May be None (won’t be changed)
* **dataset\_name** (*str*) – Name of dataset to use for the extracts. If None, the main dataset used to create the ML Task will be used.
* **test\_dataset\_name** (*str*) – Name of a second dataset to use for the test data extract. If None, both extracts are done from dataset\_name
* **train\_filter** (*object*) – A `DSSFilterBuilder` to build the settings of the filter of the train dataset. May be None (won’t be changed)
* **test\_filter** (*object*) – A `DSSFilterBuilder` to build the settings of the filter of the test dataset. May be None (won’t be changed)

`set_time_ordering`(*feature\_name*, *ascending=True*)

Uses a variable to sort the data for train/test split and hyperparameter optimization by time

* Parameters: * **feature\_name** (*str*) – Name of the variable to use
* **ascending** (*bool*) – True iff the test set is expected to have larger time values than the train set
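
The effect of `ascending` can be illustrated with a small stand-alone sketch. This is only a conceptual model of a time-ordered split, not the DSS implementation; `time_ordered_split` and the sample rows are hypothetical.

```python
def time_ordered_split(rows, feature_name, train_ratio=0.8, ascending=True):
    """Sort rows by the time variable, then cut into (train, test)."""
    ordered = sorted(rows, key=lambda r: r[feature_name], reverse=not ascending)
    cut = int(len(ordered) * train_ratio)
    # The train set is the first part of the ordering, the test set the rest.
    return ordered[:cut], ordered[cut:]


rows = [{"day": d, "sales": 10 * d} for d in (5, 1, 4, 2, 3)]
train, test = time_ordered_split(rows, "day", train_ratio=0.8)
print([r["day"] for r in train], [r["day"] for r in test])
```

With `ascending=True` the test set holds the latest rows, which matches the usual goal of evaluating a model on data that comes after its training data.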

`unset_time_ordering`()

Remove time-based ordering for train/test split and hyperparameter optimization

`has_time_ordering`()

* Returns: whether the splitting uses time ordering

* Return type: bool

`get_time_ordering_variable`()

* Returns: the name of the variable

* Return type: str

`is_time_ordering_ascending`()

* Returns: True if the ordering is set to be ascending with respect to the time-ordering variable

* Return type: bool

### Exploration of results[¶](https://doc.dataiku.com/dss/latest/api/python/ml.html#exploration-of-results "Permalink to this headline")

*class* `dataikuapi.dss.ml.``DSSTrainedPredictionModelDetails`(*details*, *snippet*, *saved\_model=None*, *saved\_model\_version=None*, *mltask=None*, *mltask\_model\_id=None*)

Object to read details of a trained prediction model

Do not create this object directly, use `DSSMLTask.get\_trained\_model\_details()` instead

`get_roc_curve_data`()

`get_performance_metrics`()

Returns all performance metrics for this model.

For a binary classification model, this includes both “threshold-independent” metrics like AUC and “threshold-dependent” metrics like precision. Threshold-dependent metrics are returned at the threshold value that was found to be optimal during training.

To get access to the per-threshold values, use the following:

§ # Returns a list of tested threshold values

§ details.get\_performance()["perCutData"]["cut"]

§ # Returns a list of F1 scores at the tested threshold values

§ details.get\_performance()["perCutData"]["f1"]

§ # Both lists have the same length

If K-Fold cross-test was used, most metrics will have a “std” variant, which is the standard deviation across the K cross-tested folds. For example, “auc” will be accompanied by “aucstd”.

* Returns: a dict of performance metrics values

* Return type: dict
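
Because the per-threshold lists have the same length, they can be scanned together. The sketch below picks the threshold with the highest F1 score from a dict shaped like `perCutData`; `best_threshold` is a hypothetical helper and the numbers are made-up sample values, not results from a real model.

```python
def best_threshold(per_cut_data, metric="f1"):
    """Return (threshold, score) for the best value of the given metric."""
    cuts = per_cut_data["cut"]
    scores = per_cut_data[metric]
    if len(cuts) != len(scores):
        raise ValueError("threshold and metric lists must have the same length")
    best_index = max(range(len(cuts)), key=lambda i: scores[i])
    return cuts[best_index], scores[best_index]


# Hypothetical values standing in for details.get_performance()["perCutData"]
sample = {
    "cut": [0.1, 0.3, 0.5, 0.7, 0.9],
    "f1": [0.42, 0.61, 0.68, 0.55, 0.30],
}
threshold, score = best_threshold(sample)
print(threshold, score)  # 0.5 0.68
```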

`get_assertions_metrics`()

Retrieves assertions metrics computed for this trained model

* Returns: an object representing assertion metrics

* Return type: `DSSMLAssertionsMetrics`

`get_hyperparameter_search_points`()

Gets the list of points in the hyperparameter search space that have been tested.

Returns a list of dict. Each entry in the list represents a point.

* For each point, the dict contains at least: * “score”: the average value of the optimization metric over all the folds at this point
* “params”: a dict of the parameters at this point. This dict has the same structure as the params of the best parameters
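
Given that each point carries a “score” and a “params” entry, the tested points can be ranked with ordinary list operations. In this sketch, `rank_search_points` is a hypothetical helper and the sample points (including the parameter names) are invented; whether a higher score is better depends on the optimization metric, so that is left as an argument.

```python
def rank_search_points(points, greater_is_better=True):
    """Return the points sorted from best to worst score."""
    return sorted(points, key=lambda p: p["score"], reverse=greater_is_better)


# Hypothetical stand-in for details.get_hyperparameter_search_points()
points = [
    {"score": 0.81, "params": {"max_depth": 4, "n_estimators": 50}},
    {"score": 0.86, "params": {"max_depth": 8, "n_estimators": 100}},
    {"score": 0.79, "params": {"max_depth": 2, "n_estimators": 100}},
]
best = rank_search_points(points)[0]
print(best["score"], best["params"])
```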

`get_preprocessing_settings`()

Gets the preprocessing settings that were used to train this model

* Return type: dict

`get_modeling_settings`()

Gets the modeling (algorithms) settings that were used to train this model.

Note: the structure of this dict is not the same as the modeling params on the ML Task (which may contain several algorithms).

* Return type: dict

`get_actual_modeling_params`()

Gets the actual / resolved parameters that were used to train this model, post hyperparameter optimization.

* Returns: A dictionary, which contains at least a “resolved” key, which is a dict containing the post-optimization parameters

* Return type: dict

`get_trees`()

Gets the trees in the model (for tree-based models)

* Returns: a DSSTreeSet object to interact with the trees

* Return type: `dataikuapi.dss.ml.DSSTreeSet`

`get_coefficient_paths`()

Gets the coefficient paths for Lasso models

* Returns: a DSSCoefficientPaths object to interact with the coefficient paths

* Return type: `dataikuapi.dss.ml.DSSCoefficientPaths`

`get_scoring_jar_stream`(*model\_class='model.Model'*, *include\_libs=False*)

Get a scoring jar for this trained model, provided that you have the license to do so and that the model is compatible with optimized scoring. You need to close the stream after download. Failure to do so will result in the DSSClient becoming unusable.

* Parameters: * **model\_class** (*str*) – fully-qualified class name, e.g. “com.company.project.Model”
* **include\_libs** (*bool*) – if True, also packs the required dependencies; if False, runtime will require the scoring libs given by `DSSClient.scoring\_libs()`

* Returns: a jar file, as a stream

* Return type: file-like
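
Since an unclosed stream leaves the client unusable, it is worth wrapping the download in a `try`/`finally`. The sketch below assumes only that the returned object offers `read()` and `close()`, as the “file-like” return type suggests; adapt the copy step if your client version returns a different kind of object. `save_stream` is a hypothetical helper, and an in-memory `io.BytesIO` stands in for the real stream.

```python
import io
import os
import shutil
import tempfile


def save_stream(stream, path):
    """Copy a file-like stream to `path` and always close the stream."""
    try:
        with open(path, "wb") as out:
            shutil.copyfileobj(stream, out)
    finally:
        # Closing is required even if the copy fails.
        stream.close()


# Stand-in for details.get_scoring_jar_stream()
fake_stream = io.BytesIO(b"jar bytes")
target = os.path.join(tempfile.mkdtemp(), "model.jar")
save_stream(fake_stream, target)
print(os.path.getsize(target))
```

The same helper applies to the other `*_stream` methods of this class, under the same assumption about the returned object.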

`get_scoring_pmml_stream`()

Get a scoring PMML for this trained model, provided that you have the license to do so and that the model is compatible with PMML scoring. You need to close the stream after download. Failure to do so will result in the DSSClient becoming unusable.

* Returns: a PMML file, as a stream

* Return type: file-like

`get_scoring_python_stream`()

Download the zip containing data to use for this trained model, provided that you have the license to do so and that the model is compatible with Python scoring. You need to close the stream after download. Failure to do so will result in the DSSClient becoming unusable.

* Returns: an archive file, as a stream

* Return type: file-like

`get_scoring_python`(*filename*)

Download the zip containing data to use Python scoring for this trained model in filename, provided that you have the license to do so and that the model is compatible with Python scoring.

* Parameters: **filename** (*str*) – filename of the resulting downloaded file

`get_scoring_mlflow_stream`()

Download the zip containing this trained model using MLflow Model format, provided that you have the license to do so and that the model is compatible with MLflow scoring. You need to close the stream after download. Failure to do so will result in the DSSClient becoming unusable.

* Returns: an archive file, as a stream

* Return type: file-like

`get_scoring_mlflow`(*filename*)

Download the zip containing data for this trained model, using MLflow Model format, provided that you have the license to do so and that the model is compatible with MLflow scoring

* Parameters: **filename** (*str*) – filename to the resulting MLflow Model zip

`compute_subpopulation_analyses`(*split\_by*, *wait=True*, *sample\_size=1000*, *random\_state=1337*, *n\_jobs=1*, *debug\_mode=False*)

Launch computation of Subpopulation analyses for this trained model.

* Parameters: * **split\_by** (*list|str*) – column(s) on which subpopulation analyses are to be computed (one analysis per column)
* **wait** (*bool*) – if True, the call blocks until the computation is finished and returns the results directly
* **sample\_size** (*int*) – number of records of the dataset to use for the computation
* **random\_state** (*int*) – random state to use to build sample, for reproducibility
* **n\_jobs** (*int*) – number of cores used for parallel training. (-1 means ‘all cores’)
* **debug\_mode** (*bool*) – if True, output all logs (slower)

* Returns: if wait is True, an object containing the Subpopulation analyses, else a future to wait on the result

* Return type: `dataikuapi.dss.ml.DSSSubpopulationAnalyses` or `dataikuapi.dss.future.DSSFuture`

`get_subpopulation_analyses`()

Retrieve all subpopulation analyses computed for this trained model

* Returns: the subpopulation analyses

* Return type: `dataikuapi.dss.ml.DSSSubpopulationAnalyses`

`compute_partial_dependencies`(*features*, *wait=True*, *sample\_size=1000*, *random\_state=1337*, *n\_jobs=1*, *debug\_mode=False*)

Launch computation of Partial dependencies for this trained model.

* Parameters: * **features** (*list|str*) – feature(s) on which partial dependencies are to be computed
* **wait** (*bool*) – if True, the call blocks until the computation is finished and returns the results directly
* **sample\_size** (*int*) – number of records of the dataset to use for the computation
* **random\_state** (*int*) – random state to use to build sample, for reproducibility
* **n\_jobs** (*int*) – number of cores used for parallel training. (-1 means ‘all cores’)
* **debug\_mode** (*bool*) – if True, output all logs (slower)

* Returns: if wait is True, an object containing the Partial dependencies, else a future to wait on the result

* Return type: `dataikuapi.dss.ml.DSSPartialDependencies` or `dataikuapi.dss.future.DSSFuture`

`get_partial_dependencies`()

Retrieve all partial dependencies computed for this trained model

* Returns: the partial dependencies

* Return type: `dataikuapi.dss.ml.DSSPartialDependencies`

`download_documentation_stream`(*export\_id*)

Download a model documentation, as a binary stream.

Warning: this stream will monopolize the DSSClient until closed.

* Parameters: **export\_id** – the id of the generated model documentation returned as the result of the future

* Returns: the model documentation, as a binary stream

`download_documentation_to_file`(*export\_id*, *path*)

Download a model documentation into the given output file.

* Parameters: * **export\_id** – the id of the generated model documentation returned as the result of the future
* **path** – the path where to download the model documentation

* Returns: None

*property* `full_id`

`generate_documentation`(*folder\_id=None*, *path=None*)

Start the model document generation from a template docx file in a managed folder, or from the default template if no folder id and path are specified.

* Parameters: * **folder\_id** – (optional) the id of the managed folder
* **path** – (optional) the path to the file from the root of the folder

* Returns: A `DSSFuture` representing the model document generation process

`generate_documentation_from_custom_template`(*fp*)

Start the model document generation from a docx template (as a file object).

* Parameters: **fp** (*object*) – A file-like object pointing to a template docx file

* Returns: A `DSSFuture` representing the model document generation process
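
The generation and download methods are meant to be chained: start the generation, wait for the future, then download using the export id found in its result. The sketch below shows that chain. It assumes the future exposes `wait_for_result()` and that the result is a dict with an `"exportId"` key; neither is stated in this section. `export_documentation` is a hypothetical helper, and the two `_Fake*` classes exist only so the example runs without a DSS instance.

```python
def export_documentation(details, output_path, folder_id=None, path=None):
    """Generate the model documentation and download it to output_path."""
    future = details.generate_documentation(folder_id=folder_id, path=path)
    result = future.wait_for_result()   # assumed DSSFuture method
    export_id = result["exportId"]      # assumed key name
    details.download_documentation_to_file(export_id, output_path)
    return export_id


class _FakeFuture:
    """Stand-in for the future returned by generate_documentation()."""
    def wait_for_result(self):
        return {"exportId": "export-123"}


class _FakeDetails:
    """Stand-in for a trained model details object."""
    def __init__(self):
        self.downloads = []

    def generate_documentation(self, folder_id=None, path=None):
        return _FakeFuture()

    def download_documentation_to_file(self, export_id, path):
        self.downloads.append((export_id, path))


fake_details = _FakeDetails()
export_id = export_documentation(fake_details, "model_documentation.docx")
print(export_id, fake_details.downloads)
```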

`get_diagnostics`()

Retrieves diagnostics computed for this trained model

* Returns: list of diagnostics

* Return type: list of type dataikuapi.dss.ml.DSSMLDiagnostic

`get_origin_analysis_trained_model`()

Fetch details about the analysis model from which this model was exported. Returns None if the deployed trained model does not have an origin analysis trained model.

* Return type: DSSTrainedModelDetails | None

`get_raw`()

Gets the raw dictionary of trained model details

`get_raw_snippet`()

Gets the raw dictionary of trained model snippet. The snippet is a lighter version of the details.

`get_train_info`()

Returns various information about the train process (size of the train set, quick description, timing information)

* Return type: dict

`get_user_meta`()

Gets the user-accessible metadata (name, description, cluster labels, classification threshold). Returns the original object, not a copy. Changes to the returned object are persisted to DSS by calling `save\_user\_meta()`

`save_user_meta`()
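
Because `get_user_meta()` returns the original object, the usual pattern is to edit it in place and then call `save_user_meta()`. The sketch below illustrates this; the `"description"` key is an assumption based on the metadata fields listed above, `update_description` is a hypothetical helper, and `_FakeDetails` only imitates the two methods so the example runs offline.

```python
def update_description(details, text):
    """Edit the user metadata in place, then persist it."""
    meta = details.get_user_meta()    # original object, not a copy
    meta["description"] = text        # assumed key name
    details.save_user_meta()
    return meta


class _FakeDetails:
    """Stand-in for a trained model details object."""
    def __init__(self):
        self._meta = {"name": "my model", "description": ""}
        self.saved = None

    def get_user_meta(self):
        return self._meta

    def save_user_meta(self):
        # A real object would send the metadata to DSS here.
        self.saved = dict(self._meta)


fake = _FakeDetails()
update_description(fake, "Baseline model, trained on the 2023 extract")
print(fake.saved)
```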

*class* `dataikuapi.dss.ml.``DSSTrainedClusteringModelDetails`(*details*, *snippet*, *saved\_model=None*, *saved\_model\_version=None*, *mltask=None*, *mltask\_model\_id=None*)

Object to read details of a trained clustering model

Do not create this object directly, use `DSSMLTask.get\_trained\_model\_details()` instead

`get_raw`()

Gets the raw dictionary of trained model details

`get_train_info`()

Returns various information about the train process (size of the train set, quick description, timing information)

* Return type: dict

`get_facts`()

Gets the ‘cluster facts’ data, i.e. the structure behind the screen “for cluster X, average of Y is Z times higher than average”.

* Return type: `DSSClustersFacts`

`get_performance_metrics`()

Returns all performance metrics for this clustering model.

* Returns: a dict of performance metrics values

* Return type: dict

`get_preprocessing_settings`()

Gets the preprocessing settings that were used to train this model

* Return type: dict

`get_modeling_settings`()

Gets the modeling (algorithms) settings that were used to train this model.

Note: the structure of this dict is not the same as the modeling params on the ML Task (which may contain several algorithms).

* Return type: dict

`get_actual_modeling_params`()

Gets the actual / resolved parameters that were used to train this model.

* Returns: A dictionary, which contains at least a “resolved” key

* Return type: dict

`get_scatter_plots`()

Gets the cluster scatter plot data

* Returns: a DSSScatterPlots object to interact with the scatter plots

* Return type: `dataikuapi.dss.ml.DSSScatterPlots`

`download_documentation_stream`(*export\_id*)

Download a model documentation, as a binary stream.

Warning: this stream will monopolize the DSSClient until closed.

* Parameters: **export\_id** – the id of the generated model documentation returned as the result of the future

* Returns: the model documentation, as a binary stream

`download_documentation_to_file`(*export\_id*, *path*)

Download a model documentation into the given output file.

* Parameters: * **export\_id** – the id of the generated model documentation returned as the result of the future
* **path** – the path where to download the model documentation

* Returns: None

*property* `full_id`

`generate_documentation`(*folder\_id=None*, *path=None*)

Start the model document generation from a template docx file in a managed folder, or from the default template if no folder id and path are specified.

* Parameters: * **folder\_id** – (optional) the id of the managed folder
* **path** – (optional) the path to the file from the root of the folder

* Returns: A `DSSFuture` representing the model document generation process

`generate_documentation_from_custom_template`(*fp*)

Start the model document generation from a docx template (as a file object).

* Parameters: **fp** (*object*) – A file-like object pointing to a template docx file

* Returns: A `DSSFuture` representing the model document generation process

`get_diagnostics`()

Retrieves diagnostics computed for this trained model

* Returns: list of diagnostics

* Return type: list of type dataikuapi.dss.ml.DSSMLDiagnostic

`get_origin_analysis_trained_model`()

Fetch details about the analysis model from which this model was exported. Returns None if the deployed trained model does not have an origin analysis trained model.

* Return type: DSSTrainedModelDetails | None

`get_raw_snippet`()

Gets the raw dictionary of trained model snippet. The snippet is a lighter version of the details.

`get_user_meta`()

Gets the user-accessible metadata (name, description, cluster labels, classification threshold). Returns the original object, not a copy. Changes to the returned object are persisted to DSS by calling `save\_user\_meta()`

`save_user_meta`()

### Saved models[¶](https://doc.dataiku.com/dss/latest/api/python/ml.html#saved-models "Permalink to this headline")

*class* `dataikuapi.dss.savedmodel.``DSSSavedModel`(*client*, *project\_key*, *sm\_id*)

A handle to interact with a saved model on the DSS instance.

Do not create this directly, use `dataikuapi.dss.DSSProject.get\_saved\_model()`

*property* `id`

`get_settings`()

Returns the settings of this saved model.

* Return type: DSSSavedModelSettings

`list_versions`()

Get the versions of this saved model

* Returns: a list of the versions, each as a dict. Each dict contains at least an “id” key, which can be passed to `get\_metric\_values()`, `get\_version\_details()` and `set\_active\_version()`

* Return type: list

`get_active_version`()

Gets the active version of this saved model

* Returns: a dict representing the active version or None if no version is active. The dict contains at least an “id” key, which can be passed to `get\_metric\_values()`, `get\_version\_details()` and `set\_active\_version()`

* Return type: dict

`get_version_details`(*version\_id*)

Gets details for a version of a saved model

* Parameters: **version\_id** (*str*) – Identifier of the version, as returned by `list\_versions()`

* Returns: A `DSSTrainedPredictionModelDetails` representing the details of this trained model id

* Return type: `DSSTrainedPredictionModelDetails`

`set_active_version`(*version\_id*)

Sets a particular version of the saved model as the active one

`delete_versions`(*versions*, *remove\_intermediate=True*)

Delete version(s) of the saved model

* Parameters: * **versions** (*list**[**str**]*) – list of versions to delete
* **remove\_intermediate** (*bool*) – also remove intermediate versions (default: True). In the case of a partitioned model, an intermediate version is created every time a partition has finished training.

`get_origin_ml_task`()

Fetch the last ML task that has been exported to this saved model. Returns None if the saved model does not have an origin ml task.

* Return type: DSSMLTask | None

`import_mlflow_version_from_path`(*version\_id*, *path*, *code\_env\_name='INHERIT'*, *container\_exec\_config\_name='NONE'*, *set\_active=True*, *binary\_classification\_threshold=0.5*)

Create a new version for this saved model from a path containing a MLFlow model.

Requires the saved model to have been created using `dataikuapi.dss.project.DSSProject.create\_mlflow\_pyfunc\_model()`.

* Parameters: * **version\_id** (*str*) – Identifier of the version to create
* **path** (*str*) – An absolute path on the local filesystem. Must be a folder, and must contain a MLFlow model
* **code\_env\_name** (*str*) – Name of the code env to use for this model version. The code env must contain at least mlflow and the package(s) corresponding to the used MLFlow-compatible frameworks. If value is “INHERIT”, the default active code env of the project will be used
* **container\_exec\_config\_name** (*str*) – Name of the containerized execution configuration to use while creating this model version. If value is “INHERIT”, the container execution configuration of the project will be used. If value is “NONE”, local execution will be used (no container)
* **set\_active** (*bool*) – sets this new version as the active version of the saved model
* **binary\_classification\_threshold** (*float*) – For binary classification, define the actual threshold for the imported version. Default to 0.5

* Returns: an `ExternalModelVersionHandler` to interact with the new MLFlow model version
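
The `path` argument must be an absolute path to a folder that contains an MLflow model. A quick local check before the import can save a round trip. In the sketch below, `check_mlflow_model_dir` is a hypothetical helper; it relies on the MLflow convention that a model folder contains a file named `MLmodel`, and the temporary folder only imitates such a model.

```python
import os
import tempfile


def check_mlflow_model_dir(path):
    """Return path if it looks like an MLflow model folder, else raise."""
    if not os.path.isabs(path):
        raise ValueError("path must be absolute: %r" % path)
    if not os.path.isdir(path):
        raise ValueError("path must be a folder: %r" % path)
    if not os.path.isfile(os.path.join(path, "MLmodel")):
        raise ValueError("no MLmodel file found in %r" % path)
    return path


# Build a throwaway folder that imitates an MLflow model.
model_dir = tempfile.mkdtemp()
with open(os.path.join(model_dir, "MLmodel"), "w") as f:
    f.write("flavors: {}\n")

checked = check_mlflow_model_dir(model_dir)
print(checked == model_dir)
# With a real saved model (here called `saved_model`):
# saved_model.import_mlflow_version_from_path("v1", checked)
```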

`import_mlflow_version_from_managed_folder`(*version\_id*, *managed\_folder*, *path*, *code\_env\_name='INHERIT'*, *container\_exec\_config\_name='INHERIT'*, *set\_active=True*, *binary\_classification\_threshold=0.5*)

Create a new version for this saved model from a path containing a MLFlow model in a managed folder.

Requires the saved model to have been created using `dataikuapi.dss.project.DSSProject.create\_mlflow\_pyfunc\_model()`.

* Parameters: * **version\_id** (*str*) – Identifier of the version to create
* **managed\_folder** (*str*) – Identifier of the managed folder or dataikuapi.dss.managedfolder.DSSManagedFolder
* **path** (*str*) – Path of the MLflow folder in the managed folder
* **code\_env\_name** (*str*) – Name of the code env to use for this model version. The code env must contain at least mlflow and the package(s) corresponding to the used MLFlow-compatible frameworks. If value is “INHERIT”, the default active code env of the project will be used
* **container\_exec\_config\_name** (*str*) – Name of the containerized execution configuration to use for evaluating this model version. If value is “INHERIT”, the container execution configuration of the project will be used. If value is “NONE”, local execution will be used (no container)
* **set\_active** (*bool*) – sets this new version as the active version of the saved model
* **binary\_classification\_threshold** (*float*) – For binary classification, define the actual threshold for the imported version. Default to 0.5

* Returns: an `ExternalModelVersionHandler` to interact with the new MLFlow model version

`create_proxy_model_version`(*version\_id*, *protocol*, *configuration*)

EXPERIMENTAL. Creates a new version of a proxy model.

This is an experimental API, subject to change. Requires the saved model to have been created using `dataikuapi.dss.project.DSSProject.create\_proxy\_model()`.

* Parameters: * **version\_id** (*str*) – Identifier of the version to create
* **protocol** (*str*) – one of [“KServe”, “DSS\_API\_NODE”]
* **configuration** (*dict*) – A dictionary containing the required params for the selected protocol

* Returns: an `ExternalModelVersionHandler` to interact with the new Proxy model version

`get_external_model_version_handler`(*version\_id*)

Returns an `ExternalModelVersionHandler` to interact with an External model version (MLflow or Proxy model)

`get_metric_values`(*version\_id*)

Get the values of the metrics on the version of this saved model

* Returns: a list of metric objects and their value

`get_zone`()

Gets the flow zone of this saved model

* Return type: `dataikuapi.dss.flow.DSSFlowZone`

`move_to_zone`(*zone*)

Moves this object to a flow zone

* Parameters: **zone** (*object*) – a `dataikuapi.dss.flow.DSSFlowZone` where to move the object

`share_to_zone`(*zone*)

Share this object to a flow zone

* Parameters: **zone** (*object*) – a `dataikuapi.dss.flow.DSSFlowZone` where to share the object

`unshare_from_zone`(*zone*)

Unshare this object from a flow zone

* Parameters: **zone** (*object*) – a `dataikuapi.dss.flow.DSSFlowZone` from where to unshare the object

`get_usages`()

Get the recipes referencing this model

* Returns: a list of usages

`get_object_discussions`()

Get a handle to manage discussions on the saved model

* Returns: the handle to manage discussions

* Return type: `dataikuapi.discussion.DSSObjectDiscussions`

`delete`()

Delete the saved model

*class* `dataikuapi.dss.savedmodel.``DSSSavedModelSettings`(*saved\_model*, *settings*)

A handle on the settings of a saved model

Do not create this class directly, instead use `dataikuapi.dss.DSSSavedModel.get\_settings()`

`get_raw`()

*property* `prediction_metrics_settings`

The settings of evaluation metrics for a prediction saved model

`save`()

Saves the settings of this saved model

### MLflow models[¶](https://doc.dataiku.com/dss/latest/api/python/ml.html#mlflow-models "Permalink to this headline")

*class* `dataikuapi.dss.savedmodel.``ExternalModelVersionHandler`(*saved\_model*, *version\_id*)

Handler to interact with an External model version (MLflow import or Proxy model)

`get_settings`()

`set_core_metadata`(*target\_column\_name*, *class\_labels=None*, *get\_features\_from\_dataset=None*, *features\_list=None*, *output\_style='AUTO\_DETECT'*, *container\_exec\_config\_name='NONE'*)

Sets metadata for this MLFlow model version

In addition to target\_column\_name, one of get\_features\_from\_dataset or features\_list must be passed in order to be able to evaluate performance

* Parameters: * **target\_column\_name** (*str*) – name of the target column. Mandatory in order to be able to evaluate performance
* **class\_labels** (*list*) – List of strings, ordered class labels. Mandatory in order to be able to evaluate performance on classification models
* **get\_features\_from\_dataset** (*str*) – Name of a dataset to get feature names from
* **features\_list** (*list*) – List of {“name”: “feature\_name”, “type”: “feature\_type”}
* **container\_exec\_config\_name** (*str*) – Name of the containerized execution configuration to use for running the evaluation process. If value is “INHERIT”, the container execution configuration of the project will be used. If value is “NONE” (default), local execution will be used (no container)
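
The requirement that one of `get_features_from_dataset` or `features_list` be passed, and the shape of `features_list`, can be captured in a small helper. This is a sketch only: `core_metadata_kwargs` is a hypothetical name, and the column names and type strings in the example are placeholders rather than a list of types accepted by DSS.

```python
def core_metadata_kwargs(target_column_name, class_labels=None,
                         get_features_from_dataset=None, feature_types=None):
    """Build kwargs for set_core_metadata() from simple inputs."""
    if get_features_from_dataset is None and feature_types is None:
        raise ValueError(
            "pass either get_features_from_dataset or feature_types")
    kwargs = {"target_column_name": target_column_name}
    if class_labels is not None:
        kwargs["class_labels"] = list(class_labels)
    if get_features_from_dataset is not None:
        kwargs["get_features_from_dataset"] = get_features_from_dataset
    if feature_types is not None:
        # Documented shape: a list of {"name": ..., "type": ...} dicts.
        kwargs["features_list"] = [
            {"name": name, "type": ftype}
            for name, ftype in feature_types.items()
        ]
    return kwargs


metadata = core_metadata_kwargs(
    "churned",
    class_labels=["no", "yes"],
    feature_types={"age": "double", "country": "string"},  # placeholder types
)
print(metadata)
# With a real version handler (here called `handler`):
# handler.set_core_metadata(**metadata)
```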

`evaluate`(*dataset\_ref*, *container\_exec\_config\_name='INHERIT'*, *selection=None*, *use\_optimal\_threshold=True*)

Evaluates the performance of this model version on a particular dataset. After calling this, the “result screens” of the MLFlow model version will be available (confusion matrix, error distribution, performance metrics, …) and more information will be available when calling `DSSSavedModel.get\_version\_details()`

`set\_core\_metadata()` must be called before you can evaluate a dataset.

* Parameters: * **dataset\_ref** (*str*) – Evaluation dataset to use (either a dataset name, “PROJECT.datasetName”, `DSSDataset` instance or `dataiku.Dataset` instance)
* **container\_exec\_config\_name** (*str*) – Name of the containerized execution configuration to use for running the evaluation process. If value is “INHERIT”, the container execution configuration of the project will be used. If value is “NONE”, local execution will be used (no container)
* **selection** (*str*) – will default to HEAD\_SEQUENTIAL with a maxRecords of 10\_000.
* **use\_optimal\_threshold** (*boolean*) – Choose between optimized or actual threshold. Optimized threshold has been computed according to the metric set on the saved model setting “prediction\_metrics\_settings[‘thresholdOptimizationMetric’]”

*class* `dataikuapi.dss.savedmodel.``MLFlowVersionSettings`(*version\_handler*, *data*)

Handle for the settings of an imported MLFlow model version

*property* `raw`

`save`()

## Algorithm details[¶](https://doc.dataiku.com/dss/latest/api/python/ml.html#algorithm-details "Permalink to this headline")

This section documents which algorithms are available, and some of the settings for them.

These algorithm names can be used for `dataikuapi.dss.ml.DSSMLTaskSettings.get\_algorithm\_settings()` and `dataikuapi.dss.ml.DSSMLTaskSettings.set\_algorithm\_enabled()`

Note

This documentation does not cover all settings of all algorithms. To know which settings are available for an algorithm, use `mltask\_settings.get\_algorithm\_settings('ALGORITHM\_NAME')` and print the returned dictionary.

Generally speaking, an algorithm setting whose value is an array can be grid-searched: all values in the array will be tested as part of the hyperparameter optimization.

For more documentation of settings, please refer to the UI of the visual machine learning, which contains detailed documentation for all algorithm parameters.
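
To get a feel for how array-valued settings translate into grid-search points, the sketch below expands every list in a settings dict into the Cartesian product of its values. The dict is a simplified, hypothetical example (real algorithm settings can be more structured, as the sections below show), and `grid_points` is not part of the client.

```python
from itertools import product


def grid_points(settings):
    """Yield one dict per combination of the list-valued settings."""
    grid_keys = [k for k, v in settings.items() if isinstance(v, list)]
    fixed = {k: v for k, v in settings.items() if not isinstance(v, list)}
    for values in product(*(settings[k] for k in grid_keys)):
        point = dict(fixed)
        point.update(zip(grid_keys, values))
        yield point


# Hypothetical settings: two grid-searched parameters and one fixed value.
hypothetical = {
    "alpha": [0.1, 1.0, 10.0],
    "fit_intercept": [True, False],
    "n_jobs": 2,
}
points = list(grid_points(hypothetical))
print(len(points))  # 6
print(points[0])
```

The number of points grows multiplicatively with each additional array, which is worth keeping in mind when widening a search.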

### LOGISTIC\_REGRESSION[¶](https://doc.dataiku.com/dss/latest/api/python/ml.html#logistic-regression "Permalink to this headline")

* Type: Prediction (binary or multiclass)

* Available on backend: PY\_MEMORY

* Main parameters:

§ {

§ "multi\_class": SingleCategoryHyperparameterSettings, # accepted values: ['multinomial', 'ovr']

§ "penalty": CategoricalHyperparameterSettings, # possible values: ["l1", "l2"]

§ "C": NumericalHyperparameterSettings, # scaling: "LOGARITHMIC"

§ "n\_jobs": 2

§ }

### RANDOM\_FOREST\_CLASSIFICATION[¶](https://doc.dataiku.com/dss/latest/api/python/ml.html#random-forest-classification "Permalink to this headline")

* Type: Prediction (binary or multiclass)

* Available on backend: PY\_MEMORY

* Main parameters:

§ {

§ "n\_estimators": NumericalHyperparameterSettings, # scaling: "LINEAR"

§ "min\_samples\_leaf": NumericalHyperparameterSettings, # scaling: "LINEAR"

§ "max\_tree\_depth": NumericalHyperparameterSettings, # scaling: "LINEAR"

§ "max\_feature\_prop": NumericalHyperparameterSettings, # scaling: "LINEAR"

§ "max\_features": NumericalHyperparameterSettings, # scaling: "LINEAR"

§ "selection\_mode": SingleCategoryHyperparameterSettings, # accepted values: ['auto', 'sqrt', 'log2', 'number', 'prop']

§ "n\_jobs": 4

§ }

### RANDOM\_FOREST\_REGRESSION

* Type: Prediction (regression)

* Available on backend: PY\_MEMORY

* Main parameters: same as RANDOM\_FOREST\_CLASSIFICATION

### EXTRA\_TREES

* Type: Prediction (all kinds)

* Available on backend: PY\_MEMORY

### RIDGE\_REGRESSION

* Type: Prediction (regression)

* Available on backend: PY\_MEMORY

### LASSO\_REGRESSION

* Type: Prediction (regression)

* Available on backend: PY\_MEMORY

### LEASTSQUARE\_REGRESSION

* Type: Prediction (regression)

* Available on backend: PY\_MEMORY

### SVC\_CLASSIFICATION

* Type: Prediction (binary or multiclass)

* Available on backend: PY\_MEMORY

### SVM\_REGRESSION

* Type: Prediction (regression)

* Available on backend: PY\_MEMORY

### SGD\_CLASSIFICATION

* Type: Prediction (binary or multiclass)

* Available on backend: PY\_MEMORY

### SGD\_REGRESSION

* Type: Prediction (regression)

* Available on backend: PY\_MEMORY

### GBT\_CLASSIFICATION

* Type: Prediction (binary or multiclass)

* Available on backend: PY\_MEMORY

### GBT\_REGRESSION

* Type: Prediction (regression)

* Available on backend: PY\_MEMORY

### DECISION\_TREE\_CLASSIFICATION

* Type: Prediction (binary or multiclass)

* Available on backend: PY\_MEMORY

### DECISION\_TREE\_REGRESSION

* Type: Prediction (regression)

* Available on backend: PY\_MEMORY

### LIGHTGBM\_CLASSIFICATION

* Type: Prediction (binary or multiclass)

* Available on backend: PY\_MEMORY

### LIGHTGBM\_REGRESSION

* Type: Prediction (regression)

* Available on backend: PY\_MEMORY

### XGBOOST\_CLASSIFICATION

* Type: Prediction (binary or multiclass)

* Available on backend: PY\_MEMORY

### XGBOOST\_REGRESSION

* Type: Prediction (regression)

* Available on backend: PY\_MEMORY

### NEURAL\_NETWORK

* Type: Prediction (all kinds)

* Available on backend: PY\_MEMORY

### DEEP\_NEURAL\_NETWORK\_REGRESSION

* Type: Prediction (regression)

* Available on backend: PY\_MEMORY

### DEEP\_NEURAL\_NETWORK\_CLASSIFICATION

* Type: Prediction (binary or multiclass)

* Available on backend: PY\_MEMORY

### KNN

* Type: Prediction (all kinds)

* Available on backend: PY\_MEMORY

### LARS

* Type: Prediction (all kinds)

* Available on backend: PY\_MEMORY

### MLLIB\_LOGISTIC\_REGRESSION

* Type: Prediction (binary or multiclass)

* Available on backend: MLLIB

### MLLIB\_DECISION\_TREE

* Type: Prediction (all kinds)

* Available on backend: MLLIB

### MLLIB\_RANDOM\_FOREST

* Type: Prediction (all kinds)

* Available on backend: MLLIB

### MLLIB\_GBT

* Type: Prediction (all kinds)

* Available on backend: MLLIB

### MLLIB\_LINEAR\_REGRESSION

* Type: Prediction (regression)

* Available on backend: MLLIB

### MLLIB\_NAIVE\_BAYES

* Type: Prediction (all kinds)

* Available on backend: MLLIB

### Other

* SCIKIT\_MODEL

* MLLIB\_CUSTOM

* SPARKLING\_DEEP\_LEARNING

* SPARKLING\_GBM

* SPARKLING\_RF

* SPARKLING\_GLM

* SPARKLING\_NB
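
The type and backend information listed in this section can be captured in a lookup table, for example to check which algorithm names apply before enabling them. This is only a sketch transcribed from the listing above: the `ALGORITHMS` table and the `algorithms_for` helper are hypothetical names, the prediction type strings (`BINARY_CLASSIFICATION`, `MULTICLASS`, `REGRESSION`) are assumed, and the algorithms under "Other" are left out because no type or backend is documented for them here.

```python
# Prediction types, as assumed in this sketch
CLASSIFICATION = frozenset({"BINARY_CLASSIFICATION", "MULTICLASS"})
REGRESSION = frozenset({"REGRESSION"})
ALL_KINDS = CLASSIFICATION | REGRESSION

# Algorithm name -> (backend, supported prediction types),
# transcribed from the listing in this section
ALGORITHMS = {
    "LOGISTIC_REGRESSION": ("PY_MEMORY", CLASSIFICATION),
    "RANDOM_FOREST_CLASSIFICATION": ("PY_MEMORY", CLASSIFICATION),
    "RANDOM_FOREST_REGRESSION": ("PY_MEMORY", REGRESSION),
    "EXTRA_TREES": ("PY_MEMORY", ALL_KINDS),
    "RIDGE_REGRESSION": ("PY_MEMORY", REGRESSION),
    "LASSO_REGRESSION": ("PY_MEMORY", REGRESSION),
    "LEASTSQUARE_REGRESSION": ("PY_MEMORY", REGRESSION),
    "SVC_CLASSIFICATION": ("PY_MEMORY", CLASSIFICATION),
    "SVM_REGRESSION": ("PY_MEMORY", REGRESSION),
    "SGD_CLASSIFICATION": ("PY_MEMORY", CLASSIFICATION),
    "SGD_REGRESSION": ("PY_MEMORY", REGRESSION),
    "GBT_CLASSIFICATION": ("PY_MEMORY", CLASSIFICATION),
    "GBT_REGRESSION": ("PY_MEMORY", REGRESSION),
    "DECISION_TREE_CLASSIFICATION": ("PY_MEMORY", CLASSIFICATION),
    "DECISION_TREE_REGRESSION": ("PY_MEMORY", REGRESSION),
    "LIGHTGBM_CLASSIFICATION": ("PY_MEMORY", CLASSIFICATION),
    "LIGHTGBM_REGRESSION": ("PY_MEMORY", REGRESSION),
    "XGBOOST_CLASSIFICATION": ("PY_MEMORY", CLASSIFICATION),
    "XGBOOST_REGRESSION": ("PY_MEMORY", REGRESSION),
    "NEURAL_NETWORK": ("PY_MEMORY", ALL_KINDS),
    "DEEP_NEURAL_NETWORK_REGRESSION": ("PY_MEMORY", REGRESSION),
    "DEEP_NEURAL_NETWORK_CLASSIFICATION": ("PY_MEMORY", CLASSIFICATION),
    "KNN": ("PY_MEMORY", ALL_KINDS),
    "LARS": ("PY_MEMORY", ALL_KINDS),
    "MLLIB_LOGISTIC_REGRESSION": ("MLLIB", CLASSIFICATION),
    "MLLIB_DECISION_TREE": ("MLLIB", ALL_KINDS),
    "MLLIB_RANDOM_FOREST": ("MLLIB", ALL_KINDS),
    "MLLIB_GBT": ("MLLIB", ALL_KINDS),
    "MLLIB_LINEAR_REGRESSION": ("MLLIB", REGRESSION),
    "MLLIB_NAIVE_BAYES": ("MLLIB", ALL_KINDS),
}


def algorithms_for(backend, prediction_type):
    """Return the sorted algorithm names listed for a backend and prediction type."""
    return sorted(
        name
        for name, (algo_backend, kinds) in ALGORITHMS.items()
        if algo_backend == backend and prediction_type in kinds
    )


print(algorithms_for("MLLIB", "REGRESSION"))
```

The returned names can then be passed to `set_algorithm_enabled()` or `get_algorithm_settings()` on the ML Task settings.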
