# Experiment tracking with CatBoost

In this tutorial, you will train a model with the CatBoost framework and use Dataiku's experiment tracking to log training runs (parameters and performance metrics).

## Prerequisites

* Access to a Project with a Dataset that contains the UCI Bank Marketing data

* A Code Environment containing the `mlflow` and `catboost` packages
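A quick way to confirm the second prerequisite is to check, from a notebook running on the code environment, that both packages can be found. The sketch below is only an illustration: the helper name `missing_packages` is hypothetical and not part of Dataiku, MLflow, or CatBoost.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Top-level packages this tutorial relies on
required = ["mlflow", "catboost"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

If a package is reported as missing, add it to the code environment before running the tutorial code.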

The following code snippet provides a reusable example to train a simple gradient boosting model, with these main steps:

**(1)**: Select the features and target variables.

**(2)**: Define the hyperparameters for training. Set the number of boosting rounds to 100 and, to detect overfitting during cross-validation, set `early_stopping_rounds` to 5. To keep only the boosting rounds up to the best-scoring iteration, set `use_best_model` to `True`.

**(3)**: Perform the experiment run and log the hyperparameters, the performance metrics (here, the AUC), and the trained model.
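As a side illustration of step (2), separate from the tutorial snippet itself, the idea behind an early-stopping setting can be sketched in plain Python: keep track of the best score seen so far and stop once a given number of rounds pass without improvement. This is a simplified, generic sketch and not CatBoost's actual overfitting detector. The function name `stop_with_patience` and the AUC values are made up for the example.

```python
def stop_with_patience(scores, patience):
    """Scan a higher-is-better metric and stop after `patience` rounds
    without improvement over the best score seen so far.

    Returns (best_index, last_index_evaluated), both 0-based.
    """
    if not scores:
        raise ValueError("scores must contain at least one value")
    best_index = 0
    for i in range(1, len(scores)):
        if scores[i] > scores[best_index]:
            best_index = i              # new best score: reset the wait
        elif i - best_index >= patience:
            return best_index, i        # waited `patience` rounds: stop here
    return best_index, len(scores) - 1  # never triggered: all rounds used


# Made-up validation AUC per boosting round (illustrative values only)
auc_per_round = [0.70, 0.75, 0.80, 0.79, 0.78, 0.80, 0.79, 0.78]
best, last = stop_with_patience(auc_per_round, patience=3)
print(f"best round: {best}, stopped at round: {last}")
# prints "best round: 2, stopped at round: 5"
```

With a patience of 3, the scan stops at round 5 because no round after round 2 strictly improved on its score. Keeping only the rounds up to the best one is the role that `use_best_model` plays in the tutorial's parameters.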

```python
import dataiku
from catboost import CatBoostClassifier, Pool, cv

# !! - Replace these values by your own - !!
USER_PROJECT_KEY = ""
USER_XPTRACKING_FOLDER_ID = ""
USER_EXPERIMENT_NAME = ""
USER_TRAINING_DATASET = ""
USER_MLFLOW_CODE_ENV_NAME = ""

client = dataiku.api_client()
project = client.get_project(USER_PROJECT_KEY)

# (1) Load the training data, then select the features and the target
ds = dataiku.Dataset(USER_TRAINING_DATASET)
df = ds.get_dataframe()

cat_features = ["job", "marital", "education", "default",
                "housing", "loan", "month", "contact", "poutcome"]
target = "y"
X = df.drop(target, axis=1)
y = df[target]

# (2) Hyperparameters used for cross-validation and for the final fit
params = {
    'iterations': 100,
    'learning_rate': 0.1,
    'depth': 10,
    'cat_features': cat_features,
    'loss_function': 'Logloss',
    'eval_metric': 'AUC',
    'early_stopping_rounds': 5,
    'use_best_model': True,
    'random_seed': 42,
}

# (3) Run the experiment and log parameters, metrics, and the model
mf = project.get_managed_folder(USER_XPTRACKING_FOLDER_ID)
mlflow_extension = project.get_mlflow_extension()

with project.setup_mlflow(mf) as mlflow:
    mlflow.set_experiment(experiment_name=USER_EXPERIMENT_NAME)

    with mlflow.start_run() as run:
        run_id = run.info.run_id

        # 5-fold cross-validation on the full dataset
        cv_dataset = Pool(data=X, label=y, cat_features=cat_features)
        scores = cv(cv_dataset,
                    params,
                    fold_count=5,
                    seed=42,
                    plot=False)

        # Log the mean and standard deviation of the test AUC per iteration
        for x in range(len(scores.index)):
            mlflow.log_metric(key='mean_AUC', value=scores['test-AUC-mean'][x], step=x)
            mlflow.log_metric(key='sd_AUC', value=scores['test-AUC-std'][x], step=x)

        mlflow.log_params(params=params)
        if params['early_stopping_rounds']:
            mlflow.log_metric(key='best_iteration', value=len(scores.index))
        if params['use_best_model']:
            # Refit with only the number of rounds kept by cross-validation
            params['iterations'] = len(scores.index)
            params['use_best_model'] = False

        # Train the final model on all the data and log it
        model = CatBoostClassifier(**params)
        cb_model = model.fit(X, y)
        mlflow.catboost.log_model(cb_model, artifact_path="model")

        # Attach the metadata Dataiku needs to evaluate or deploy the model
        mlflow_extension.set_run_inference_info(run_id=run_id,
                                                prediction_type="BINARY_CLASSIFICATION",
                                                classes=['no', 'yes'],
                                                code_env_name=USER_MLFLOW_CODE_ENV_NAME,
                                                target=target)
```
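The end of step (3) adjusts the parameters before the final fit: `iterations` is set to the number of rounds kept by cross-validation, and `use_best_model` is switched off. The likely reason is that the final fit uses all the data with no validation set, which `use_best_model` needs in order to pick a best iteration. The snippet above edits `params` in place after logging it. A possible variant, sketched below, builds a separate dictionary so the cross-validation settings stay untouched. The helper name `final_fit_params` and the value 37 are hypothetical.

```python
def final_fit_params(cv_params, n_rounds_kept):
    """Build the parameters for the final fit without mutating `cv_params`."""
    final = dict(cv_params)            # shallow copy: the original dict is left as-is
    final["iterations"] = n_rounds_kept
    final["use_best_model"] = False    # the final fit has no validation set
    return final


# Reduced, illustrative parameter set (not the full tutorial dictionary)
cv_params = {"iterations": 100, "early_stopping_rounds": 5, "use_best_model": True}

# 37 is a made-up number of rounds standing in for len(scores.index)
refit_params = final_fit_params(cv_params, n_rounds_kept=37)
print(refit_params)
print(cv_params["iterations"])  # still 100
```

Whether to copy or mutate is a matter of taste here. Copying simply makes it harder to log the adjusted values by mistake if the logging call is later moved.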

After these steps you should have your Experiment Run’s data available both in the Dataiku UI and programmatically via the `DSSMLflowExtension` object of the Python API client.
