Sampling & splitting
Calibration
Evaluation {{isCalibrationEnabled() ? "& Calibration" : ""}}
Evaluation fold 1
1
1
Evaluation fold 2
2
2
Evaluation fold {{ mlTaskDesign.splitParams.nFolds - 1 }}
{{ mlTaskDesign.splitParams.nFolds - 1 }}
{{ mlTaskDesign.splitParams.nFolds - 1 }}
Evaluation fold {{ mlTaskDesign.splitParams.nFolds }}
{{ mlTaskDesign.splitParams.nFolds }}
{{ mlTaskDesign.splitParams.nFolds }}
Train set

The metrics used to rank models obtained by different algorithms are computed on the test set. The final model is trained on the train set.

The metrics used to rank models obtained by different algorithms are computed on each of the test folds. The final model is trained on the sampled dataset.

The final model is trained on the train set. Both the metrics used to rank models obtained by different algorithms, and the calibration function are computed on the test set. (settings)

The train set is randomly split in two: one part is used to train the final model, and the other part is used to train the calibration function. (settings)
The metrics used to rank models obtained by different algorithms are computed on the test set.

The sampled dataset is randomly split in two: one part is used to train the final model, and the other part is used to train the calibration function. (settings)
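The random split described above can be sketched as follows. This is a minimal illustration, assuming scikit-learn and synthetic data (the variable names are illustrative, not from the product):

```python
# Sketch: randomly split a dataset in two, one part for the final model,
# the other for the calibration function (scikit-learn assumed).
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.randn(100, 3)          # sampled dataset: 100 rows, 3 features
y = rng.randint(0, 2, 100)     # binary target

# One part trains the model, the other trains the calibration function.
X_model, X_calib, y_model, y_calib = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y)

print(len(X_model), len(X_calib))  # 50 50
```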


Usage tips

Probability calibration helps adjust the predicted probabilities to the actual class frequencies.
It should only be used if the problem involves actual probabilities of events, not just the ordering of these probabilities (ranking).
For instance, an uncalibrated model predicting a probability for the positive class may underestimate or overestimate the actual frequency with which the positive class occurs, which can lead to suboptimal decisions.
Calibrated models can be especially useful when the predicted probabilities are used to compute expectations of another quantity.
Note that isotonic regression is more prone to overfitting, and to altering evaluation metrics, than Platt scaling.
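The two calibration methods mentioned above can be sketched with scikit-learn's `CalibratedClassifierCV` (an assumption: the source does not name a library; `method="sigmoid"` is Platt scaling, `method="isotonic"` is isotonic regression):

```python
# Sketch: probability calibration with scikit-learn (assumed library).
# "sigmoid" = Platt scaling; "isotonic" is more prone to overfitting
# on small calibration sets.
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)

calibrated = CalibratedClassifierCV(LogisticRegression(), method="sigmoid", cv=3)
calibrated.fit(X, y)
proba = calibrated.predict_proba(X)

# Calibrated probabilities still sum to 1 per row.
print(np.allclose(proba.sum(axis=1), 1.0))  # True
```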


Hyperparameters
With the current settings, no hyperparameter search will be performed
{{ getCrossValidationLabel() }} & {{ mlTaskDesign.modeling.gridSearchParams.splitRatio }} split ratio
Hyperparameters

The metrics used to rank hyperparameter points are computed by simple split validation.
The train/validation split strategy holds out part of the train set as a validation set, on which candidate models are evaluated.
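The holdout strategy above can be sketched as follows; a minimal example assuming scikit-learn, with `C` standing in for any hyperparameter point:

```python
# Sketch: hold out part of the train set as a validation set and rank
# hyperparameter points by their validation score (scikit-learn assumed).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

scores = {}
for C in (0.01, 0.1, 1.0, 10.0):                # hyperparameter points to rank
    model = LogisticRegression(C=C).fit(X_tr, y_tr)
    scores[C] = model.score(X_val, y_val)        # validation accuracy

best_C = max(scores, key=scores.get)             # best-ranked point
print(len(scores))  # 4
```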

{{ getCrossValidationLabel() }}

Fold {{ $index + 1 }}
Fold 1
1
Fold 2
2
Fold {{ mlTaskDesign.modeling.gridSearchParams.nFolds - 1 }}
{{ mlTaskDesign.modeling.gridSearchParams.nFolds - 1 }}
Fold {{ mlTaskDesign.modeling.gridSearchParams.nFolds }}
{{ mlTaskDesign.modeling.gridSearchParams.nFolds }}
The metrics used to rank hyperparameter points are computed by cross-validation.
In K-fold cross-validation the dataset is partitioned into k equally sized subsets. Then, k-1 subsets are used as folded train sets while the remaining subset is retained to validate the model.
This process is then repeated k times, once for each fold defined by the subset used as validation set.
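The K-fold procedure described above can be sketched as follows, assuming scikit-learn's `KFold` (the library is an assumption; the logic matches the description: k-1 subsets train, the remaining one validates, repeated k times):

```python
# Sketch: K-fold cross-validation (scikit-learn assumed).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=300, random_state=0)
k = 5
fold_scores = []
for train_idx, val_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
    # k-1 subsets train the model; the remaining subset validates it.
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    fold_scores.append(model.score(X[val_idx], y[val_idx]))

print(len(fold_scores))  # one score per fold -> 5
```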

{{ getCrossValidationLabel() }}

Fold {{ $index + 1 }}
Fold 1
1
Fold 2
2
Fold {{ mlTaskDesign.modeling.gridSearchParams.nFolds - 1 }}
{{ mlTaskDesign.modeling.gridSearchParams.nFolds - 1 }}
Fold {{ mlTaskDesign.modeling.gridSearchParams.nFolds }}
{{ mlTaskDesign.modeling.gridSearchParams.nFolds }}
The metrics used to rank hyperparameter points are computed by cross-validation.
In time-based K-fold, the dataset is partitioned into k equally sized subsets sorted along the time variable. One of the subsets is chosen as the validation subset, and the subsets before (or after) it are used as folded train sets. This process is repeated k-1 times, once for each subset used as validation set; the first subset is skipped, since no subsets precede it.
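The time-based variant can be sketched with scikit-learn's `TimeSeriesSplit` (an assumed library; rows are taken to be already sorted by the time variable, and each validation fold comes strictly after its train folds):

```python
# Sketch: time-based K-fold (scikit-learn's TimeSeriesSplit assumed).
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(-1, 1)   # 20 rows, already sorted in time order
k = 5
splits = list(TimeSeriesSplit(n_splits=k - 1).split(X))

for train_idx, val_idx in splits:
    # Train indices always precede validation indices: no leakage from the future.
    assert train_idx.max() < val_idx.min()

print(len(splits))  # k-1 = 4 train/validation splits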

{{ getCrossValidationLabel() }}

The dataset is split according to the chosen custom cross-validation strategy.