The metrics used to rank models obtained by different algorithms are computed on the test set. The final model is trained on the train set.
The metrics used to rank models obtained by different algorithms are computed on each of the test folds. The final model is trained on the sampled dataset.
The final model is trained on the train set. Both the metrics used to rank models obtained by different algorithms and the calibration function are computed on the test set.
The train set is randomly split in two: one split is dedicated to training the final model and the other split is dedicated to training the calibration function.
The metrics used to rank models obtained by different algorithms are computed on the test set.
The sampled dataset is randomly split in two: one part is dedicated to training the final model and the other part is dedicated to training the calibration function.
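The splits described above can be sketched with scikit-learn (an illustrative assumption; the tool's own splitting code is not shown here). The example first holds out a test set, then randomly splits the remaining train set between the final model and the calibration function; the split proportions are arbitrary choices for the sketch.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the sampled dataset.
X, y = make_classification(n_samples=1000, random_state=0)

# Hold out a test set; metrics ranking the algorithms are computed on it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Randomly split the train set in two: one part to train the final model,
# the other to train the calibration function (25% is an arbitrary choice).
X_model, X_calib, y_model, y_calib = train_test_split(
    X_train, y_train, test_size=0.25, random_state=0
)
print(len(X_model), len(X_calib), len(X_test))
```

This way the calibration function is fitted on data the final model has never seen, which keeps the calibrated probabilities honest.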
Probability calibration helps adjust the predicted probabilities to the actual class frequencies.
It should only be used if the problem involves actual probabilities of events, not just the ordering of these probabilities (ranking).
For instance, a non-calibrated model may predict a probability for the positive class that underestimates or overestimates the actual frequency with which the positive class occurs, which can lead to suboptimal decisions.
Calibrated models can be especially useful when the predicted probabilities are used to compute expectations of another quantity.
Note that isotonic regression is more prone to overfitting, and to degrading performance metrics, than Platt scaling.
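The two calibration methods can be compared with scikit-learn's `CalibratedClassifierCV` (a sketch under the assumption that scikit-learn is used; the base classifier and fold count are arbitrary). Platt scaling fits a parametric sigmoid to the model's scores, while isotonic regression fits a non-parametric monotone function, which is more flexible but can overfit small calibration sets.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)

# Platt scaling: parametric sigmoid mapping from scores to probabilities.
platt = CalibratedClassifierCV(
    LogisticRegression(), method="sigmoid", cv=5
).fit(X, y)

# Isotonic regression: non-parametric monotone mapping, more flexible
# but more prone to overfitting on small calibration data.
iso = CalibratedClassifierCV(
    LogisticRegression(), method="isotonic", cv=5
).fit(X, y)

# Calibrated probabilities for one sample; each row sums to 1.
print(platt.predict_proba(X[:1]), iso.predict_proba(X[:1]))
```

A common rule of thumb is to prefer Platt scaling when little calibration data is available and isotonic regression only when there is enough data to support it.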
The metrics used to rank hyperparameter points are computed by simple split validation.
The train/validation split strategy simply holds out part of the train set as a validation set on which candidate models are evaluated.
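Simple split validation for ranking hyperparameter points can be sketched as follows (an illustration with scikit-learn; the candidate values of `C` and the 20% holdout fraction are assumptions, not the tool's defaults).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, random_state=0)

# Hold out part of the train set as a validation set.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# Rank each hyperparameter point by its score on the validation set.
scores = {}
for C in (0.01, 1.0, 100.0):
    model = LogisticRegression(C=C, max_iter=1000).fit(X_tr, y_tr)
    scores[C] = accuracy_score(y_val, model.predict(X_val))

best_C = max(scores, key=scores.get)
print(best_C, scores)
```

Each hyperparameter point is trained once and scored once, which is cheaper than cross-validation but gives a noisier estimate of performance.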
The dataset is split according to the chosen custom cross-validation strategy.
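A custom cross-validation strategy can be expressed as any scikit-learn-compatible splitter (an assumed interface for this sketch). `PredefinedSplit` is one such splitter: samples marked `-1` always stay in the train split, and the other labels define the validation folds. The fold assignment below is an arbitrary example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, PredefinedSplit

X, y = make_classification(n_samples=300, random_state=0)

# Custom strategy: every third sample forms the single validation fold
# (label 0); the rest (-1) are always in the train split.
test_fold = np.where(np.arange(len(X)) % 3 == 0, 0, -1)
custom_cv = PredefinedSplit(test_fold)

# Hyperparameter points are ranked using the custom split.
search = GridSearchCV(
    LogisticRegression(max_iter=1000), {"C": [0.1, 1.0]}, cv=custom_cv
).fit(X, y)
print(search.best_params_)
```

Any object with a `split(X, y)` method yielding train/validation index pairs can be passed as `cv`, so group-aware or time-aware strategies plug in the same way.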