Training

Model

The pretrained model from which fine-tuning starts.
The number of backbone layers from the pretrained model to fine-tune.
Note: the last fully-connected layer is trained from scratch.
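The two model options above can be sketched in PyTorch. This is a minimal illustration, not the actual model: the tiny `nn.Sequential` backbone, the layer count, and the head sizes are all placeholder assumptions; only the freezing pattern reflects the options described.

```python
import torch.nn as nn

# Hypothetical 2-layer backbone standing in for the pretrained model.
backbone = nn.Sequential(
    nn.Linear(8, 8), nn.ReLU(),
    nn.Linear(8, 8), nn.ReLU(),
)

N_LAYERS_TO_FINE_TUNE = 1  # assumed value of the "layers to fine-tune" option

# Freeze every backbone layer, then unfreeze the last N for fine-tuning.
layers = [m for m in backbone if isinstance(m, nn.Linear)]
for layer in layers:
    for p in layer.parameters():
        p.requires_grad = False
for layer in layers[-N_LAYERS_TO_FINE_TUNE:]:
    for p in layer.parameters():
        p.requires_grad = True

# The final fully-connected head is freshly initialized, so it is
# always trained from scratch (requires_grad defaults to True).
head = nn.Linear(8, 3)
model = nn.Sequential(backbone, head)

# Only the unfrozen backbone layer(s) and the new head are trainable.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
```

With one unfrozen backbone layer, the trainable parameters are that layer plus the new head; everything else stays fixed at its pretrained values.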

Optimization

The optimizer used to update the model parameters.
The method used to adjust the learning rate as epochs progress (learning-rate schedule).
The initial learning rate for gradient descent. Smaller values give more stable convergence, but training takes more time.
The weight decay coefficient used for the L2 penalty.
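Together, these four options map onto a standard PyTorch training setup. A minimal sketch, assuming SGD as the optimizer and a step schedule that decays the learning rate by 10x every 10 epochs; the concrete values (`1e-3`, `1e-4`, the step size, the stand-in model) are illustrative assumptions, not defaults of the tool.

```python
import torch
import torch.nn as nn

LEARNING_RATE = 1e-3  # assumed initial learning rate
WEIGHT_DECAY = 1e-4   # assumed L2 penalty coefficient

model = nn.Linear(4, 2)  # stand-in for the fine-tuned model

# Optimizer with weight decay (applied as an L2 penalty on the weights).
optimizer = torch.optim.SGD(
    model.parameters(), lr=LEARNING_RATE, weight_decay=WEIGHT_DECAY
)

# Schedule: multiply the learning rate by 0.1 every 10 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(20):
    # ... one training epoch would run here ...
    scheduler.step()

current_lr = scheduler.get_last_lr()[0]  # 1e-3 decayed twice -> ~1e-5
```

After 20 epochs the schedule has fired twice, so the learning rate has dropped from 1e-3 to roughly 1e-5.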

Fine-tuning

The number of samples to include in each mini-batch.
The number of epochs for learning. Higher values lead to better convergence, but take more time.
The ratio used to split the data into train and validation sets; validation metrics select the best epoch and drive early stopping.
Stop the training when no improvement in the loss is observed for X epochs.
The minimum change in the score that counts as a significant improvement.
The number of epochs to wait before reducing the learning rate when no significant improvement is found (learning-rate scheduler patience).
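The early-stopping options above (patience on the loss, plus a minimum significant change) can be sketched in plain Python. The function name and the `PATIENCE` / `MIN_DELTA` values are illustrative assumptions, not names used by the tool.

```python
PATIENCE = 3      # assumed: stop after this many epochs without improvement
MIN_DELTA = 0.01  # assumed: a loss must improve by at least this much to count

def early_stop_epoch(val_losses, patience=PATIENCE, min_delta=MIN_DELTA):
    """Return the epoch index at which training stops, or None if it never does."""
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if best - loss > min_delta:  # significant improvement on the loss
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:  # patience exhausted: stop training
                return epoch
    return None

# Three consecutive epochs improve by less than MIN_DELTA, so training
# stops at epoch 4 even though the raw loss is still (barely) decreasing.
stopped_at = early_stop_epoch([1.0, 0.5, 0.499, 0.498, 0.497])
```

Tiny improvements below `MIN_DELTA` do not reset the patience counter, which prevents long tails of training where the loss plateaus.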