During preprocessing of features, statistics must be computed for each feature. Show more…

For example, we need to compute the average of a numerical feature if we want to rescale it.

To prevent from using too much memory, you can decide to compute these statistics only on a subsample of the train set.

If you use a subsample, computation will be faster and will use less memory but the results will be less representative of the data.

Show less…

Proportion of the train set that goes to the sample used to compute statistics.
Using a fixed random seed allows for reproducible result