Stochastic Gradient Descent

SGD is a family of algorithms that reuses concepts from Support Vector Machines and Linear Regression, fitting such linear models with stochastic gradient descent.

SGD uses an efficient iterative method to minimize the cost (or loss) function, making it particularly well suited to large datasets (or datasets with a large number of features).

Selecting 'squared' loss makes SGD behave like a linear regression (OLS; or Lasso/Ridge, depending on the chosen regularization).
Selecting 'Huber' loss makes SGD more robust to outliers.
Maximum number of iterations (epochs) over the training data.
Tolerance for the stopping criterion, i.e. the improvement in loss below which the algorithm stops training. Must be a positive number.
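A short sketch of how the two stopping parameters interact in scikit-learn's SGDRegressor (the synthetic data is an assumption for illustration):

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.05, size=100)

# max_iter bounds the number of passes (epochs) over the training data;
# training also stops early once the loss improves by less than tol.
model = SGDRegressor(max_iter=500, tol=1e-4, random_state=0)
model.fit(X, y)
print(model.n_iter_)  # epochs actually run, at most max_iter
```

A smaller tol trains longer (closer fit); a larger tol stops sooner.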
L1 regularization is used in Lasso regression and L2 regularization in Ridge regression.
ElasticNet regularization combines L1 and L2 regularization.
Used for ElasticNet regularization, this ratio controls the proportion of L1 in the mix (i.e. 1 corresponds to L1 only, 0 to L2 only). Defaults to 0.15 (15% L1, 85% L2).
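The regularization options above map onto SGDRegressor's penalty and l1_ratio parameters; a hedged sketch (toy data assumed):

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 4))
y = X @ np.array([1.0, 0.0, -0.5, 0.0]) + rng.normal(scale=0.1, size=150)

# penalty="elasticnet" mixes L1 and L2 penalties; l1_ratio=0.15 is the
# default mix (15% L1, 85% L2). l1_ratio=1.0 is pure L1 (Lasso-like),
# l1_ratio=0.0 is pure L2 (Ridge-like).
enet = SGDRegressor(penalty="elasticnet", l1_ratio=0.15, random_state=0)
enet.fit(X, y)
```

Raising l1_ratio pushes more coefficients toward exactly zero, which is useful when only a few features are expected to matter.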