Stochastic Gradient Descent

SGD is a family of algorithms that apply stochastic gradient descent optimization to linear models, reusing concepts from Support Vector Machines and Logistic Regression.

SGD minimizes the cost (or loss) function incrementally, updating the model one sample (or small batch) at a time, which makes it particularly suitable for large datasets (or datasets with a large number of features).

Selecting the 'logit' loss makes SGD behave like a Logistic Regression.
Selecting the 'modified huber' loss makes SGD behave much like a Support Vector Machine.
Maximum number of passes (epochs) over the training data.
Tolerance for the stopping criterion, i.e., the improvement in loss below which the algorithm stops training. Must be a positive number.
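The interaction between the iteration limit and the tolerance can be sketched as follows, again assuming a scikit-learn-style `SGDClassifier` (the parameter names `max_iter` and `tol` are that library's, used here for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Training stops once the loss improves by less than tol between
# epochs, or after max_iter epochs, whichever comes first.
clf = SGDClassifier(max_iter=1000, tol=1e-3, random_state=0).fit(X, y)

# n_iter_ reports how many epochs actually ran (<= max_iter)
print(clf.n_iter_)
```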
L1 and L2 regularizations are similar to the ones for Logistic Regression.
ElasticNet regularization combines both L1 and L2 regularization.
Used for ElasticNet regularization, this ratio controls the proportion of L1 in the mix (i.e., 1 corresponds to L1-only, 0 corresponds to L2-only). Defaults to 0.15 (15% L1, 85% L2).
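A minimal sketch of ElasticNet regularization with this default ratio, assuming a scikit-learn-style backend (`penalty="elasticnet"` and `l1_ratio` are that library's parameter names, shown here for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=300, n_features=30, random_state=0)

# l1_ratio=0.15 mixes 15% L1 with 85% L2 (the default described above);
# l1_ratio=1.0 would be pure L1, l1_ratio=0.0 pure L2.
clf = SGDClassifier(
    penalty="elasticnet", l1_ratio=0.15, random_state=0
).fit(X, y)

# The L1 component can drive some coefficients exactly to zero,
# giving a sparser model than pure L2.
print(clf.coef_)
```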
Number of cores used for parallel training. Using more cores leads to faster training, but at the expense of higher memory consumption, especially for large training datasets.