Gradient Boosted Trees
Gradient Boosted Trees (GBTs) generalize boosting to arbitrary differentiable loss functions. GBTs are an accurate and effective off-the-shelf procedure that can be used for both regression and classification problems, and they are applied in a variety of areas, including Web search ranking and ecology.
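The core idea can be sketched in a few lines of plain Python: each boosting round fits a small tree (here a depth-1 "stump") to the negative gradient of the loss, which for squared error is simply the current residuals. The function names below (`fit_stump`, `fit_gbt`) are illustrative, not any library's API.

```python
# Minimal gradient-boosting sketch for regression with squared-error loss.
# Each round fits a stump to the negative gradient of the loss, which for
# squared error equals the residuals y - prediction.

def fit_stump(x, r):
    """Find the threshold split of 1-D feature x that best fits residuals r."""
    best = None
    for t in sorted(set(x)):
        left = [r[i] for i in range(len(x)) if x[i] <= t]
        right = [r[i] for i in range(len(x)) if x[i] > t]
        if not left or not right:
            continue
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((v - lmean) ** 2 for v in left)
               + sum((v - rmean) ** 2 for v in right))
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda xi: lmean if xi <= t else rmean

def fit_gbt(x, y, n_rounds=20, learning_rate=0.3):
    """Boost stumps: each new stump is fit to the current residuals."""
    base = sum(y) / len(y)          # initial constant model
    stumps = []
    pred = [base] * len(y)
    for _ in range(n_rounds):
        # Negative gradient of squared-error loss = residuals.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        s = fit_stump(x, residuals)
        stumps.append(s)
        pred = [pi + learning_rate * s(xi) for pi, xi in zip(pred, x)]
    return lambda xi: base + learning_rate * sum(s(xi) for s in stumps)

# Toy data: a noisy step function.
x = [0.0, 0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 1.0]
y = [1.0, 1.1, 0.9, 1.0, 1.2, 3.0, 2.9, 3.1, 3.0, 2.8]
model = fit_gbt(x, y)
mse = sum((yi - model(xi)) ** 2 for xi, yi in zip(x, y)) / len(y)
```

Swapping in a different differentiable loss only changes the `residuals` line (the negative gradient); the tree-fitting machinery stays the same, which is what makes the method so general.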
Key training parameters include:

impurity: Criterion used for information gain calculation.

maxBins: Number of bins used for discretizing continuous features and for choosing how to split on features at each node. More bins give higher granularity. Must be at least the number of categories in any categorical feature.

checkpointInterval: Specifies how often to checkpoint the cached node IDs; e.g. 10 means the cache is checkpointed every 10 iterations. Only used if cacheNodeIds is true and the checkpoint directory is set in the SparkContext.

cacheNodeIds: If false, the algorithm passes trees to executors to match instances with nodes; if true, it caches node IDs for each instance. Caching can speed up training of deeper trees.

minInfoGain: Minimum information gain for a split to be considered at a tree node.

minInstancesPerNode: Minimum number of instances each child must have after a split. If a split would leave the left or right child with fewer than minInstancesPerNode instances, the split is discarded as invalid.

subsamplingRate: Fraction of the training data used for learning each decision tree, in the range (0, 1].

maxMemoryInMB: Amount of memory allocated to histogram aggregation.
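The minInfoGain and minInstancesPerNode rules above can be illustrated with a small sketch. The helper below is hypothetical (it is not Spark's implementation); it shows, for a regression node using variance as the impurity measure, how a candidate split is rejected when a child is too small or the impurity reduction is too low.

```python
# Sketch of the split-validity rules described by minInfoGain and
# minInstancesPerNode. `split_is_valid` is an illustrative helper,
# not Spark's actual implementation.

def variance(vals):
    """Impurity of a regression node: variance of its labels."""
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

def split_is_valid(parent, left, right, min_info_gain, min_instances_per_node):
    # Reject the split if either child would be too small ...
    if len(left) < min_instances_per_node or len(right) < min_instances_per_node:
        return False
    # ... or if the impurity reduction (information gain) is too small.
    n = len(parent)
    gain = (variance(parent)
            - (len(left) / n) * variance(left)
            - (len(right) / n) * variance(right))
    return gain >= min_info_gain

# A split that cleanly separates low labels from high labels.
parent = [1.0, 1.2, 0.9, 3.0, 3.1, 2.9]
left, right = parent[:3], parent[3:]
```

With this data the split has a large gain, so it is accepted under a modest minInfoGain; raising minInstancesPerNode above 3, or minInfoGain above the achieved gain, discards the same split.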