A decision tree is a simple model built by recursively partitioning the feature space. Each internal node of the tree tests a condition on one of the input features, which divides the feature space into two partitions.
The leaves of the tree are labeled with predictions. The deeper the tree, the more complex the decision rule and the finer-grained the predictions, but deep trees quickly overfit.
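The structure above can be sketched as a tiny hand-built tree. The features, thresholds, and labels here are purely illustrative, not from any real dataset:

```python
# A hand-built two-level decision tree for a toy 2-D problem.
# Each internal node tests one feature against a threshold; each leaf
# holds a prediction. (Thresholds and labels are made up for illustration.)

def predict(x):
    # Root node: a condition on feature 0 divides the space into two partitions.
    if x[0] <= 3.0:
        # Left child: a further condition on feature 1 refines the partition.
        if x[1] <= 1.5:
            return "A"  # leaf
        return "B"      # leaf
    return "B"          # leaf

print(predict([2.0, 1.0]))  # -> A
print(predict([5.0, 0.0]))  # -> B
```

Adding more levels of conditions makes the regions (and hence the decision rule) finer, which is exactly where the overfitting risk comes from.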
maxBins: used for discretizing continuous features and for choosing how to split on features at each node. More bins give higher granularity. It must be at least the number of categories in any categorical feature.
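A minimal sketch of what binning a continuous feature looks like, assuming simple equal-width bins (real implementations may instead use quantile-based bin edges):

```python
# Equal-width discretization of a continuous feature into max_bins bins.
# Each internal bin edge is a candidate split threshold for the tree learner;
# more bins means finer-grained candidate thresholds.

def make_bin_edges(values, max_bins):
    lo, hi = min(values), max(values)
    width = (hi - lo) / max_bins
    # max_bins - 1 internal edges separate max_bins bins.
    return [lo + i * width for i in range(1, max_bins)]

def bin_index(v, edges):
    for i, edge in enumerate(edges):
        if v <= edge:
            return i
    return len(edges)

values = [0.1, 0.4, 1.2, 2.5, 3.3, 4.0]
edges = make_bin_edges(values, max_bins=4)
print([bin_index(v, edges) for v in values])  # -> [0, 0, 1, 2, 3, 3]
```

With only 4 bins the learner can place a threshold at just 3 positions; raising the bin count increases that granularity at the cost of more candidate splits to evaluate.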
checkpointInterval: specifies how often to checkpoint the cached node IDs. For example, a value of 10 means the cache is checkpointed every 10 iterations. This is only used if cacheNodeIds is true and the checkpoint directory is set in the SparkContext.
cacheNodeIds: if false, the algorithm passes trees to executors to match instances with nodes. If true, the algorithm instead caches the node ID for each instance. Caching can speed up training of deeper trees.
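These settings fit together as in the following sketch, assuming Spark ML's DecisionTreeClassifier API with a running SparkSession `spark` and a training DataFrame `train` (both assumed, not defined here); it is a configuration fragment, not a complete program:

```python
from pyspark.ml.classification import DecisionTreeClassifier

# checkpointInterval is only honored when cacheNodeIds is true and a
# checkpoint directory has been set on the SparkContext.
spark.sparkContext.setCheckpointDir("/tmp/tree-checkpoints")

dt = DecisionTreeClassifier(
    maxBins=32,             # bins for discretizing continuous features
    cacheNodeIds=True,      # cache per-instance node IDs instead of passing trees
    checkpointInterval=10,  # checkpoint the cached node IDs every 10 iterations
    minInfoGain=0.0,
    minInstancesPerNode=1,
)
model = dt.fit(train)
```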
minInfoGain: minimum information gain for a split to be considered at a tree node.
minInstancesPerNode: minimum number of instances each child must have after a split. If a split would cause the left or right child to have fewer than minInstancesPerNode instances, the split is discarded as invalid.
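The two stopping rules above can be sketched together in a small validity check. This uses Gini impurity to measure gain; it illustrates the rules, not any particular library's implementation:

```python
# Decide whether a candidate split is valid under minInfoGain and
# minInstancesPerNode. `left` and `right` are the label lists of the
# two children the split would produce.

def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split_is_valid(left, right, min_info_gain, min_instances_per_node):
    # Rule 1: each child must keep at least min_instances_per_node instances.
    if len(left) < min_instances_per_node or len(right) < min_instances_per_node:
        return False
    parent = left + right
    n = len(parent)
    # Gain = parent impurity minus the weighted impurity of the children.
    gain = (gini(parent)
            - (len(left) / n) * gini(left)
            - (len(right) / n) * gini(right))
    # Rule 2: the gain must reach min_info_gain.
    return gain >= min_info_gain

# A split into two pure children is kept...
print(split_is_valid([0, 0, 0], [1, 1], 0.1, 2))  # -> True
# ...but the same split is discarded if a child is too small.
print(split_is_valid([0, 0, 0], [1], 0.1, 2))     # -> False
```

Both rules act as pre-pruning: they stop the tree from growing branches that are statistically unsupported, which counters the overfitting tendency of deep trees.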