Random Forest

A Random Forest is an ensemble of many decision trees. Each tree predicts a class for a given record, and each tree's prediction counts as a "vote" toward the forest's final answer.
The forest chooses the class having the most votes.
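The voting step can be sketched in a few lines of plain Python (the function name `forest_predict` is illustrative, not any library's API):

```python
from collections import Counter

def forest_predict(tree_predictions):
    """Combine per-tree class predictions by majority vote.

    `tree_predictions` is a list of class labels, one per tree.
    """
    votes = Counter(tree_predictions)
    # The forest's answer is the class with the most votes.
    return votes.most_common(1)[0][0]

# Three trees vote "spam", two vote "ham" -> the forest predicts "spam".
print(forest_predict(["spam", "ham", "spam", "spam", "ham"]))
```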

A decision tree is a simple model built by recursively splitting the training data. Each internal node of the tree holds a condition on one of the input features; a record is routed down the tree according to those conditions until it reaches a leaf, which assigns the class.
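A minimal sketch of how a record is routed through such a tree (the dict-based node representation is purely illustrative, not any particular library's structure):

```python
def predict(node, record):
    """Walk a record down a decision tree to a leaf label."""
    # A leaf is represented here as a bare class label.
    if not isinstance(node, dict):
        return node
    # Internal node: a condition on exactly one input feature.
    if record[node["feature"]] <= node["threshold"]:
        return predict(node["left"], record)
    return predict(node["right"], record)

# A hand-built one-split tree: test feature 0 against the threshold 2.5.
tree = {"feature": 0, "threshold": 2.5, "left": "A", "right": "B"}
print(predict(tree, [1.0]))  # 1.0 <= 2.5, so the record lands in class "A"
```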

When "growing" (ie, training) the forest, two sources of randomness make the trees differ from one another:

- Each tree is trained on a bootstrap sample of the training data (sampling with replacement).
- At each tree node, only a random subset of the input features is considered when choosing the split.
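In practice, growing combines bootstrap sampling of the data with a random feature subset per split. A minimal sketch of those two steps, with the actual tree training left abstract (`grow_forest_sketch` is an illustrative name, not a library API):

```python
import random

def grow_forest_sketch(data, n_trees, n_features, subsampling_rate=1.0):
    """Show the per-tree randomness used when growing a forest.

    Returns, for each tree, its bootstrap sample and the random
    feature subset a node split would consider.
    """
    trees = []
    for _ in range(n_trees):
        # 1. Bagging: sample the training data with replacement.
        n = int(len(data) * subsampling_rate)
        bootstrap = [random.choice(data) for _ in range(n)]
        # 2. Feature subsetting: consider only a random subset of
        #    features at a split (sqrt(n_features) is a common default).
        k = max(1, int(n_features ** 0.5))
        features = random.sample(range(n_features), k)
        trees.append((bootstrap, features))
    return trees
```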

Random Forests generally provide good results, at the expense of the model's "explainability": a single decision tree can be read as a set of rules, but a vote over many trees cannot.

The main training parameters (names as used in Spark ML, which these descriptions match):

- featureSubsetStrategy: The number of features to consider for splits at each tree node.
- impurity: Criterion used for the information gain calculation.
- maxBins: Used for discretizing continuous features and for choosing how to split on features at each node. More bins give higher granularity. Must be ≥ the number of categories in any categorical feature.
- checkpointInterval: Specifies how often to checkpoint the cached node IDs. E.g. 10 means that the cache will get checkpointed every 10 iterations. Only used if cacheNodeIds is true and the checkpoint directory is set in the SparkContext.
- cacheNodeIds: If false, the algorithm will pass trees to executors to match instances with nodes. If true, the algorithm will cache node IDs for each instance. Caching can speed up training of deeper trees.
- minInfoGain: Minimum information gain for a split to be considered at a tree node.
- minInstancesPerNode: Minimum number of instances each child must have after a split. If a split would leave the left or right child with fewer than minInstancesPerNode instances, the split is discarded as invalid.
- subsamplingRate: Fraction of the training data used for learning each decision tree, in the range (0, 1].
- maxMemoryInMB: Maximum memory allocated to histogram aggregation.
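These parameters can be set when constructing Spark ML's RandomForestClassifier. A configuration sketch (assumes a running SparkSession and a DataFrame `train` with "features" and "label" columns; values shown are illustrative, not recommendations):

```python
from pyspark.ml.classification import RandomForestClassifier

rf = RandomForestClassifier(
    numTrees=100,
    featureSubsetStrategy="sqrt",  # features considered per split
    impurity="gini",               # criterion for information gain
    maxBins=32,                    # granularity for continuous features
    minInfoGain=0.0,               # minimum gain for a valid split
    minInstancesPerNode=1,         # minimum instances per child
    subsamplingRate=1.0,           # fraction of the data per tree
    cacheNodeIds=True,             # cache node IDs for each instance
    checkpointInterval=10,         # checkpoint cached IDs every 10 iterations
    maxMemoryInMB=256,             # memory for histogram aggregation
)
model = rf.fit(train)
```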