Isolation Forest (Anomaly Detection)
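Isolation forest is an anomaly detection algorithm. It isolates observations by building an ensemble (a forest) of random trees. Each tree recursively splits the samples into smaller and smaller partitions at random. Anomalies are few and different from the rest of the data, so they tend to be isolated after only a few splits and have much shorter paths from the root of each tree. The path length from the root, averaged over all trees, therefore provides a good measure of abnormality.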
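The mechanism described above can be sketched in plain Python. This is a minimal illustration based on the original isolation forest formulation (Liu, Ting and Zhou, 2008), not the implementation DSS uses. Each tree is grown on a random subsample. Every split picks a random feature and a random value between that feature's minimum and maximum. The anomaly score is 2^(-E[h(x)] / c(n)), where E[h(x)] is the mean path length over all trees and c(n) is the average path length expected for a subsample of n points. Scores close to 1 indicate anomalies. The function names (fit, anomaly_score, build_tree, path_length) are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup among
    n points; used to normalise path lengths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))
    values = [p[feature] for p in points]
    lo, hi = min(values), max(values)
    if lo == hi:  # simplification: stop if the chosen feature is constant
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    return {
        "feature": feature,
        "split": split,
        "left": build_tree([p for p in points if p[feature] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[feature] >= split], depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Depth at which x reaches a leaf, plus c(size) for unsplit leaves."""
    if "split" not in node:
        return depth + c(node["size"])
    child = node["left"] if x[node["feature"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def fit(data, n_trees=100, sample_size=256, seed=0):
    """Build n_trees isolation trees, each on a subsample drawn without replacement."""
    if len(data) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))  # height limit used in the original paper
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    return trees, psi


def anomaly_score(x, trees, psi):
    """Score in (0, 1]; values close to 1 indicate anomalies."""
    mean_path = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / c(psi))


# Usage: a synthetic 2-D Gaussian cloud, one central point and one far outlier.
gen = random.Random(42)
data = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(500)]
trees, psi = fit(data, n_trees=100, sample_size=256, seed=0)
inlier_score = anomaly_score((0.0, 0.0), trees, psi)
outlier_score = anomaly_score((8.0, 8.0), trees, psi)
print(inlier_score, outlier_score)
```

The far-away point reaches a leaf after fewer splits than the central point, so its mean path is shorter and its score is higher.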
Number of trees in the forest.
Expected proportion of anomalies in the data.
Let the threshold be determined by the algorithm.
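The two threshold options can be illustrated as follows. This sketch assumes a common convention that has not been confirmed for DSS. With an expected proportion of anomalies, the threshold is set so that roughly that fraction of training scores lies above it. When the algorithm chooses, scikit-learn's contamination="auto" flags scores above 0.5, the reference point suggested in the original paper. The helper names and the example scores are hypothetical.

```python
import math


def contamination_threshold(scores, contamination):
    """Score above which roughly `contamination` of the points lie."""
    if not 0.0 < contamination <= 0.5:
        raise ValueError("contamination must be in (0, 0.5]")
    ranked = sorted(scores)
    # nearest-rank (1 - contamination) quantile of the scores
    k = math.ceil((1.0 - contamination) * len(ranked)) - 1
    return ranked[k]


def flag_anomalies(scores, contamination=None):
    """True for points considered anomalies (higher score = more anomalous)."""
    if contamination is None:
        threshold = 0.5  # assumed 'auto' behaviour, mirroring scikit-learn
    else:
        threshold = contamination_threshold(scores, contamination)
    return [s > threshold for s in scores]


# Usage with illustrative scores (not real model output):
scores = [0.40, 0.42, 0.45, 0.41, 0.43, 0.44, 0.46, 0.47, 0.71, 0.65]
print(flag_anomalies(scores, contamination=0.2))
print(flag_anomalies(scores))
```

A fixed contamination always flags about the same fraction of the data. A fixed score threshold lets that fraction vary with how unusual the data actually is.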
Proportion of the data to use to build each tree.
Use bootstrapping to sample from the data.
Proportion of features to test for each split.
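The three sampling settings above can be sketched as follows. The row proportion sets the subsample size for each tree. Bootstrapping switches between drawing rows with replacement, so some rows repeat and others are left out, and drawing them without replacement. The feature proportion limits how many features are considered at a split, read literally from the option text. This is an illustrative sketch under those assumptions, not DSS's internal code. draw_tree_sample and candidate_features are hypothetical names.

```python
import random


def draw_tree_sample(n_rows, row_fraction, bootstrap, rng):
    """Row indices used to build one tree."""
    size = max(1, int(row_fraction * n_rows))
    if bootstrap:
        # with replacement: some rows may appear several times, others not at all
        return [rng.randrange(n_rows) for _ in range(size)]
    # without replacement: each row appears at most once
    return rng.sample(range(n_rows), size)


def candidate_features(n_features, feature_fraction, rng):
    """Distinct feature indices considered at one split."""
    k = max(1, int(feature_fraction * n_features))
    return rng.sample(range(n_features), k)


# Usage: 25% of 1000 rows per tree, half of 10 features per split.
rng = random.Random(0)
rows = draw_tree_sample(1000, 0.25, bootstrap=False, rng=rng)
boot = draw_tree_sample(1000, 0.25, bootstrap=True, rng=rng)
feats = candidate_features(10, 0.5, rng)
print(len(rows), len(set(rows)), len(boot), len(set(boot)), feats)
```

Smaller proportions make each tree cheaper to build and more different from the others, at the cost of each tree seeing less of the data.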
Maximum number of anomalies to display in the model report. Too high a number may cause memory and UI problems.
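Capping the report at the most anomalous rows can be done without sorting every score. The sketch below uses the standard heapq.nlargest to keep only the top max_display (row index, score) pairs. It assumes higher scores mean more anomalous, as in the original isolation forest score. top_anomalies is a hypothetical helper, not part of DSS.

```python
import heapq


def top_anomalies(scores, max_display):
    """(row_index, score) pairs for the max_display highest scores, highest first."""
    return heapq.nlargest(max_display, enumerate(scores), key=lambda pair: pair[1])


print(top_anomalies([0.3, 0.9, 0.5, 0.8], 2))
```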
Used to generate reproducible results. 0 or no value means that no fixed seed is used, so results will not be fully reproducible.
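The seed rule can be illustrated with Python's random module. The sketch assumes the convention stated in the option text: 0 or no value gives an unseeded generator, and any other value gives a reproducible stream. make_rng is a hypothetical helper. DSS's own seeding mechanism is not shown here.

```python
import random


def make_rng(seed=None):
    """0 or None -> unseeded generator; any other value -> reproducible stream."""
    if not seed:  # covers both None and 0, as the option text describes
        return random.Random()  # seeded from operating-system randomness
    return random.Random(seed)


a, b = make_rng(1337), make_rng(1337)
print([a.random() for _ in range(3)] == [b.random() for _ in range(3)])
```

Two generators with the same non-zero seed produce the same sequence, so the random subsamples and splits, and hence the trained trees, can be repeated.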
Number of cores used for parallel training. Using more cores speeds up training at the cost of higher memory consumption, especially for large training datasets.
Allow DSS to use sparse matrices to train the model.
This may help reduce RAM and CPU usage.