The KMeans algorithm clusters data by separating samples into several clusters, each characterized by its center ("centroid"). The algorithm tries to place each sample as close as possible to the centroid of its cluster, by minimizing a criterion called "inertia".
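To make the "inertia" criterion concrete, here is a minimal sketch. It assumes inertia is the usual within-cluster sum of squared distances from each sample to its nearest centroid; the function name `inertia` and the sample data are illustrative only and are not part of DSS.

```python
def inertia(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    total = 0.0
    for p in points:
        # Squared Euclidean distance to the closest centroid.
        total += min(
            sum((a - b) ** 2 for a, b in zip(p, c))
            for c in centroids
        )
    return total


# Hypothetical data: two tight groups of two points each.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = [(0.0, 1.0), (10.0, 1.0)]  # one centroid in the middle of each group
poor = [(5.0, 0.0), (5.0, 2.0)]   # centroids placed between the groups

print(inertia(points, good))
print(inertia(points, poor))
```

Centroids that sit inside the groups give a much lower inertia than centroids placed between them, which is what the algorithm is searching for.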
Number of times the algorithm will be run with different initial centroids. The run with the best result (lowest inertia) will be used for the returned model.
Multiple runs are strongly recommended when using sparse data, to avoid poor clustering results.
Seed used to generate reproducible results. With 0 or no value, no fixed seed is used and results will not be fully reproducible.
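The interaction between the number of runs and the seed can be sketched as follows. This is a simplified pure-Python illustration, not the DSS implementation: `lloyd` and `best_of` are hypothetical helpers, the initial centroids are simply sampled from the data, and each run uses a fixed number of iterations.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd(points, k, rng, n_iter=20):
    """One KMeans run from random initial centroids; returns (inertia, centroids)."""
    centroids = rng.sample(points, k)
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(xs) / len(members) for xs in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    inertia = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return inertia, centroids


def best_of(points, k, n_init, seed):
    """Run `lloyd` n_init times from one seeded generator and keep the lowest inertia."""
    rng = random.Random(seed)
    return min((lloyd(points, k, rng) for _ in range(n_init)), key=lambda r: r[0])


# Hypothetical data: three small groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (5.0, 8.0), (5.0, 9.0)]
single = best_of(points, k=3, n_init=1, seed=42)
several = best_of(points, k=3, n_init=5, seed=42)
```

With the same seed, repeated calls return identical results, and keeping the best of several runs can never give a higher inertia than the first run alone.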
Number of cores used for parallel training. Using more cores speeds up training at the expense of higher memory consumption, especially for large training datasets.
Allow DSS to use sparse matrices to train the model.
This may help reduce RAM and CPU usage.
Sparse matrices are enabled but only {{mlTaskDesign.modeling.kmeans_clustering.n_init}} centroid initialization(s) will be tested. It is strongly recommended to test multiple initializations (at least 5) when working with sparse matrices.