Spark ML -- K-Means Clustering
Perform k-means clustering on a Spark DataFrame.
ml_kmeans(x, centers, iter.max = 100, features = tbl_vars(x),
  compute.cost = TRUE, tolerance = 1e-04, ml.options = ml_options(), ...)
Arguments
| x | An object coercible to a Spark DataFrame (typically, a tbl_spark). |
| centers | The number of cluster centers to compute. |
| iter.max | The maximum number of iterations to use. |
| features | The name of features (terms) to use for the model fit. |
| compute.cost | Whether to compute the cost of the fitted k-means model (via Spark's computeCost). |
| tolerance | The convergence tolerance for iterative algorithms. |
| ml.options | Optional arguments, used to affect the model generated. See ml_options for details. |
| ... | Optional arguments. |
Value
An ml_model object of class kmeans, with overloaded print, fitted, and predict functions.
References
Bahmani et al., Scalable K-Means++, VLDB 2012
See also
For information on how Spark k-means clustering is implemented, please see http://spark.apache.org/docs/latest/mllib-clustering.html#k-means.
Other Spark ML routines: ml_als_factorization,
ml_decision_tree,
ml_generalized_linear_regression,
ml_gradient_boosted_trees,
ml_lda, ml_linear_regression,
ml_logistic_regression,
ml_multilayer_perceptron,
ml_naive_bayes,
ml_one_vs_rest, ml_pca,
ml_random_forest,
ml_survival_regression
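Examples

A minimal usage sketch. It assumes a local Spark installation accessible through sparklyr; the iris data, the 3-cluster choice, and the two feature columns are illustrative only (note that copy_to renames iris columns such as Petal.Length to Petal_Length, since Spark column names cannot contain dots):

```r
library(sparklyr)
library(dplyr)

# Connect to a local Spark instance (assumes Spark is installed locally)
sc <- spark_connect(master = "local")

# Copy the iris dataset into Spark as a DataFrame
iris_tbl <- copy_to(sc, iris, "iris", overwrite = TRUE)

# Fit k-means with 3 cluster centers on the petal measurements
model <- ml_kmeans(
  iris_tbl,
  centers  = 3,
  features = c("Petal_Length", "Petal_Width")
)

# Print the fitted model, including the cluster centers
print(model)

# Score the training data with the overloaded predict function
predicted <- predict(model, iris_tbl)

spark_disconnect(sc)
```

Since the returned ml_model overloads predict, scoring new data follows the usual R modeling idiom rather than requiring a Spark-specific call.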