Spark ML -- K-Means Clustering
Perform k-means clustering on a Spark DataFrame.
ml_kmeans(x, centers, iter.max = 100, features = tbl_vars(x),
  compute.cost = TRUE, tolerance = 1e-04, ml.options = ml_options(), ...)
Arguments
| x | An object coercible to a Spark DataFrame (typically, a tbl_spark). |
| centers | The number of cluster centers to compute. |
| iter.max | The maximum number of iterations to use. |
| features | The name of features (terms) to use for the model fit. |
| compute.cost | Whether to compute the cost of the fitted k-means model (via Spark's computeCost). |
| tolerance | The convergence tolerance for iterative algorithms. |
| ml.options | Optional arguments, used to affect the model generated. See ml_options for details. |
| ... | Optional arguments. |
Value
An ml_model object of class kmeans, with overloaded print, fitted, and predict functions.
References
Bahmani et al., Scalable K-Means++, VLDB 2012
See also
For information on how Spark k-means clustering is implemented, please see http://spark.apache.org/docs/latest/mllib-clustering.html#k-means.
Other Spark ML routines: ml_als_factorization,
ml_decision_tree,
ml_generalized_linear_regression,
ml_gradient_boosted_trees,
ml_lda, ml_linear_regression,
ml_logistic_regression,
ml_multilayer_perceptron,
ml_naive_bayes,
ml_one_vs_rest, ml_pca,
ml_random_forest,
ml_survival_regression
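Examples

A minimal usage sketch. It assumes a local Spark installation accessible through sparklyr; the iris data, the 3-cluster choice, and the two feature columns are illustrative only (note that copy_to renames iris columns such as Petal.Length to Petal_Length, since Spark column names cannot contain dots):

```r
library(sparklyr)
library(dplyr)

# Connect to a local Spark instance (assumes Spark is installed locally)
sc <- spark_connect(master = "local")

# Copy the iris dataset into Spark as a DataFrame
iris_tbl <- copy_to(sc, iris, "iris", overwrite = TRUE)

# Fit k-means with 3 cluster centers on the petal measurements
model <- ml_kmeans(
  iris_tbl,
  centers  = 3,
  features = c("Petal_Length", "Petal_Width")
)

# Print the fitted model, including the cluster centers
print(model)

# Score the training data with the overloaded predict function
predicted <- predict(model, iris_tbl)

spark_disconnect(sc)
```

Since the returned ml_model overloads predict, scoring new data follows the usual R modeling idiom rather than requiring a Spark-specific call.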