Apply an R Function in Spark

Applies an R function to a Spark object (typically, a Spark DataFrame).

spark_apply(x, f, columns = colnames(x), memory = TRUE, group_by = NULL,
  packages = TRUE, ...)

Arguments

x	An object (usually a `spark_tbl`) coercible to a Spark DataFrame.
f	A function that transforms a data frame partition into a data frame. The function `f` has signature `f(df, group1, group2, ...)`, where `df` is a data frame with the data to be processed and `group1` to `groupN` contain the values of the `group_by` columns. When `group_by` is not specified, `f` takes only one argument.
columns	A vector of column names or a named vector of column types for the transformed object. Defaults to the names from the original object and adds indexed column names when not enough columns are specified.
memory	Boolean; should the table be cached into memory?
group_by	Column name used to group the data frame partitions.
packages	Boolean to distribute `.libPaths()` packages to each node, or a list of packages to distribute. For offline clusters where `available.packages()` is not available, manually download the packages database from https://cran.r-project.org/web/packages/packages.rds and set `Sys.setenv(sparklyr.apply.packagesdb = "<path-to-rds>")`. Otherwise, all packages are distributed by default.
...	Optional arguments; currently unused.
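
Examples

The following is a minimal sketch of the two shapes of `f` described above, not a definitive recipe. The helper names `scale_by_ten`, `mean_mpg`, and `run_on_spark`, the table name `mtcars_tbl`, and the use of the built-in `mtcars` data are assumptions made for illustration. Because `f` is an ordinary R function over a data frame, it can be checked locally before it is sent to a cluster; the Spark calls are wrapped in a function that is only defined here, since running them needs an open connection `sc`.

```r
# Ungrouped case: f receives one partition as a plain R data frame
# and must return a data frame.
scale_by_ten <- function(df) {
  df * 10
}

# Grouped case: when group_by is set, the group value is passed as an
# extra argument after df. The columns `cyl` and `mpg` come from mtcars
# and are assumptions of this example.
mean_mpg <- function(df, cyl) {
  data.frame(mean_mpg = mean(df$mpg))
}

# Local checks on ordinary data frames, no Spark required.
local_result  <- scale_by_ten(data.frame(id = 1:3))
grouped_local <- mean_mpg(mtcars[mtcars$cyl == 4, ], 4)

# Sketch of the Spark side. Defined but not called here, because it
# needs a live connection, e.g. sc <- sparklyr::spark_connect(master = "local").
run_on_spark <- function(sc) {
  ids    <- sparklyr::sdf_len(sc, 10)
  scaled <- sparklyr::spark_apply(ids, scale_by_ten)

  cars    <- sparklyr::sdf_copy_to(sc, mtcars, "mtcars_tbl", overwrite = TRUE)
  grouped <- sparklyr::spark_apply(cars, mean_mpg, group_by = "cyl")

  list(scaled = scaled, grouped = grouped)
}
```

Testing `f` on a small local data frame first tends to be cheaper than debugging it through worker logs, since errors raised inside `f` on a cluster surface only indirectly.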