Spark ML -- Generalized Linear Regression

Perform generalized linear regression on a Spark DataFrame.

ml_generalized_linear_regression(x, response, features, intercept = TRUE,
  family = gaussian(link = "identity"), weights.column = NULL,
  iter.max = 100L, ml.options = ml_options(), ...)

Arguments

x	An object coercible to a Spark DataFrame (typically, a `tbl_spark`).
response	The name of the response vector (as a length-one character vector), or a formula giving a symbolic description of the model to be fitted. When `response` is a formula, it is used in preference to other parameters to set the `response`, `features`, and `intercept` parameters (if available). Currently, only simple linear combinations of existing parameters are supported; e.g. `response ~ feature1 + feature2 + ...`. The intercept term can be omitted by using `- 1` in the model fit.
features	The names of the features (terms) to use for the model fit.
intercept	Boolean; should the model be fit with an intercept term?
family	The family / link function to use; analogous to the `family` argument normally passed to R's own `glm`.
weights.column	The name of the column to use as weights for the model fit.
iter.max	The maximum number of iterations to use.
ml.options	Optional arguments, used to affect the model generated. See `ml_options` for more details.
...	Optional arguments. The `data` argument can be used to specify the data to be used when `x` is a formula; this allows calls of the form `ml_linear_regression(y ~ x, data = tbl)`, and is especially useful in conjunction with `do`.
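
As a minimal sketch of the calling conventions described above: the example assumes a local Spark installation and the signature shown on this page (other sparklyr releases may name the arguments differently). The built-in `mtcars` data and the table name `mtcars_glm` are chosen purely for illustration.

```r
library(sparklyr)

# Assumption: Spark is installed locally (e.g. via spark_install()).
sc <- spark_connect(master = "local")

# Copy an illustrative data set into Spark; the table name is arbitrary.
mtcars_tbl <- copy_to(sc, mtcars, "mtcars_glm", overwrite = TRUE)

# Form 1: name the response and the features explicitly.
fit_names <- ml_generalized_linear_regression(
  mtcars_tbl,
  response = "mpg",
  features = c("wt", "cyl")
)

# Form 2: pass a formula as `response`; it supplies both the response
# and the features.
fit_formula <- ml_generalized_linear_regression(mtcars_tbl, mpg ~ wt + cyl)

# Form 3: omit the intercept term by adding `- 1` to the formula.
fit_no_intercept <- ml_generalized_linear_regression(
  mtcars_tbl,
  mpg ~ wt + cyl - 1
)

spark_disconnect(sc)
```

Forms 1 and 2 should describe the same model; the formula interface is simply more compact when the feature list is short.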

Details

In contrast to `ml_linear_regression()` and `ml_logistic_regression()`, this routine does not allow you to tweak the loss function (e.g. for elastic net regression); however, the model fits it returns are generally richer in the information they provide for assessing the quality of fit.
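
The following sketch shows one way to fit a non-Gaussian family and inspect the result. It assumes a local Spark installation and the signature documented on this page; the `mtcars` data, the table name `mtcars_glm`, and the choice of `am` as a binary response are illustrative only, and the contents of the printed summary depend on the sparklyr version in use.

```r
library(sparklyr)

# Assumption: Spark is installed locally (e.g. via spark_install()).
sc <- spark_connect(master = "local")
mtcars_tbl <- copy_to(sc, mtcars, "mtcars_glm", overwrite = TRUE)

# `am` is a 0/1 column in mtcars, so a binomial family with a logit
# link is used here, mirroring glm(am ~ wt + hp, family = binomial).
fit <- ml_generalized_linear_regression(
  mtcars_tbl,
  response = "am",
  features = c("wt", "hp"),
  family = binomial(link = "logit"),
  iter.max = 50L
)

# Print the fit summary to review the quality-of-fit information.
summary(fit)

spark_disconnect(sc)
```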
