ml_generalized_linear_regression: Spark ML -- Generalized Linear Regression

Description

Perform generalized linear regression on a Spark DataFrame.

Usage

ml_generalized_linear_regression(x, response, features, intercept = TRUE, family = gaussian(link = "identity"), max.iter = 100L, ...)

Arguments

x

An object coercible to a Spark DataFrame (typically, a tbl_spark).

response

The name of the response vector (as a length-one character vector), or a formula giving a symbolic description of the model to be fitted. When response is a formula, it is used in preference to other parameters to set the response, features, and intercept parameters (if available). Currently, only simple linear combinations of existing parameters are supported; e.g. response ~ feature1 + feature2 + .... The intercept term can be omitted by including - 1 in the formula.
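The formula description above can be made concrete with base R alone. The following is a minimal sketch of the information a formula carries (response, features, and whether an intercept is present); it uses only stats::terms() and all.vars(), and it is an illustration, not the parsing code this package actually uses. The column names mpg, wt and cyl are placeholders taken from the built-in mtcars data set.

```r
# Illustration only: inspect what a model formula encodes, using base R.
f <- mpg ~ wt + cyl - 1          # "- 1" asks for a fit without an intercept

tt <- terms(f)

response_name <- all.vars(f)[1]              # left-hand side: "mpg"
feature_names <- attr(tt, "term.labels")     # right-hand side terms: "wt" "cyl"
has_intercept <- attr(tt, "intercept") == 1  # FALSE because of "- 1"

print(response_name)
print(feature_names)
print(has_intercept)
```

Going by the argument description, passing this formula as response should correspond to response = "mpg", features = c("wt", "cyl") and intercept = FALSE.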

features

The names of the features (terms) to use for the model fit.

intercept

Boolean; should the model be fit with an intercept term?

family

The family / link function to use; analogous to those normally passed in to calls to R's own glm.
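Because the family argument mirrors what R's own glm accepts, it may help to see what such a family object contains. The sketch below builds a few standard family objects from the stats package and inspects their family name, link name, and link functions. Which family and link combinations Spark itself accepts depends on the Spark version, so treat the non-default families here as assumptions to verify against your installation.

```r
# Family objects as used by stats::glm(); the first is the documented default.
fam_gaussian <- gaussian(link = "identity")
fam_binomial <- binomial(link = "logit")
fam_poisson  <- poisson(link = "log")

# Each family object records its name and its link function.
print(fam_gaussian$family)  # "gaussian"
print(fam_gaussian$link)    # "identity"

# linkfun maps the mean to the linear predictor; linkinv maps it back.
print(fam_poisson$linkfun(1))   # log(1) = 0
print(fam_binomial$linkinv(0))  # inverse logit of 0 = 0.5
```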

max.iter

The maximum number of iterations to use.

...

Optional arguments; currently unused.

Details

In contrast to ml_linear_regression() and ml_logistic_regression(), this routine does not allow you to tweak the loss function (e.g. for elastic net regression); however, the model fits it returns generally provide richer information for assessing the quality of fit.
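A minimal sketch of a call, using the argument names exactly as they appear under Usage. The helper name fit_mtcars_glm, the table name "mtcars_glm_demo", and the choice of mtcars columns are hypothetical and exist only for illustration. The sketch assumes the sparklyr package and a working local Spark installation; other sparklyr versions may use different argument names, so check the signature of your installed version before relying on it.

```r
# Hypothetical helper: fit a Gaussian GLM of mpg on wt and cyl.
# `sc` is an open Spark connection, e.g. from sparklyr::spark_connect().
fit_mtcars_glm <- function(sc) {
  # Copy the built-in mtcars data frame into Spark.
  cars_tbl <- sparklyr::sdf_copy_to(sc, mtcars, "mtcars_glm_demo",
                                    overwrite = TRUE)

  # Argument names follow the Usage section of this page.
  sparklyr::ml_generalized_linear_regression(
    cars_tbl,
    response  = "mpg",
    features  = c("wt", "cyl"),
    intercept = TRUE,
    family    = gaussian(link = "identity"),
    max.iter  = 50L
  )
}

# Usage (requires Spark, so it is not run here):
# sc  <- sparklyr::spark_connect(master = "local")
# fit <- fit_mtcars_glm(sc)
# print(fit)
# sparklyr::spark_disconnect(sc)
```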
