Spark ML -- Logistic Regression
Perform logistic regression on a Spark DataFrame.
ml_logistic_regression(x, response, features, intercept = TRUE, alpha = 0, lambda = 0, max.iter = 100L, ...)
- x: An object coercable to a Spark DataFrame (typically, a tbl_spark).
- response: The name of the response vector (as a length-one character vector), or a formula, giving a symbolic description of the model to be fitted. If response is a formula, it is used in preference to other parameters to set the response, features, and intercept parameters (if available). Currently, only simple linear combinations of existing parameters are supported; e.g. response ~ feature1 + feature2 + .... The intercept term can be omitted by using - 1 in the model fit.
- features: The names of features (terms) to use for the model fit.
- intercept: Boolean; should the model be fit with an intercept term?
- alpha, lambda: Parameters controlling loss function penalization (e.g. for lasso, elastic net, and ridge regression). See Details for more information.
- max.iter: The maximum number of iterations to use.
- ...: Optional arguments; currently unused.
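A minimal usage sketch (the connection setup, the use of the built-in mtcars dataset, and the choice of am as the binary response are illustrative assumptions, not part of this reference):

```r
library(sparklyr)

# Connect to a local Spark instance and copy a data set into Spark
sc <- spark_connect(master = "local")
mtcars_tbl <- copy_to(sc, mtcars)

# Fit a logistic regression of transmission type (am) on weight and mpg,
# using the formula interface for 'response'
model <- ml_logistic_regression(mtcars_tbl, am ~ wt + mpg)

spark_disconnect(sc)
```

The same fit could also be specified with `response = "am"` and `features = c("wt", "mpg")`; when a formula is supplied it takes precedence over those parameters, as noted above.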
Spark supports both L1 and L2 regularization in its regression models. See the preamble in the Spark Classification and Regression documentation for more details on how the loss function is parameterized.
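As a sketch of that parameterization (following the elastic-net convention described in the Spark MLlib documentation; here w denotes the coefficient vector), the penalty added to the loss is:

```latex
\lambda \left( \alpha \, \lVert w \rVert_1 \;+\; \frac{1 - \alpha}{2} \, \lVert w \rVert_2^2 \right)
```

Under this convention, lambda scales the overall regularization strength, while alpha mixes the two penalties: alpha = 0 gives pure L2 (ridge) regularization and alpha = 1 gives pure L1 (lasso), with intermediate values yielding an elastic net.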
Other Spark ML routines: