sparklyr (version 0.2.32)

ml_generalized_linear_regression: Spark ML -- Generalized Linear Regression

Description

Perform generalized linear regression on a Spark DataFrame.

Usage

ml_generalized_linear_regression(x, response, features, intercept = TRUE, family = gaussian(link = "identity"), max.iter = 100L, ...)

Arguments

x
An object coercible to a Spark DataFrame (typically, a tbl_spark).
response
The name of the response vector (as a length-one character vector), or a formula giving a symbolic description of the model to be fitted. When response is a formula, it is used in preference to the other parameters to set the response, features, and intercept parameters (if available). Currently, only simple linear combinations of existing columns are supported; e.g. response ~ feature1 + feature2 + .... The intercept term can be omitted by adding - 1 to the formula.
features
The names of the features (terms) to use for the model fit.
intercept
Boolean; should the model be fit with an intercept term?
family
The family and link function to use, specified as in calls to R's own glm() (the default is gaussian(link = "identity")).
max.iter
The maximum number of iterations to use.
...
Optional arguments; currently unused.
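The formula form of response and the family argument both follow ordinary R conventions. The base-R sketch below (no Spark connection required) shows how a formula such as mpg ~ wt + cyl encodes the response, the features, and the intercept flag, and what a family object carries. It assumes only the stats package and the built-in mtcars data set, whose column names are purely illustrative; whether sparklyr parses formulas internally in exactly this way is an assumption.

```r
# Standard R formula semantics, using only base R (stats); no Spark needed.
f_with <- mpg ~ wt + cyl         # intercept included by default
f_without <- mpg ~ wt + cyl - 1  # '- 1' drops the intercept

tt_with <- terms(f_with)
tt_without <- terms(f_without)

# Response: the first variable, taken from the left-hand side.
response_name <- all.vars(f_with)[1]
# Features: the term labels on the right-hand side.
feature_names <- attr(tt_with, "term.labels")
# Intercept flag: 1 when present, 0 when '- 1' is used.
has_intercept <- attr(tt_with, "intercept") == 1
no_intercept <- attr(tt_without, "intercept") == 0

# A family object bundles the error distribution name and the link function.
fam <- gaussian(link = "identity")
alt_fam <- binomial(link = "logit")

print(response_name)
print(feature_names)
print(c(has_intercept = has_intercept, no_intercept = no_intercept))
print(c(family = fam$family, link = fam$link))
print(c(family = alt_fam$family, link = alt_fam$link))
```

The same formula and family values are what one would write for a local glm() call, which is the analogy the family argument describes.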

Details

In contrast to ml_linear_regression() and ml_logistic_regression(), this routine does not allow you to tweak the loss function (e.g. for elastic net regression); however, the model fits it returns are generally richer with regard to the information provided for assessing the quality of fit.
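As a local point of reference only, R's own glm() reports the kind of quality-of-fit information a generalized linear model provides: residual and null deviance, AIC, and per-coefficient standard errors with test statistics. The sketch below computes these on the built-in mtcars data with the default gaussian family. It does not call sparklyr, and which of these statistics the Spark-side fit exposes is not confirmed here.

```r
# Local, in-memory GLM fit with R's own glm() (base R stats package only).
fit <- glm(mpg ~ wt + cyl, family = gaussian(link = "identity"), data = mtcars)

# Goodness-of-fit quantities commonly used to assess a GLM.
fit_deviance <- deviance(fit)       # residual deviance of the fitted model
null_deviance <- fit$null.deviance  # deviance of the intercept-only model
fit_aic <- AIC(fit)                 # Akaike information criterion
coef_table <- coef(summary(fit))    # estimates, std. errors, t values, p values

print(c(deviance = fit_deviance, null_deviance = null_deviance, aic = fit_aic))
print(coef_table)
```

Fitting a small sample locally like this can serve as a sanity check on formula and family choices before running the same specification against a Spark DataFrame.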

See Also

Other Spark ML routines: ml_als_factorization, ml_decision_tree, ml_gradient_boosted_trees, ml_kmeans, ml_lda, ml_linear_regression, ml_logistic_regression, ml_multilayer_perceptron, ml_naive_bayes, ml_one_vs_rest, ml_pca, ml_random_forest, ml_survival_regression