Perform generalized linear regression on a Spark DataFrame.
ml_generalized_linear_regression(x, response, features, intercept = TRUE,
family = gaussian(link = "identity"), iter.max = 100L,
ml.options = ml_options(), ...)
x: An object coercible to a Spark DataFrame (typically, a
tbl_spark).
response: The name of the response vector (as a length-one character
vector), or a formula giving a symbolic description of the model to be
fitted. When response is a formula, it is used in preference to other
parameters to set the response, features, and intercept
parameters (if available). Currently, only simple linear combinations of
existing parameters are supported; e.g. response ~ feature1 + feature2 + ....
The intercept term can be omitted by using - 1 in the model fit.
features: The names of the features (terms) to use for the model fit.
intercept: Boolean; should the model be fit with an intercept term?
family: The family / link function to use; analogous to those normally
passed in to calls to R's own glm.
iter.max: The maximum number of iterations to use.
ml.options: Optional arguments, used to affect the model generated. See
ml_options for more details.
...: Optional arguments. The data argument can be used to
specify the data to be used when x is a formula; this allows calls
of the form ml_linear_regression(y ~ x, data = tbl), and is
especially useful in conjunction with do.
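The argument descriptions above can be illustrated with a minimal sketch. It assumes that sparklyr is installed and that a local Spark installation is available. The mtcars data set and the chosen columns are illustrative only, and whether a given family / link combination is accepted depends on the Spark version in use.

```r
library(sparklyr)

# Assumption: a local Spark installation is available to sparklyr.
sc <- spark_connect(master = "local")

# Copy a small built-in data set into Spark; mtcars_tbl is a tbl_spark.
mtcars_tbl <- copy_to(sc, mtcars, "mtcars", overwrite = TRUE)

# Response and features given by name, with the default gaussian family.
fit_named <- ml_generalized_linear_regression(
  mtcars_tbl,
  response = "mpg",
  features = c("wt", "cyl")
)

# Response given as a formula; "- 1" omits the intercept term.
fit_formula <- ml_generalized_linear_regression(
  mtcars_tbl,
  response = mpg ~ wt + cyl - 1
)

# A non-default family, specified in the same way as for glm().
fit_binomial <- ml_generalized_linear_regression(
  mtcars_tbl,
  response = "am",
  features = c("wt", "hp"),
  family = binomial(link = "logit")
)

spark_disconnect(sc)
```

When response is a formula, the features and intercept arguments are taken from the formula, so they are left out of the second call.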
In contrast to ml_linear_regression() and
ml_logistic_regression(), this routine does not allow you to
tweak the loss function (e.g. for elastic net regression); however, the model
fits it returns are generally richer with regard to the information
provided for assessing the quality of fit.
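As a rough illustration of the kind of quality-of-fit information a GLM fit usually carries, the sketch below uses R's own glm, which the family argument is modelled on. It is a base R analogue only: it does not use Spark, and whether the Spark-backed fit exposes the same quantities under the same names is an assumption that should be checked against the returned object.

```r
# Base R analogue only: glm() on the built-in mtcars data frame.
fit <- glm(mpg ~ wt + cyl, data = mtcars,
           family = gaussian(link = "identity"))

# Quantities commonly used to assess a GLM fit.
fit$deviance               # residual deviance of the fitted model
fit$null.deviance          # deviance of the intercept-only model
AIC(fit)                   # Akaike information criterion
summary(fit)$coefficients  # estimates, standard errors, t values, p values
```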
Other Spark ML routines: ml_als_factorization, ml_decision_tree,
ml_gradient_boosted_trees, ml_kmeans, ml_lda, ml_linear_regression,
ml_logistic_regression, ml_multilayer_perceptron, ml_naive_bayes,
ml_one_vs_rest, ml_pca, ml_random_forest, ml_survival_regression