Usage
h2o.glm(x, y, data, key = "", offset = NULL, family, link,
tweedie.p = ifelse(family == "tweedie", 1.5, NA_real_),
prior = NULL, nfolds = 0, alpha = 0.5, lambda = 1e-5,
lambda_search = FALSE, nlambda = -1, lambda.min.ratio = -1,
max_predictors = -1, return_all_lambda = FALSE,
strong_rules = TRUE, standardize = TRUE, intercept = TRUE,
non_negative = FALSE, use_all_factor_levels = FALSE,
variable_importances = FALSE, epsilon = 1e-4, iter.max = 100,
higher_accuracy = FALSE, beta_constraints = NULL,
disable_line_search = FALSE)
Arguments
x
A character vector containing the column names of the predictors in
the model.
y
A character string representing the response variable in the model.
data
An H2OParsedData object containing the variables in the model.
key
An optional unique hex key assigned to the resulting model.
If none is given, a key will automatically be generated.
offset
An optional character string representing the offset term in
the model.
family
A character string specifying the error distribution of the
model; one of "gaussian", "binomial", "poisson", "gamma", or "tweedie".
link
A character string specifying the link function. The default is
the canonical link for the family. The supported links for each of
the family specifications are:
"gaussian": "identity",
tweedie.p
A numeric specifying the power for the variance function
when family = "tweedie".
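As a rough illustration of what the power controls, the sketch below evaluates the standard Tweedie variance function V(mu) = mu^p in base R. The helper name tweedie_variance is hypothetical and is not part of the h2o package; the default of 1.5 mirrors the tweedie.p default shown in Usage.

```r
# Hypothetical helper (not an h2o function): Tweedie variance function V(mu) = mu^p.
tweedie_variance <- function(mu, p = 1.5) {
  stopifnot(all(mu > 0))
  mu^p
}

mu <- c(0.5, 1, 2, 4)
tweedie_variance(mu)          # p = 1.5, the default power in the Usage section
tweedie_variance(mu, p = 1)   # reduces to mu, the Poisson variance function
tweedie_variance(mu, p = 2)   # reduces to mu^2, the gamma variance function
```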
prior
An optional numeric specifying the prior probability of class 1
in the response when family = "binomial". The default prior is the
observational frequency of class 1.
nfolds
A non-negative integer specifying the number of folds for
cross-validation; nfolds = 0 indicates no cross-validation.
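To make the idea of folds concrete, here is a minimal base R sketch of how rows might be split into nfolds groups, with each group held out once. How h2o.glm actually assigns rows to folds is not described here, so the random balanced assignment and the helper name make_folds are assumptions for illustration only.

```r
# Hypothetical helper: assign each of n rows to one of nfolds balanced folds.
make_folds <- function(n, nfolds) {
  stopifnot(nfolds >= 2, nfolds <= n)
  sample(rep(seq_len(nfolds), length.out = n))
}

set.seed(42)
fold_id <- make_folds(n = 10, nfolds = 5)

# Each fold is held out once while the remaining rows are used for training.
for (k in 1:5) {
  holdout <- which(fold_id == k)
  train   <- which(fold_id != k)
  cat("fold", k, ": train on", length(train),
      "rows, validate on", length(holdout), "rows\n")
}
```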
alpha
A numeric in [0, 1] specifying the elastic-net mixing parameter.
The elastic-net penalty is defined to be
$$P(\alpha,\beta) = \frac{1-\alpha}{2}\|\beta\|_2^2 + \alpha\|\beta\|_1 = \sum_j \left[\frac{1-\alpha}{2}\beta_j^2 + \alpha|\beta_j|\right],$$
making alpha = 1 the lasso penalty and alpha = 0 the ridge penalty.
lambda
A non-negative shrinkage parameter for the elastic-net, which
multiplies $P(\alpha,\beta)$ in the objective. When lambda = 0,
no elastic-net penalty is applied and an ordinary generalized linear
model is fit.
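The penalty defined under alpha, and its scaling by lambda, can be checked numerically with a few lines of base R. This is only a sketch of the formula as written above; the function name elastic_net_penalty and the coefficient vector are made up for the example and nothing here calls h2o.

```r
# Hypothetical helper: P(alpha, beta) = sum_j [ (1 - alpha)/2 * beta_j^2 + alpha * |beta_j| ]
elastic_net_penalty <- function(beta, alpha = 0.5) {
  stopifnot(alpha >= 0, alpha <= 1)
  sum((1 - alpha) / 2 * beta^2 + alpha * abs(beta))
}

beta <- c(1.5, -2, 0, 0.5)               # illustrative coefficients

elastic_net_penalty(beta, alpha = 1)     # L1 norm of beta
elastic_net_penalty(beta, alpha = 0)     # half the squared L2 norm of beta

# Contribution to the objective with the default alpha and lambda from Usage.
lambda <- 1e-5
lambda * elastic_net_penalty(beta, alpha = 0.5)
```

With alpha = 1 only the absolute-value term survives, and with alpha = 0 only the squared term survives, which is what the two extreme calls show.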
lambda_search
A logical value indicating whether to conduct a search
over the space of lambda values, from the lambda argument
to lambda.min.ratio times the smallest lambda that produces zeros
for all the coefficient estimates.
nlambda
The number of lambda values to use when
lambda_search = TRUE.
lambda.min.ratio
A non-negative number that specifies the minimum
value for lambda as a fraction of the smallest lambda that yields the zero
vector for the coefficient estimates.
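The relationship between nlambda, lambda.min.ratio, and the lambda that zeroes out every coefficient can be sketched as a simple grid. The log-spaced grid running from that largest lambda down to the lower bound is an assumption borrowed from common elastic-net implementations, not a statement of how h2o.glm builds its search path; lambda_path and lambda_max are hypothetical names.

```r
# Hypothetical helper: a decreasing, log-spaced grid of lambda values.
# lambda_max stands for the smallest lambda that yields all-zero coefficients.
lambda_path <- function(lambda_max, min_ratio, nlambda) {
  stopifnot(lambda_max > 0, min_ratio > 0, min_ratio < 1, nlambda >= 2)
  lambda_min <- min_ratio * lambda_max    # lower bound implied by lambda.min.ratio
  exp(seq(log(lambda_max), log(lambda_min), length.out = nlambda))
}

path <- lambda_path(lambda_max = 2, min_ratio = 1e-4, nlambda = 5)
print(path)
```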
max_predictors
When lambda_search = TRUE, a non-negative
integer specifying an early stopping rule for the maximum number of
predictors in the model.
return_all_lambda
A logical value indicating whether to return every
model built during the lambda search. If return_all_lambda = FALSE,
then only the model corresponding to the optimal lambda will be returned.
strong_rules
A logical value indicating whether to use strong rules to
remove predictors with gradients near zero at the starting solution
prior to model training.
standardize
A logical value indicating whether the numeric predictors
should be standardized to have a mean of 0 and a variance of 1 prior to
training the models.
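For readers who want to see what standardization does to a predictor matrix, the base R function scale() gives a small local analogue. The data below are invented, and scale() uses the sample standard deviation (n - 1 denominator); whether h2o.glm uses the same denominator is not stated here, so treat this as a sketch of the idea only.

```r
# Made-up numeric predictors on very different scales.
X <- cbind(age = c(50, 61, 47, 72, 58),
           psa = c(1.2, 8.5, 3.3, 20.1, 6.4))

# Center each column to mean 0 and rescale it to variance 1.
X_std <- scale(X, center = TRUE, scale = TRUE)

colMeans(X_std)        # approximately 0 for every column
apply(X_std, 2, var)   # 1 for every column
```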
intercept
A logical value indicating whether to include the intercept
term in the models. This has a practical effect only when all
predictors are numeric.
non_negative
A logical value indicating whether the coefficient
estimates will be constrained to be non-negative.
use_all_factor_levels
A logical value indicating whether dummy
variables should be used for all factor levels of the categorical predictors.
When TRUE, this results in an over-parameterized model.
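The over-parameterization mentioned above can be seen with base R's model.matrix(). The sketch compares the usual reference-level coding with a coding that keeps a dummy column for every level; the factor and its levels are invented, and this only mimics the encoding idea rather than reproducing what h2o.glm builds internally.

```r
d <- data.frame(race = factor(c("a", "b", "c", "a", "b", "c")))

# Default treatment coding: intercept plus one dummy per non-reference level.
X_ref <- model.matrix(~ race, data = d)

# Keep a dummy column for every level, in addition to the intercept.
X_all <- model.matrix(~ race, data = d,
                      contrasts.arg = list(race = contrasts(d$race, contrasts = FALSE)))

c(columns = ncol(X_ref), rank = qr(X_ref)$rank)   # full column rank
c(columns = ncol(X_all), rank = qr(X_all)$rank)   # one more column than the rank
```

The second matrix has more columns than its rank because the level dummies sum to the intercept column.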
variable_importances
A logical value indicating whether the variable
importances should be computed.
epsilon
A non-negative number specifying the magnitude of the maximum
difference between the coefficient estimates from successive iterations.
Defines the convergence criterion for h2o.glm.
iter.max
A non-negative integer specifying the maximum number of
iterations.
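The way epsilon and iter.max interact can be sketched with a small, unpenalized iteratively reweighted least squares loop for logistic regression in base R. This is not H2O's solver; it only demonstrates a stopping rule of the kind described above, where iteration ends once the largest change in any coefficient falls below epsilon or iter.max is reached. The function irls_logistic and the simulated data are assumptions made for the example.

```r
# Hypothetical helper: unpenalized IRLS for logistic regression.
irls_logistic <- function(X, y, epsilon = 1e-4, iter.max = 100) {
  beta <- rep(0, ncol(X))
  for (iter in seq_len(iter.max)) {
    eta <- drop(X %*% beta)              # linear predictor
    mu  <- 1 / (1 + exp(-eta))           # fitted probabilities
    w   <- mu * (1 - mu)                 # working weights
    z   <- eta + (y - mu) / w            # working response
    beta_new <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
    delta <- max(abs(beta_new - beta))   # largest coefficient change
    beta  <- beta_new
    if (delta < epsilon) {
      return(list(beta = beta, iterations = iter, converged = TRUE))
    }
  }
  list(beta = beta, iterations = iter.max, converged = FALSE)
}

# Simulated data for illustration.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))
X <- cbind(intercept = 1, x = x)

fit <- irls_logistic(X, y)
str(fit)

# Stopping after a single iteration reports non-convergence.
irls_logistic(X, y, iter.max = 1)$converged
```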
higher_accuracy
A logical value indicating whether to use line search
to produce more accurate estimates.
beta_constraints
A data.frame or H2OParsedData object with the columns ["names", "lower_bounds", "upper_bounds", "beta_given"],
where each row corresponds to a predictor in the GLM. "names" contains the predictor names, "lower_bounds"/"upper_bounds"
are the lower and
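A local data.frame with the four column names listed for beta_constraints might be assembled as follows. The predictor names and bound values are invented for illustration, and because the description of "beta_given" is cut off above, the zeros in that column are placeholders rather than recommended values.

```r
# Hypothetical predictor names; replace with the columns passed as x.
predictors <- c("AGE", "RACE", "PSA")

beta_constraints <- data.frame(
  names        = predictors,
  lower_bounds = c(-1, -5, 0),    # illustrative lower bounds, one per predictor
  upper_bounds = c(1, 5, 10),     # illustrative upper bounds, one per predictor
  beta_given   = c(0, 0, 0),      # placeholder values only
  stringsAsFactors = FALSE
)

# One row per predictor, and every lower bound at or below its upper bound.
stopifnot(nrow(beta_constraints) == length(predictors),
          all(beta_constraints$lower_bounds <= beta_constraints$upper_bounds))
print(beta_constraints)
```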
disable_line_search
A logical value indicating whether line search should be disabled.