h2o (version 3.2.0.3)

h2o.startGLMJob: Start an H2O Generalized Linear Model Job

Description

Creates a background H2O GLM job.

Usage

h2o.startGLMJob(x, y, training_frame, model_id, validation_frame,
  max_iterations = 50, beta_epsilon = 0, solver = c("IRLSM", "L_BFGS"),
  standardize = TRUE, family = c("gaussian", "binomial", "poisson", "gamma",
  "tweedie"), link = c("family_default", "identity", "logit", "log",
  "inverse", "tweedie"), tweedie_variance_power = NaN,
  tweedie_link_power = NaN, alpha = 0.5, prior = 0, lambda = 1e-05,
  lambda_search = FALSE, nlambdas = -1, lambda_min_ratio = 1,
  nfolds = 0, beta_constraints = NULL, ...)

Arguments

x
A vector containing the names or indices of the predictor variables to use in building the GLM model.
y
A character string or index that represents the response variable in the model.
training_frame
An H2OFrame object containing the variables in the model.
model_id
(Optional) The unique id assigned to the resulting model. If none is given, an id will automatically be generated.
validation_frame
An H2OFrame object containing the same variables as training_frame, used to validate the model.
max_iterations
A non-negative integer specifying the maximum number of iterations.
beta_epsilon
A non-negative number specifying the magnitude of the maximum difference between the coefficient estimates from successive iterations. Defines the convergence criterion for h2o.glm.
solver
A character string specifying the solver to use: "IRLSM" (supports more features) or "L_BFGS" (scales better for datasets with many columns).
standardize
A logical value indicating whether the numeric predictors should be standardized to have a mean of 0 and a variance of 1 prior to training the models.
family
A character string specifying the distribution of the model: gaussian, binomial, poisson, gamma, tweedie.
link
A character string specifying the link function. The default is the canonical link for the family. The supported links for each of the family specifications are: "gaussian": "identity", "log"
tweedie_variance_power
A numeric specifying the power for the variance function when family = "tweedie".
tweedie_link_power
A numeric specifying the power for the link function when family = "tweedie".
alpha
A numeric in [0, 1] specifying the elastic-net mixing parameter. The elastic-net penalty is defined to be: $$P(\alpha,\beta) = \frac{1-\alpha}{2}\|\beta\|_2^2 + \alpha\|\beta\|_1 = \sum_j \left[\frac{1-\alpha}{2}\beta_j^2 + \alpha|\beta_j|\right]$$, making alpha = 1 the lasso penalty and alpha = 0 the ridge penalty.
prior
(Optional) A numeric specifying the prior probability of class 1 in the response when family = "binomial". The default prior is the observational frequency of class 1.
lambda
A non-negative shrinkage parameter for the elastic-net, which multiplies $P(\alpha,\beta)$ in the objective function. When lambda = 0, no elastic-net penalty is applied and ordinary generalized linear models are fit.
lambda_search
A logical value indicating whether to search over a range of lambda values, starting from lambda max. When the search is enabled, the supplied lambda is interpreted as lambda min.
nlambdas
The number of lambda values to use when lambda_search = TRUE.
lambda_min_ratio
Smallest value for lambda as a fraction of lambda.max. By default, if the number of observations is greater than the number of variables, then lambda_min_ratio = 0.0001; if the number of observations is less than the number of variables the
nfolds
(Optional) Number of folds for cross-validation. If nfolds >= 2, then validation_frame must remain empty.
beta_constraints
A data.frame or H2OParsedData object with the columns ["names", "lower_bounds", "upper_bounds", "beta_given"], where each row corresponds to a predictor in the GLM. "names" contains the predictor names, "lower_bounds" and "upper_bounds" are the lower and
...
(Currently unimplemented.)
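
To make the roles of alpha and lambda concrete, the following base R sketch evaluates the elastic-net penalty from the alpha description for a small coefficient vector. The function name elastic_net_penalty and the values in beta are illustrative assumptions; they are not part of the h2o package, and the sketch does not call H2O at all.

```r
# Elastic-net penalty P(alpha, beta), following the formula given in the
# description of the alpha argument.
# elastic_net_penalty is a hypothetical helper, not an h2o function.
elastic_net_penalty <- function(alpha, beta) {
  stopifnot(alpha >= 0, alpha <= 1)
  (1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta))
}

# Illustrative coefficients (assumed values, not from a fitted model).
beta <- c(0.5, -2, 0)

ridge <- elastic_net_penalty(0, beta)    # only the squared L2 term: 2.125
lasso <- elastic_net_penalty(1, beta)    # only the L1 term: 2.5
mixed <- elastic_net_penalty(0.5, beta)  # equal mix of both terms: 2.3125

# lambda multiplies the penalty; lambda = 0 removes it entirely.
lambda <- 1e-05
penalty_term <- lambda * mixed
no_penalty <- 0 * mixed
```

With the default alpha = 0.5 both terms contribute, and with the default lambda = 1e-05 the penalty term for these coefficients is very small.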

Value

  • Returns an H2OModelFuture class object.
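
Because the function returns a future rather than a fitted model, a typical workflow starts the job, does other work, and then collects the model. The sketch below wraps that pattern in a helper. The helper name fit_glm_in_background is hypothetical, and the sketch assumes that h2o.getFutureModel() is available in the installed h2o version to wait on the job and return the finished model. Defining the helper does not require a running cluster; calling it does.

```r
# Hypothetical helper (not part of h2o): start a GLM job in the background,
# then block until the job finishes and return the fitted model.
# Assumes h2o.init() has already been called and `frame` is an H2OFrame.
fit_glm_in_background <- function(frame, predictors, response,
                                  family = "binomial") {
  # Returns immediately with an H2OModelFuture object.
  future <- h2o::h2o.startGLMJob(
    x = predictors,
    y = response,
    training_frame = frame,
    family = family,
    alpha = 0.5,
    lambda_search = FALSE
  )

  # ... other work can run here while the cluster trains the model ...

  # Assumption: h2o.getFutureModel() waits for the job and returns the model.
  h2o::h2o.getFutureModel(future)
}

# Example call, assuming a running cluster and the prostate data set that
# ships with the h2o package:
#   library(h2o)
#   h2o.init()
#   prostate <- h2o.uploadFile(
#     path = system.file("extdata", "prostate.csv", package = "h2o"))
#   model <- fit_glm_in_background(
#     prostate, c("AGE", "RACE", "PSA", "DCAPS"), "CAPSULE")
```

Keeping the start and collect steps in one helper is only for illustration; in practice the two calls would usually be separated so that several jobs can train at the same time.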