```
h2o.glm(x, y, training_frame, model_id, validation_frame = NULL,
ignore_const_cols = TRUE, max_iterations = 50, beta_epsilon = 0,
solver = c("IRLSM", "L_BFGS"), standardize = TRUE,
family = c("gaussian", "binomial", "poisson", "gamma", "tweedie",
"multinomial"), link = c("family_default", "identity", "logit", "log",
"inverse", "tweedie"), tweedie_variance_power = NaN,
tweedie_link_power = NaN, alpha = 0.5, prior = NULL, lambda = 1e-05,
lambda_search = FALSE, nlambdas = -1, lambda_min_ratio = -1,
nfolds = 0, fold_column = NULL, fold_assignment = c("AUTO", "Random",
"Modulo"), keep_cross_validation_predictions = FALSE,
beta_constraints = NULL, offset_column = NULL, weights_column = NULL,
intercept = TRUE, max_active_predictors = -1, objective_epsilon = -1,
gradient_epsilon = -1, non_negative = FALSE, compute_p_values = FALSE,
remove_collinear_columns = FALSE, max_runtime_secs = 0,
missing_values_handling = c("MeanImputation", "Skip"))
```

x

A vector containing the names or indices of the predictor variables to use in building the GLM model.

y

A character string or index that represents the response variable in the model.

training_frame

An H2OFrame object containing the variables in the model.

model_id

(Optional) The unique id assigned to the resulting model. If none is given, an id will automatically be generated.

validation_frame

An H2OFrame object containing the validation dataset used to evaluate model performance. Defaults to NULL.

ignore_const_cols

A logical value indicating whether or not to ignore all the constant columns in the training frame.

max_iterations

A non-negative integer specifying the maximum number of iterations.

beta_epsilon

A non-negative number specifying the magnitude of the maximum difference between the coefficient estimates from successive iterations. Defines the convergence criterion for `h2o.glm`.

solver

A character string specifying the solver used: IRLSM (supports more features), L_BFGS (scales better for datasets with many columns).
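The `beta_epsilon` stopping rule described above can be sketched in plain Python; this is an illustrative sketch only, not H2O's actual implementation:

```python
# Sketch of the beta_epsilon convergence test: stop when the largest
# absolute change between successive coefficient estimates is small.
# (Illustrative only -- not H2O's internal code.)

def beta_converged(beta_old, beta_new, beta_epsilon):
    """Return True when max |beta_new[j] - beta_old[j]| <= beta_epsilon."""
    return max(abs(b1 - b0) for b0, b1 in zip(beta_old, beta_new)) <= beta_epsilon

# Coefficients barely moved between iterations -> converged.
print(beta_converged([1.00, -2.00], [1.0001, -2.0002], 1e-3))  # True
print(beta_converged([1.00, -2.00], [1.10, -2.00], 1e-3))      # False
```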

standardize

A logical value indicating whether the numeric predictors should be standardized to have a mean of 0 and a variance of 1 prior to
training the models.
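What standardization does to a single numeric column can be sketched as follows (plain Python, illustrative names; not H2O's internals):

```python
# Sketch of standardization: rescale a numeric predictor so it has
# mean 0 and variance 1 before model fitting. (Illustrative only.)
import math

def standardize(xs):
    mean = sum(xs) / len(xs)
    # Population variance is used for this simple sketch.
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    sd = math.sqrt(var)
    return [(x - mean) / sd for x in xs]

zs = standardize([2.0, 4.0, 6.0, 8.0])
print(zs)  # resulting column has mean 0 and variance 1
```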

family

A character string specifying the distribution of the model: gaussian, binomial, poisson, gamma, tweedie, or multinomial.

link

A character string specifying the link function. The default is the canonical link for the `family`. The supported links for each of the `family` specifications are: `"gaussian"`: `"identity"`, `"log"`, `"inverse"`; `"binomial"`: `"logit"`, `"log"`; `"poisson"`: `"log"`, `"identity"`; `"gamma"`: `"inverse"`, `"log"`, `"identity"`; `"tweedie"`: `"tweedie"`.
tweedie_variance_power

A numeric specifying the power for the variance function when `family = "tweedie"`.

tweedie_link_power

A numeric specifying the power for the link function when `family = "tweedie"`.

alpha

A numeric in [0, 1] specifying the elastic-net mixing parameter.
The elastic-net penalty is defined to be:
$$P(\alpha,\beta) = (1-\alpha)/2||\beta||_2^2 + \alpha||\beta||_1 = \sum_j [(1-\alpha)/2 \beta_j^2 + \alpha|\beta_j|]$$
making `alpha = 1` the lasso penalty and `alpha = 0` the ridge penalty.
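The penalty above can be computed directly; a small plain-Python sketch (illustrative names, not H2O's implementation) showing that `alpha = 1` recovers the lasso term and `alpha = 0` the ridge term:

```python
# Sketch of the elastic-net penalty
#   P(alpha, beta) = sum_j [ (1 - alpha)/2 * beta_j^2 + alpha * |beta_j| ]
# (Illustrative only -- not H2O's implementation.)

def elastic_net_penalty(alpha, beta):
    return sum((1 - alpha) / 2 * b ** 2 + alpha * abs(b) for b in beta)

beta = [1.0, -2.0, 0.5]
print(elastic_net_penalty(1.0, beta))  # pure lasso: |1| + |-2| + |0.5| = 3.5
print(elastic_net_penalty(0.0, beta))  # pure ridge: (1 + 4 + 0.25)/2 = 2.625
```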

prior

(Optional) A numeric specifying the prior probability of class 1 in the response when `family = "binomial"`. The default prior is the observed frequency of class 1. Must be in the exclusive range (0, 1), or NULL (no prior).

lambda

A non-negative shrinkage parameter for the elastic-net penalty, which multiplies $P(\alpha,\beta)$ in the objective function. When `lambda = 0`, no elastic-net penalty is applied and ordinary generalized linear models are fit.

lambda_search

A logical value indicating whether to conduct a search over the space of lambda values, starting from lambda max; when TRUE, `lambda` is interpreted as lambda min.

nlambdas

The number of lambda values to use when `lambda_search = TRUE`.

lambda_min_ratio

Smallest value for lambda as a fraction of lambda.max. By default, if the number of observations is greater than the number of variables, then `lambda_min_ratio = 0.0001`; if the number of observations is less than the number of variables, then `lambda_min_ratio = 0.01`.

nfolds

(Optional) Number of folds for cross-validation. If `nfolds >= 2`, then `validation_frame` must remain empty.

fold_column

(Optional) Column with cross-validation fold index assignment per observation.

fold_assignment

Cross-validation fold assignment scheme, used if `fold_column` is not specified. Must be "AUTO", "Random", or "Modulo".
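A plain-Python sketch of the two non-AUTO schemes (illustrative, not H2O's internals): Modulo assigns folds deterministically round-robin by row index, while Random draws a fold for each row:

```python
# Sketch of cross-validation fold assignment schemes. (Illustrative only.)
import random

def modulo_folds(n_rows, nfolds):
    """Deterministic round-robin: row i goes to fold i % nfolds."""
    return [i % nfolds for i in range(n_rows)]

def random_folds(n_rows, nfolds, seed=42):
    """Each row gets a uniformly random fold (reproducible via seed)."""
    rng = random.Random(seed)
    return [rng.randrange(nfolds) for _ in range(n_rows)]

print(modulo_folds(7, 3))  # [0, 1, 2, 0, 1, 2, 0]
```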

keep_cross_validation_predictions

Whether to keep the predictions of the cross-validation models.

beta_constraints

A data.frame or H2OParsedData object with the columns ["names",
"lower_bounds", "upper_bounds", "beta_given", "rho"], where each row corresponds to a predictor
in the GLM. "names" contains the predictor names, and "lower_bounds" and "upper_bounds" are the lower
and upper bounds, respectively, of the coefficient for each predictor.

offset_column

Specify the offset column.

weights_column

Specify the weights column.

intercept

Logical, include constant term (intercept) in the model.

max_active_predictors

(Optional) Convergence criterion: the maximum number of active predictors during computation when using the L1 penalty.

objective_epsilon

Convergence criterion. Converge if the relative change in the objective function is below this threshold.

gradient_epsilon

Convergence criterion. Converge if the l-infinity norm of the gradient is below this threshold.
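The `objective_epsilon` and `gradient_epsilon` thresholds can be read as a pair of stopping tests; a plain-Python sketch (illustrative, not H2O's code):

```python
# Sketch of the objective_epsilon and gradient_epsilon stopping tests.
# (Illustrative only -- not H2O's internal code.)

def objective_converged(obj_old, obj_new, objective_epsilon):
    """Stop when the relative change in the objective is below threshold."""
    return abs(obj_new - obj_old) / max(abs(obj_old), 1e-12) < objective_epsilon

def gradient_converged(gradient, gradient_epsilon):
    """Stop when the l-infinity norm (largest |component|) is below threshold."""
    return max(abs(g) for g in gradient) < gradient_epsilon

print(objective_converged(100.0, 99.9999, 1e-4))  # True: relative change 1e-6
print(gradient_converged([1e-5, -2e-5], 1e-4))    # True: max |g| = 2e-5
```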

non_negative

Logical, restrict coefficients to be non-negative.

compute_p_values

(Optional) Logical, compute p-values; only allowed with the IRLSM solver and no regularization. May fail if there are collinear predictors.

remove_collinear_columns

(Optional) Logical, valid only with no regularization. If set, collinear columns will be automatically ignored (their coefficients will be 0).

max_runtime_secs

Maximum allowed runtime in seconds for model training. Use 0 to disable.

missing_values_handling

(Optional) Controls the handling of missing values. Must be either "MeanImputation" or "Skip". MeanImputation replaces missing values with the mean for numeric columns and the most frequent level for categorical columns; Skip ignores observations with any missing value. Applied during both training and scoring.
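The two behaviors can be sketched on a toy numeric column (plain Python with `None` for missing values; illustrative, not H2O's implementation):

```python
# Sketch of MeanImputation vs Skip for a numeric column with
# missing values represented as None. (Illustrative only.)

def mean_imputation(xs):
    """Replace each missing value with the mean of the observed values."""
    observed = [x for x in xs if x is not None]
    mean = sum(observed) / len(observed)
    return [mean if x is None else x for x in xs]

def skip(xs):
    """Drop observations with a missing value."""
    return [x for x in xs if x is not None]

col = [1.0, None, 3.0, None, 5.0]
print(mean_imputation(col))  # [1.0, 3.0, 3.0, 3.0, 5.0]
print(skip(col))             # [1.0, 3.0, 5.0]
```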

...

(Currently Unimplemented)

A subclass of H2OModel is returned. The specific subclass depends on the machine learning task at hand (if it's binomial classification, then an H2OBinomialModel is returned; if it's regression, then an H2ORegressionModel is returned). The default print-out of the models is shown, but further GLM-specific information can be queried out of the object. To access these various items, please refer to the seealso section below. Upon completion of the GLM, the resulting object has coefficients, normalized coefficients, residual/null deviance, AIC, and a host of model metrics including MSE, AUC (for logistic regression), degrees of freedom, and confusion matrices. Please refer to the more in-depth GLM documentation available here:

http://h2o-release.s3.amazonaws.com/h2o-dev/rel-shannon/2/docs-website/h2o-docs/index.html#Data+Science+Algorithms-GLM

`predict.H2OModel` for prediction; `h2o.mse`, `h2o.auc`, `h2o.confusionMatrix`, `h2o.performance`, `h2o.giniCoef`, `h2o.logloss`, `h2o.varimp`, `h2o.scoreHistory`

```
h2o.init()

# Run GLM of CAPSULE ~ AGE + RACE + PSA + DCAPS
prostatePath = system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex = h2o.importFile(path = prostatePath, destination_frame = "prostate.hex")
h2o.glm(y = "CAPSULE", x = c("AGE", "RACE", "PSA", "DCAPS"), training_frame = prostate.hex,
        family = "binomial", nfolds = 0, alpha = 0.5, lambda_search = FALSE)

# Run GLM of VOL ~ CAPSULE + AGE + RACE + PSA + GLEASON
myX = setdiff(colnames(prostate.hex), c("ID", "DPROS", "DCAPS", "VOL"))
h2o.glm(y = "VOL", x = myX, training_frame = prostate.hex, family = "gaussian",
        nfolds = 0, alpha = 0.1, lambda_search = FALSE)

# GLM variable importance
# Also see:
# https://github.com/h2oai/h2o/blob/master/R/tests/testdir_demos/runit_demo_VI_all_algos.R
data.hex = h2o.importFile(
  path = "https://s3.amazonaws.com/h2o-public-test-data/smalldata/demos/bank-additional-full.csv",
  destination_frame = "data.hex")
myX = 1:20
myY = "y"
my.glm = h2o.glm(x = myX, y = myY, training_frame = data.hex, family = "binomial",
                 standardize = TRUE, lambda_search = TRUE)
```