# h2o.glm

##### H2O Generalized Linear Models

Fit a generalized linear model, specified by a response variable, a set of predictors, and a description of the error distribution.

##### Usage

```
h2o.glm(x, y, training_frame, model_id, validation_frame = NULL,
ignore_const_cols = TRUE, max_iterations = 50, beta_epsilon = 0,
solver = c("IRLSM", "L_BFGS"), standardize = TRUE,
family = c("gaussian", "binomial", "poisson", "gamma", "tweedie",
"multinomial"), link = c("family_default", "identity", "logit", "log",
"inverse", "tweedie"), tweedie_variance_power = NaN,
tweedie_link_power = NaN, alpha = 0.5, prior = NULL, lambda = 1e-05,
lambda_search = FALSE, nlambdas = -1, lambda_min_ratio = -1,
nfolds = 0, fold_column = NULL, fold_assignment = c("AUTO", "Random",
"Modulo"), keep_cross_validation_predictions = FALSE,
beta_constraints = NULL, offset_column = NULL, weights_column = NULL,
intercept = TRUE, max_active_predictors = -1, objective_epsilon = -1,
gradient_epsilon = -1, non_negative = FALSE, compute_p_values = FALSE,
remove_collinear_columns = FALSE, max_runtime_secs = 0,
missing_values_handling = c("MeanImputation", "Skip"))
```

##### Arguments

- x
- A vector containing the names or indices of the predictor variables to use in building the GLM model.
- y
- A character string or index that represents the response variable in the model.
- training_frame
- An H2OFrame object containing the variables in the model.
- model_id
- (Optional) The unique id assigned to the resulting model. If none is given, an id will automatically be generated.
- validation_frame
- An H2OFrame object containing the variables in the model. Defaults to NULL.
- ignore_const_cols
- A logical value indicating whether or not to ignore all the constant columns in the training frame.
- max_iterations
- A non-negative integer specifying the maximum number of iterations.
- beta_epsilon
- A non-negative number specifying the magnitude of the maximum difference between the coefficient estimates from successive iterations. Defines the convergence criterion for `h2o.glm`.
- solver
- A character string specifying the solver used: "IRLSM" (supports more features) or "L_BFGS" (scales better for datasets with many columns).
- standardize
- A logical value indicating whether the numeric predictors should be standardized to have a mean of 0 and a variance of 1 prior to training the models.
- family
- A character string specifying the distribution of the model: gaussian, binomial, poisson, gamma, tweedie, or multinomial.
- link
- A character string specifying the link function. The default is the canonical link for the `family`. The supported links for each `family` specification are:
  - `"gaussian"`: `"identity"`, `"log"`, `"inverse"`
  - `"binomial"`: `"logit"`, `"log"`
  - `"poisson"`: `"log"`, `"identity"`
  - `"gamma"`: `"inverse"`, `"identity"`, `"log"`
  - `"tweedie"`: `"tweedie"`

- tweedie_variance_power
- A numeric specifying the power for the variance function when `family = "tweedie"`.
- tweedie_link_power
- A numeric specifying the power for the link function when `family = "tweedie"`.
- alpha
- A numeric in [0, 1] specifying the elastic-net mixing parameter. The elastic-net penalty is defined as
$$P(\alpha,\beta) = (1-\alpha)/2||\beta||_2^2 + \alpha||\beta||_1 = \sum_j [(1-\alpha)/2 \beta_j^2 + \alpha|\beta_j|]$$
making `alpha = 1` the lasso penalty and `alpha = 0` the ridge penalty.
- prior
- (Optional) A numeric specifying the prior probability of class 1 in the response when `family = "binomial"`. The default prior is the observed frequency of class 1. Must be in the exclusive range (0, 1), or NULL (no prior).
- lambda
- A non-negative shrinkage parameter for the elastic-net, which multiplies $P(\alpha,\beta)$ in the objective function. When `lambda = 0`, no elastic-net penalty is applied and ordinary generalized linear models are fit.
- lambda_search
- A logical value indicating whether to conduct a search over the space of lambda values, starting from lambda max; the given `lambda` is then interpreted as lambda min.
- nlambdas
- The number of lambda values to use when `lambda_search = TRUE`.
- lambda_min_ratio
- Smallest value for lambda, as a fraction of lambda.max. By default, if the number of observations is greater than the number of variables, then `lambda_min_ratio = 0.0001`; if the number of observations is less than the number of variables, then `lambda_min_ratio = 0.01`.
- nfolds
- (Optional) Number of folds for cross-validation. If `nfolds >= 2`, then `validation_frame` must remain empty.
- fold_column
- (Optional) Column with cross-validation fold index assignment per observation.
- fold_assignment
- Cross-validation fold assignment scheme, used if `fold_column` is not specified. Must be "AUTO", "Random", or "Modulo".
- keep_cross_validation_predictions
- Whether to keep the predictions of the cross-validation models.
- beta_constraints
- A data.frame or H2OParsedData object with the columns ["names", "lower_bounds", "upper_bounds", "beta_given", "rho"], where each row corresponds to a predictor in the GLM. "names" contains the predictor names, and "lower_bounds" and "upper_bounds" are the lower and upper bounds, respectively, on the corresponding coefficient.
- offset_column
- Specify the offset column.
- weights_column
- Specify the weights column.
- intercept
- Logical, include constant term (intercept) in the model.
- max_active_predictors
- (Optional) Convergence criterion limiting the number of active predictors when using the L1 penalty.
- objective_epsilon
- Convergence criterion: converge if the relative change in the objective function is below this threshold.
- gradient_epsilon
- Convergence criterion: converge if the l-infinity norm of the gradient is below this threshold.
- non_negative
- Logical, restrict coefficients to be non-negative.
- compute_p_values
- (Optional) Logical, compute p-values, only allowed with IRLSM solver and no regularization. May fail if there are collinear predictors.
- remove_collinear_columns
- (Optional) Logical, valid only with no regularization. If set, co-linear columns will be automatically ignored (coefficient will be 0).
- max_runtime_secs
- Maximum allowed runtime in seconds for model training. Use 0 to disable.
- missing_values_handling
- (Optional) Controls handling of missing values. Can be either "MeanImputation" or "Skip". MeanImputation replaces missing values with the mean for numeric columns and the most frequent level for categorical columns; Skip ignores observations with any missing value. Applied both during training and scoring.
- ...
- (Currently Unimplemented)
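
The `beta_constraints` frame described above can be assembled as an ordinary data.frame and converted with `as.h2o`. A minimal sketch, not from the original documentation: the predictor names `AGE` and `PSA` are taken from the prostate example below and assumed to exist in the training frame, and the bound values are arbitrary.

```r
# Hypothetical bound constraints for two predictors; columns follow the
# beta_constraints description above.
constraints <- data.frame(
  names        = c("AGE", "PSA"),  # predictor names
  lower_bounds = c(-0.5, 0),       # lower bound on each coefficient
  upper_bounds = c( 0.5, 1),       # upper bound on each coefficient
  beta_given   = c(0, 0),          # supplied prior coefficient values
  rho          = c(0, 0)           # penalty pulling estimates toward beta_given
)
# Requires a running H2O cluster and the prostate.hex frame from the Examples:
# fit <- h2o.glm(y = "CAPSULE", x = c("AGE", "RACE", "PSA", "DCAPS"),
#                training_frame = prostate.hex, family = "binomial",
#                beta_constraints = as.h2o(constraints))
```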

##### Value

A subclass of `H2OModel` is returned. The specific subclass depends on the machine learning task at hand: if it is binomial classification, an `H2OBinomialModel` is returned; if it is regression, an `H2ORegressionModel` is returned. The default print-out of the models is shown, but further GLM-specific information can be queried out of the object. To access these various items, please refer to the See Also section below.

Upon completion of the GLM, the resulting object has coefficients, normalized coefficients, residual/null deviance, AIC, and a host of model metrics including MSE, AUC (for logistic regression), degrees of freedom, and confusion matrices. Please refer to the more in-depth GLM documentation available here:
http://h2o-release.s3.amazonaws.com/h2o-dev/rel-shannon/2/docs-website/h2o-docs/index.html#Data+Science+Algorithms-GLM
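
The accessors listed under See Also pull these items out of a fitted model. A minimal sketch, assuming `my.glm` is the binomial model fit in the Examples section (requires a running H2O cluster):

```r
h2o.performance(my.glm)      # full set of training metrics
h2o.auc(my.glm)              # AUC (binomial models only)
h2o.mse(my.glm)              # mean squared error
h2o.confusionMatrix(my.glm)  # confusion matrix (classification)
h2o.varimp(my.glm)           # variable importances
h2o.scoreHistory(my.glm)     # objective/metric values per iteration
```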

##### See Also

`predict.H2OModel` for prediction; `h2o.mse`, `h2o.auc`, `h2o.confusionMatrix`, `h2o.performance`, `h2o.giniCoef`, `h2o.logloss`, `h2o.varimp`, `h2o.scoreHistory`

##### Examples

```
h2o.init()

# Run GLM of CAPSULE ~ AGE + RACE + PSA + DCAPS
prostatePath = system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex = h2o.importFile(path = prostatePath, destination_frame = "prostate.hex")
h2o.glm(y = "CAPSULE", x = c("AGE", "RACE", "PSA", "DCAPS"), training_frame = prostate.hex,
        family = "binomial", nfolds = 0, alpha = 0.5, lambda_search = FALSE)

# Run GLM of VOL ~ CAPSULE + AGE + RACE + PSA + GLEASON
myX = setdiff(colnames(prostate.hex), c("ID", "DPROS", "DCAPS", "VOL"))
h2o.glm(y = "VOL", x = myX, training_frame = prostate.hex, family = "gaussian",
        nfolds = 0, alpha = 0.1, lambda_search = FALSE)

# GLM variable importance
# Also see:
# https://github.com/h2oai/h2o/blob/master/R/tests/testdir_demos/runit_demo_VI_all_algos.R
data.hex = h2o.importFile(
  path = "https://s3.amazonaws.com/h2o-public-test-data/smalldata/demos/bank-additional-full.csv",
  destination_frame = "data.hex")
myX = 1:20
myY = "y"
my.glm = h2o.glm(x = myX, y = myY, training_frame = data.hex, family = "binomial",
                 standardize = TRUE, lambda_search = TRUE)
```
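
A fitted model can then score new data via `predict.H2OModel` (see See Also). A minimal sketch, reusing `my.glm` and `data.hex` from the block above; predicting back onto the training frame is for illustration only, and in practice held-out data should be used:

```r
# Requires a running H2O cluster and the objects from the Examples above.
pred <- predict(my.glm, newdata = data.hex)
head(pred)  # predicted class plus class probabilities for binomial models
```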

*Documentation reproduced from package h2o, version 3.8.1.3, License: Apache License (== 2.0)*

### Community examples

**violar@mail.tau.ac.il** at Feb 13, 2019, h2o v3.8.1.3

```
# Standard GLM with H2O
library(h2o)
h2o.init(nthreads = -1)
training_data_table <- h2o.importFile("C:/Folder/DT.csv", header = TRUE, sep = ",")
# All the other arguments have defaults
my_glm = h2o.glm(y = "column_a", x = c("column_b", "column_c"),
                 training_frame = training_data_table)
```