h2o (version 2.8.4.4)

h2o.pcr: H2O: Principal Components Regression

Description

Fits a GLM on the principal components of the predictors, and allows test data to be transformed with the same PCA transformation that was applied to the training data.

Usage

h2o.pcr(x, y, data, key = "", ncomp, family, nfolds = 10, alpha = 0.5, lambda = 1e-05, 
  epsilon = 1e-05, tweedie.p)

Arguments

x
A vector containing the names of the predictors in the model.
y
The name of the response variable in the model.
data
An H2OParsedData object containing the variables in the model.
key
(Optional) The unique hex key assigned to the resulting model. If none is given, a key will automatically be generated.
ncomp
A number indicating the number of principal components to use in the regression model.
family
A description of the error distribution and corresponding link function to be used in the model. Currently, Gaussian, binomial, Poisson, gamma, and Tweedie are supported.
nfolds
(Optional) Number of folds for cross-validation. The default is 10.
alpha
(Optional) The elastic-net mixing parameter, which must be in [0,1]. The penalty is defined to be $$P(\alpha,\beta) = \frac{1-\alpha}{2}\|\beta\|_2^2 + \alpha\|\beta\|_1 = \sum_j \left[\frac{1-\alpha}{2}\beta_j^2 + \alpha|\beta_j|\right]$$ so alpha = 1 is the lasso penalty and alpha = 0 is the ridge penalty.
lambda
(Optional) The shrinkage parameter, which multiplies $P(\alpha,\beta)$ in the objective. The larger lambda is, the more the coefficients are shrunk toward zero (and each other).
epsilon
(Optional) Number indicating the cutoff for determining if a coefficient is zero.
tweedie.p
The index of the power variance function for the Tweedie distribution. Only used if family = "tweedie".
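
The penalty described under alpha and lambda can be checked numerically with a few lines of plain R. The following is a minimal sketch, not part of the h2o package: the function name elastic_net_penalty and the coefficient vector beta are hypothetical and exist only for illustration.

```r
# Hypothetical helper (not an h2o function): evaluates the elastic-net
# penalty P(alpha, beta) = sum_j [ (1 - alpha)/2 * beta_j^2 + alpha * |beta_j| ].
elastic_net_penalty <- function(beta, alpha) {
  stopifnot(alpha >= 0, alpha <= 1)
  sum((1 - alpha) / 2 * beta^2 + alpha * abs(beta))
}

beta <- c(0.5, -2, 0)                   # illustrative coefficients only

elastic_net_penalty(beta, alpha = 1)    # lasso: equals sum(abs(beta))
elastic_net_penalty(beta, alpha = 0)    # ridge: equals sum(beta^2) / 2

# lambda scales the whole penalty term in the objective
lambda <- 1e-05
lambda * elastic_net_penalty(beta, alpha = 0.5)
```

With alpha = 0.5, the default in the Usage section, the L1 and L2 terms receive equal mixing weight.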

Value

An object of class H2OGLMModel with slots key, data, model and xval. The slot model is a list of the following components:

  • coefficients: A named vector of the coefficients estimated in the model.
  • rank: The numeric rank of the fitted linear model.
  • family: The family of the error distribution.
  • deviance: The deviance of the fitted model.
  • aic: Akaike's Information Criterion for the final computed model.
  • null.deviance: The deviance for the null model.
  • iter: Number of algorithm iterations to compute the model.
  • df.residual: The residual degrees of freedom.
  • df.null: The residual degrees of freedom for the null model.
  • y: The response variable in the model.
  • x: A vector of the predictor variable(s) in the model.
  • auc: Area under the curve.
  • training.err: Average training error.
  • threshold: Best threshold.
  • confusion: Confusion matrix.

The slot xval is a list of H2OGLMModel objects representing the cross-validation models. (Each of these objects itself has xval equal to an empty list.)

Details

This method standardizes the data, obtains the first ncomp principal components using PCA (in decreasing order of standard deviation), and then runs GLM with the components as the predictor variables.
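
The steps above can be imitated in base R to see what each stage produces. This is a rough analogue under stated assumptions, not H2O's implementation: it uses prcomp and glm from the stats package, the built-in mtcars data set, and an arbitrary choice of predictors, response, and train/test split.

```r
# Base R analogue of the PCR steps (illustrative only; not how H2O computes it).
set.seed(1)
predictors <- c("disp", "hp", "wt", "qsec")   # assumed predictors from mtcars
ncomp      <- 2                               # number of components to keep

train_idx <- sample(nrow(mtcars), 24)         # arbitrary train/test split
train     <- mtcars[train_idx, ]
test      <- mtcars[-train_idx, ]

# 1. Standardize the training predictors and run PCA on them.
pca <- prcomp(train[, predictors], center = TRUE, scale. = TRUE)

# 2. Keep the first ncomp components; prcomp returns them in
#    decreasing order of standard deviation.
train_pc <- data.frame(pca$x[, 1:ncomp, drop = FALSE], mpg = train$mpg)

# 3. Fit a GLM with the components as the predictor variables.
fit <- glm(mpg ~ ., data = train_pc, family = gaussian())

# 4. Transform test data with the training centering, scaling, and rotation,
#    then predict from the projected components.
test_pc <- data.frame(predict(pca, newdata = test[, predictors])[, 1:ncomp, drop = FALSE])
pred    <- predict(fit, newdata = test_pc, type = "response")
```

Step 4 is the reason the test data must be projected with the training PCA rather than a PCA fitted to the test set: the GLM coefficients are only meaningful in the component space learned from the training data.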

See Also

h2o.prcomp, h2o.glm

Examples

library(h2o)
localH2O = h2o.init()

# Run PCR on Prostate Data
# Import the prostate data set into H2O
prostate.hex = h2o.importURL(localH2O, path = paste("https://raw.github.com",
  "h2oai/h2o/master/smalldata/logreg/prostate.csv", sep = "/"), key = "prostate.hex")
# Fit a binomial GLM on the first 2 principal components, with nfolds = 0
h2o.pcr(x = c("AGE","RACE","PSA","DCAPS"), y = "CAPSULE", data = prostate.hex, family = "binomial",
  nfolds = 0, alpha = 0.5, ncomp = 2)
