h2o (version 3.10.3.6)

h2o.prcomp: Principal components analysis of an H2O data frame using the power method

Description

Principal components analysis of an H2O data frame using the power method to calculate the singular value decomposition of the Gram matrix.
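
The idea named in the description can be illustrated in a few lines of base R. This is only a sketch under stated assumptions, not H2O's implementation: the toy matrix A and the names G, v and sigma1 are made up for illustration, and only the leading component is computed. Power iteration repeatedly multiplies a vector by the Gram matrix t(A) %*% A and renormalizes, so the vector converges to the first right singular vector of A (the first principal direction when A is centered).

```r
# Illustrative only: leading principal component via power iteration
# on the Gram matrix, using base R (no H2O cluster required).
set.seed(42)
X <- matrix(rnorm(200 * 4), ncol = 4) %*% diag(c(3, 2, 1, 0.5))  # toy data
A <- scale(X, center = TRUE, scale = FALSE)   # demeaned columns
G <- crossprod(A)                             # Gram matrix t(A) %*% A

v <- rep(1 / sqrt(ncol(A)), ncol(A))          # unit-length starting vector
for (i in seq_len(1000)) {
  w <- drop(G %*% v)                          # multiply by the Gram matrix
  v_new <- w / sqrt(sum(w^2))                 # renormalize to unit length
  converged <- sum((v_new - v)^2) < 1e-24
  v <- v_new
  if (converged) break
}

# Rayleigh quotient gives the leading eigenvalue of G;
# its square root is the leading singular value of A.
sigma1 <- sqrt(sum(v * drop(G %*% v)))

# Reference decomposition for comparison (the sign of a singular vector is arbitrary)
ref <- svd(A)
```

Dividing sigma1 by sqrt(nrow(A) - 1) gives the standard deviation of the first component, the same quantity base R reports as prcomp(A)$sdev[1].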

Usage

h2o.prcomp(training_frame, x, model_id = NULL, validation_frame = NULL,
  ignore_const_cols = TRUE, score_each_iteration = FALSE,
  transform = c("NONE", "STANDARDIZE", "NORMALIZE", "DEMEAN", "DESCALE"),
  pca_method = c("GramSVD", "Power", "Randomized", "GLRM"), k = 1,
  max_iterations = 1000, use_all_factor_levels = FALSE,
  compute_metrics = TRUE, impute_missing = FALSE, seed = -1,
  max_runtime_secs = 0)

Arguments

training_frame
Id of the training data frame (Not required, to allow initial validation of model parameters).
x
A vector containing the character names of the predictors in the model.
model_id
Destination id for this model; auto-generated if not specified.
validation_frame
Id of the validation data frame.
ignore_const_cols
Logical. Ignore constant columns. Defaults to TRUE.
score_each_iteration
Logical. Whether to score during each iteration of model training. Defaults to FALSE.
transform
Transformation of training data. Must be one of: "NONE", "STANDARDIZE", "NORMALIZE", "DEMEAN", "DESCALE". Defaults to NONE.
pca_method
Method for computing PCA. (Caution: Power and GLRM are currently experimental and unstable.) Must be one of: "GramSVD", "Power", "Randomized", "GLRM". Defaults to GramSVD.
k
Rank of matrix approximation. Defaults to 1.
max_iterations
Maximum training iterations. Defaults to 1000.
use_all_factor_levels
Logical. Whether the first factor level is included in each categorical expansion. Defaults to FALSE.
compute_metrics
Logical. Whether to compute metrics on the training data. Defaults to TRUE.
impute_missing
Logical. Whether to impute missing entries with the column mean. Defaults to FALSE.
seed
Seed for random number generation. It affects the stochastic parts of the algorithm, which may or may not be enabled by default. Defaults to -1 (time-based random number).
max_runtime_secs
Maximum allowed runtime in seconds for model training. Use 0 to disable. Defaults to 0.

Value

Returns an object of class .

References

N. Halko, P.G. Martinsson, J.A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, Survey and Review section, Vol. 53, No. 2, pp. 217-288, June 2011. http://arxiv.org/abs/0909.4061

See Also

h2o.svd, h2o.glrm

Examples

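library(h2o)
h2o.init()  # start or connect to a local H2O cluster
# Locate the sample data set shipped with the h2o package
ausPath <- system.file("extdata", "australia.csv", package = "h2o")
australia.hex <- h2o.uploadFile(path = ausPath)
# PCA with 8 components on standardized columns
h2o.prcomp(training_frame = australia.hex, k = 8, transform = "STANDARDIZE")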
