h2o (version 3.2.0.3)

h2o.glrm: Generalized Low Rank Model

Description

Generalized low rank decomposition of an H2O dataset.

Usage

h2o.glrm(training_frame, x, k, model_id, validation_frame, loading_name,
  ignore_const_cols, transform = c("NONE", "DEMEAN", "DESCALE", "STANDARDIZE",
  "NORMALIZE"), loss = c("Quadratic", "L1", "Huber", "Poisson", "Hinge",
  "Logistic"), multi_loss = c("Categorical", "Ordinal"), loss_by_col = NULL,
  loss_by_col_idx = NULL, regularization_x = c("None", "Quadratic", "L2",
  "L1", "NonNegative", "OneSparse", "UnitOneSparse", "Simplex"),
  regularization_y = c("None", "Quadratic", "L2", "L1", "NonNegative",
  "OneSparse", "UnitOneSparse", "Simplex"), gamma_x = 0, gamma_y = 0,
  max_iterations = 1000, init_step_size = 1, min_step_size = 0.001,
  init = c("Random", "PlusPlus", "SVD"), recover_svd = FALSE, seed)

Arguments

training_frame
An H2OFrame object containing the variables in the model.
x
(Optional) A vector containing the data columns on which the decomposition operates.
k
The rank of the resulting decomposition. This must be between 1 and the number of columns in the training frame, inclusive.
model_id
(Optional) The unique id assigned to the resulting model. If none is given, an id will automatically be generated.
validation_frame
An H2OFrame object containing the variables in the model, used for validation.
loading_name
(Optional) The unique name assigned to the loading matrix X in the XY decomposition. Automatically generated if none is provided.
ignore_const_cols
(Optional) A logical value indicating whether to ignore constant columns in the training frame. A column is constant if all of its non-missing values are the same value.
transform
A character string that indicates how the training data should be transformed before running GLRM. Possible values are "NONE": for no transformation, "DEMEAN": for subtracting the mean of each column, "DESCALE": for dividing by the standard deviation of ea
loss
A character string indicating the default loss function for numeric columns. Possible values are "Quadratic" (default), "L1", "Huber", "Poisson", "Hinge" and "Logistic".
multi_loss
A character string indicating the default loss function for enum columns. Possible values are "Categorical" and "Ordinal".
loss_by_col
A vector of strings indicating the loss function for specific columns by corresponding index in loss_by_col_idx. Will override loss for numeric columns and multi_loss for enum columns.
loss_by_col_idx
A vector of column indices to which the corresponding loss functions in loss_by_col are assigned. Indices must be zero-based.
regularization_x
A character string indicating the regularization function for the X matrix. Possible values are "None" (default), "Quadratic", "L2", "L1", "NonNegative", "OneSparse", "UnitOneSparse", and "Simplex".
regularization_y
A character string indicating the regularization function for the Y matrix. Possible values are "None" (default), "Quadratic", "L2", "L1", "NonNegative", "OneSparse", "UnitOneSparse", and "Simplex".
gamma_x
The weight on the X matrix regularization term.
gamma_y
The weight on the Y matrix regularization term.
max_iterations
The maximum number of iterations to run the optimization loop. Each iteration consists of an update of the X matrix, followed by an update of the Y matrix.
init_step_size
Initial step size. This value is divided by the number of columns in the training frame when calculating the proximal gradient update. The algorithm begins at init_step_size and decreases the step size at each iteration until a termination condition is reached.
min_step_size
Minimum step size at which the algorithm terminates.
init
A character string indicating how to select the initial Y matrix. Possible values are "Random": for initialization to a random array from the standard normal distribution, "PlusPlus": for initialization using the clusters from k-means++ initialization, or
recover_svd
A logical value indicating whether the singular values and eigenvectors should be recovered during post-processing of the generalized low rank decomposition.
seed
(Optional) Random seed used to initialize the X and Y matrices.
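
To make the roles of max_iterations, init_step_size and min_step_size more concrete, the following base R sketch runs an alternating gradient loop on a small random matrix. It is not H2O's implementation: it assumes quadratic loss with no regularization, and the rule of halving the step whenever the objective fails to improve is an assumption chosen for illustration. The names A, X, Y, step and objective are hypothetical.

```r
# Illustrative sketch only, not H2O's code: quadratic loss, no regularization.
set.seed(1)
A <- matrix(rnorm(20 * 4), nrow = 20, ncol = 4)  # stand-in for the training data
k <- 2                                           # rank of the decomposition
X <- matrix(rnorm(20 * k), nrow = 20)            # random initial X (20 x k)
Y <- matrix(rnorm(k * 4), nrow = k)              # random initial Y (k x 4)

objective <- function(X, Y) sum((A - X %*% Y)^2)

step <- 1           # plays the role of init_step_size
min_step <- 0.001   # plays the role of min_step_size
max_iter <- 1000    # plays the role of max_iterations

obj <- objective(X, Y)
initial_obj <- obj
iter <- 0
while (iter < max_iter && step > min_step) {
  iter <- iter + 1
  alpha <- step / ncol(A)  # step size divided by the number of columns
  # One iteration: update X first, then update Y using the new X.
  X_new <- X - alpha * 2 * (X %*% Y - A) %*% t(Y)
  Y_new <- Y - alpha * 2 * t(X_new) %*% (X_new %*% Y - A)
  obj_new <- objective(X_new, Y_new)
  if (isTRUE(obj_new < obj)) {
    # Accept the update when the objective improves.
    X <- X_new
    Y <- Y_new
    obj <- obj_new
  } else {
    # Assumed rule: otherwise reject the update and halve the step size.
    step <- step / 2
  }
}
```

Because an update is only accepted when it lowers the objective, the final value of obj can never exceed initial_obj. The loop stops either after max_iter iterations or once step has shrunk to min_step.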

Value

  • Returns an object of class H2ODimReductionModel.

References

M. Udell, C. Horn, R. Zadeh, S. Boyd (2014). Generalized Low Rank Models. http://arxiv.org/abs/1410.0342. Unpublished manuscript, Stanford Electrical Engineering Department.

Examples

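library(h2o)
# Start (or connect to) a local H2O instance
localH2O <- h2o.init()
# Locate the australia.csv file bundled with the package and upload it to H2O
ausPath <- system.file("extdata", "australia.csv", package="h2o")
australia.hex <- h2o.uploadFile(localH2O, path = ausPath)
# Fit a rank-5 decomposition with quadratic loss and L1 regularization on X
h2o.glrm(training_frame = australia.hex, k = 5, loss = "Quadratic", regularization_x = "L1",
         gamma_x = 0.5, gamma_y = 0, max_iterations = 1000)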
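
Because loss_by_col_idx must be zero-based while R indexes from one, it can help to derive the indices from column names rather than typing them by hand. The sketch below uses base R only and a small hypothetical data frame; the column names and the chosen losses are illustrative and are not taken from australia.csv.

```r
# Hypothetical local data; the column names are illustrative only.
df <- data.frame(rainfall    = c(1.2, 0.4, 3.1),
                 temperature = c(21, 19, 25),
                 runoff      = c(0.3, 0.1, 0.9))

# Columns that should receive a non-default loss, with the loss for each.
col_losses <- c(temperature = "Huber", runoff = "L1")

# match() returns one-based positions; subtract 1 to get zero-based indices.
loss_by_col_idx <- match(names(col_losses), names(df)) - 1
loss_by_col <- unname(col_losses)

# Misspelled column names show up as NA from match(), so fail early.
stopifnot(!anyNA(loss_by_col_idx))

print(loss_by_col_idx)
print(loss_by_col)
```

The two resulting vectors are the same length and in the same order, so they could then be supplied as the loss_by_col and loss_by_col_idx arguments of h2o.glrm, assuming the H2OFrame has the same column order as the local data frame.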
