h2o.glrm: Generalized Low Rank Model

Description

Generalized low rank decomposition of an H2O dataset.
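For orientation, the GLRM formulation of Udell et al. (listed under References) factors an m x n data matrix A into a loading matrix X (rows x_i, each 1 x k) and an archetype matrix Y (columns y_j, each k x 1). It does so by minimizing per-entry losses over the observed entries Omega plus weighted regularizers. The mapping of symbols to arguments is our reading of the argument descriptions, not a statement of H2O's internals: L_j corresponds to loss, multi_loss and loss_by_col; r and r-tilde correspond to regularization_x and regularization_y; gamma_x and gamma_y correspond to the arguments of the same name. The form is shown for numeric columns.

```latex
\min_{X \in \mathbb{R}^{m \times k},\; Y \in \mathbb{R}^{k \times n}}
\;\sum_{(i,j) \in \Omega} L_j\!\left(x_i y_j,\; A_{ij}\right)
\;+\; \gamma_x \sum_{i=1}^{m} r(x_i)
\;+\; \gamma_y \sum_{j=1}^{n} \tilde{r}(y_j)
```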

Usage

h2o.glrm(training_frame, x, k, model_id, validation_frame, loading_name,
  ignore_const_cols, transform = c("NONE", "DEMEAN", "DESCALE", "STANDARDIZE",
  "NORMALIZE"), loss = c("Quadratic", "L1", "Huber", "Poisson", "Hinge",
  "Logistic"), multi_loss = c("Categorical", "Ordinal"), loss_by_col = NULL,
  loss_by_col_idx = NULL, regularization_x = c("None", "Quadratic", "L2",
  "L1", "NonNegative", "OneSparse", "UnitOneSparse", "Simplex"),
  regularization_y = c("None", "Quadratic", "L2", "L1", "NonNegative",
  "OneSparse", "UnitOneSparse", "Simplex"), gamma_x = 0, gamma_y = 0,
  max_iterations = 1000, init_step_size = 1, min_step_size = 0.001,
  init = c("Random", "PlusPlus", "SVD"), recover_svd = FALSE, seed)

Arguments

training_frame

An H2OFrame object containing the variables in the model.

x

(Optional) A vector containing the data columns on which GLRM operates.

k

The rank of the resulting decomposition. This must be between 1 and the number of columns in the training frame, inclusive.
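To build intuition for k, consider one special case: quadratic loss, no regularization and no missing values. There, the best rank-k factorization in squared error has a closed form, the truncated singular value decomposition (Eckart-Young theorem). The base R sketch below uses toy data of our own. It only shows the shapes of X (m x k) and Y (k x n) and the error left after truncation; it is not how h2o.glrm fits a model.

```r
# Closed-form special case: quadratic loss, no regularization, fully observed A.
# The best rank-k XY (in squared error) is the truncated SVD of A.
set.seed(1)
A <- matrix(rnorm(20 * 6), nrow = 20, ncol = 6)  # toy data: m = 20 rows, n = 6 columns
k <- 2                                            # rank of the decomposition
stopifnot(k >= 1, k <= ncol(A))                   # same bounds as the k argument

s <- svd(A)
X <- s$u[, 1:k, drop = FALSE] %*% diag(s$d[1:k], nrow = k)  # m x k loadings
Y <- t(s$v[, 1:k, drop = FALSE])                            # k x n archetypes
A_hat <- X %*% Y                                            # rank-k reconstruction

# The squared reconstruction error equals the sum of the discarded squared singular values
err <- sum((A - A_hat)^2)
print(c(error = err, discarded = sum(s$d[-(1:k)]^2)))
```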

model_id

(Optional) The unique id assigned to the resulting model. If none is given, an id will automatically be generated.

validation_frame

An H2OFrame object used to validate the model.

loading_name

(Optional) The unique name assigned to the loading matrix X in the XY decomposition. Automatically generated if none is provided.

ignore_const_cols

(Optional) A logical value indicating whether to ignore constant columns in the training frame. A column is constant if all of its non-missing values are the same value.

transform

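A character string that indicates how the training data should be transformed before running GLRM. Possible values are "NONE": for no transformation, "DEMEAN": for subtracting the mean of each column, "DESCALE": for dividing by the standard deviation of each column, "STANDARDIZE": for demeaning and descaling, and "NORMALIZE": for demeaning and dividing each column by its range (max - min).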
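The transform options are column-wise operations, sketched below as a hypothetical base R helper that is not part of the h2o package. Two points are assumptions. First, NORMALIZE is implemented here as demeaning followed by division by the column range (max - min). Second, sd() uses the n - 1 denominator, which may not match H2O's internal convention.

```r
# Hypothetical helper (not an h2o function) applying the column-wise transforms.
# Assumptions: NORMALIZE = (x - mean) / (max - min); sd() uses the n - 1 denominator.
apply_transform <- function(A, transform = c("NONE", "DEMEAN", "DESCALE",
                                             "STANDARDIZE", "NORMALIZE")) {
  transform <- match.arg(transform)
  mu  <- colMeans(A)                                   # column means
  sdv <- apply(A, 2, sd)                               # column standard deviations
  rng <- apply(A, 2, function(col) max(col) - min(col))  # column ranges
  switch(transform,
    NONE        = A,
    DEMEAN      = sweep(A, 2, mu, "-"),
    DESCALE     = sweep(A, 2, sdv, "/"),
    STANDARDIZE = sweep(sweep(A, 2, mu, "-"), 2, sdv, "/"),
    NORMALIZE   = sweep(sweep(A, 2, mu, "-"), 2, rng, "/"))
}

A <- cbind(a = c(1, 2, 3, 4), b = c(10, 20, 30, 60))
Z <- apply_transform(A, "STANDARDIZE")
round(colMeans(Z), 10)   # each column now has mean 0
apply(Z, 2, sd)          # and standard deviation 1
```

A constant column has zero standard deviation and zero range, so DESCALE, STANDARDIZE and NORMALIZE would divide by zero for it. This is one practical reason to drop such columns via ignore_const_cols.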

loss

A character string indicating the default loss function for numeric columns. Possible values are "Quadratic" (default), "L1", "Huber", "Poisson", "Hinge" and "Logistic".
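The numeric losses can be written as plain functions of the fitted value u = x_i y_j and the observed entry a. The sketch below follows the forms given in Udell et al. (2014). Whether H2O uses identical scaling, Huber threshold, and label coding for Hinge and Logistic (here a is in {-1, 1}) is an assumption.

```r
# Numeric losses as functions of fitted value u and observed value a,
# following Udell et al. (2014); H2O's exact conventions may differ.
loss_quadratic <- function(u, a) (u - a)^2
loss_l1        <- function(u, a) abs(u - a)
loss_huber     <- function(u, a) {
  r <- abs(u - a)
  ifelse(r <= 1, 0.5 * r^2, r - 0.5)   # quadratic near zero, linear in the tails
}
loss_poisson   <- function(u, a) {
  # exp(u) - a*u + a*log(a) - a, taking a*log(a) = 0 when a = 0 (counts a >= 0)
  exp(u) - a * u + ifelse(a > 0, a * log(a), 0) - a
}
loss_hinge     <- function(u, a) pmax(1 - a * u, 0)     # labels a in {-1, 1}
loss_logistic  <- function(u, a) log(1 + exp(-a * u))   # labels a in {-1, 1}

# Compare the losses for a fitted value of 0.5 against an observed value of 1
u <- 0.5
sapply(list(quadratic = loss_quadratic, l1 = loss_l1, huber = loss_huber,
            hinge = loss_hinge, logistic = loss_logistic),
       function(f) f(u, 1))
```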

multi_loss

A character string indicating the default loss function for enum columns. Possible values are "Categorical" and "Ordinal".
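Udell et al. (2014) describe two hinge-style losses for multi-level data. The one-vs-all categorical loss scores every level. The ordinal hinge loss fits a single score against ordered levels 1..d. The sketch below follows those forms. That H2O's "Categorical" and "Ordinal" options match them exactly is an assumption, and the function names are hypothetical.

```r
# Multi-level losses following Udell et al. (2014); levels are coded 1..d.

# Categorical (one-vs-all hinge): u is a length-d score vector. The observed
# level a should score at least 1 and every other level at most -1.
loss_categorical <- function(u, a) {
  pmax(1 - u[a], 0) + sum(pmax(1 + u[-a], 0))
}

# Ordinal hinge: u is a single score; the loss is zero when u equals level a
# and grows as u moves past neighbouring levels.
loss_ordinal <- function(u, a, d) {
  below <- if (a > 1) seq_len(a - 1) else integer(0)
  above <- if (a < d) (a + 1):d else integer(0)
  sum(pmax(1 - u + below, 0)) + sum(pmax(1 + u - above, 0))
}

loss_categorical(c(1, -1, -1), a = 1)   # perfect one-vs-all scores
loss_ordinal(1, a = 3, d = 5)           # score two levels too low
```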

loss_by_col

A vector of strings giving the loss function for specific columns: the i-th entry applies to the column whose index is the i-th entry of loss_by_col_idx. These override loss for numeric columns and multi_loss for enum columns.

loss_by_col_idx

A vector of column indices to which the corresponding loss functions in loss_by_col are assigned. Indices must be zero-based (the first column has index 0).
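R positions are 1-based, so a simple way to build loss_by_col_idx is to look up column names and subtract 1. The data frame and column names below are hypothetical. The commented call only indicates where the two paired vectors would go and is not run here.

```r
# Hypothetical data: build paired loss_by_col / loss_by_col_idx vectors by name.
df <- data.frame(age    = c(23, 35),
                 visits = c(0L, 4L),
                 rating = factor(c("low", "high")))

loss_by_col     <- c("Poisson", "Ordinal")                       # one loss per listed column
loss_by_col_idx <- match(c("visits", "rating"), names(df)) - 1L  # 1-based -> zero-based
loss_by_col_idx   # 1 2

# The two vectors are passed together, e.g.
# h2o.glrm(training_frame = ..., k = ..., loss_by_col = loss_by_col,
#          loss_by_col_idx = loss_by_col_idx)
```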

regularization_x

A character string indicating the regularization function for the X matrix. Possible values are "None" (default), "Quadratic", "L2", "L1", "NonNegative", "OneSparse", "UnitOneSparse", and "Simplex".

regularization_y

A character string indicating the regularization function for the Y matrix. Possible values are "None" (default), "Quadratic", "L2", "L1", "NonNegative", "OneSparse", "UnitOneSparse", and "Simplex".
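In a proximal gradient method, each regularizer enters through its proximal operator. For the constraint-type options (NonNegative, OneSparse, UnitOneSparse, Simplex), the proximal operator is a Euclidean projection. The sketch below uses the standard forms of these operators, with t standing for the step size times the regularization weight. It reads "Quadratic" as the squared norm and "L2" as the unsquared norm; that reading is an assumption. The function names are hypothetical, and H2O's internal code is not reproduced here.

```r
# Proximal / projection operators for the listed regularizers (standard forms).
# t = step size * gamma; reading Quadratic = ||x||^2 and L2 = ||x|| is an assumption.
prox_quadratic <- function(v, t) v / (1 + 2 * t)               # r(x) = ||x||_2^2
prox_l2 <- function(v, t) {                                    # r(x) = ||x||_2
  nv <- sqrt(sum(v^2))
  if (nv == 0) v else v * max(0, 1 - t / nv)                   # block soft threshold
}
prox_l1 <- function(v, t) sign(v) * pmax(abs(v) - t, 0)        # soft thresholding
proj_nonnegative <- function(v) pmax(v, 0)                     # x >= 0
proj_onesparse <- function(v) {                                # at most one nonzero
  out <- numeric(length(v)); i <- which.max(abs(v)); out[i] <- v[i]; out
}
proj_unitonesparse <- function(v) {                            # nearest unit vector e_i
  out <- numeric(length(v)); out[which.max(v)] <- 1; out
}
proj_simplex <- function(v) {                                  # x >= 0, sum(x) = 1
  u <- sort(v, decreasing = TRUE)
  css <- cumsum(u)
  rho <- max(which(u - (css - 1) / seq_along(u) > 0))
  theta <- (css[rho] - 1) / rho
  pmax(v - theta, 0)
}

prox_l1(c(3, -0.5, -2), t = 1)   # shrinks toward zero, zeroing small entries
proj_simplex(c(2, 0, -1))        # projects onto the probability simplex
```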

gamma_x

The weight on the X matrix regularization term.

gamma_y

The weight on the Y matrix regularization term.
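gamma_x and gamma_y scale the two penalty terms relative to the loss. The sketch below evaluates the objective for one concrete configuration: quadratic loss, L1 on X, and quadratic on Y, with missing entries of A skipped. The function name and data are hypothetical.

```r
# Objective for one configuration: quadratic loss over observed entries,
# L1 penalty on X weighted by gamma_x, quadratic penalty on Y weighted by gamma_y.
glrm_objective <- function(A, X, Y, gamma_x = 0, gamma_y = 0) {
  R <- X %*% Y - A
  fit <- sum(R^2, na.rm = TRUE)             # NA entries of A are skipped
  fit + gamma_x * sum(abs(X)) + gamma_y * sum(Y^2)
}

A <- matrix(c(1, 2, NA, 4), nrow = 2)       # one missing entry
X <- matrix(c(1, 1), ncol = 1)              # 2 x 1 loadings
Y <- matrix(c(1, 3), nrow = 1)              # 1 x 2 archetypes
glrm_objective(A, X, Y)                     # 2 (loss term only)
glrm_objective(A, X, Y, gamma_x = 0.5)      # 3 (adds 0.5 * sum(|X|))
```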

max_iterations

The maximum number of iterations to run the optimization loop. Each iteration consists of an update of the X matrix, followed by an update of the Y matrix.

init_step_size

The initial step size. It is divided by the number of columns in the training frame when computing the proximal gradient update. The algorithm begins at init_step_size and decreases the step size at each iteration until a termination condition is reached.

min_step_size

The minimum step size; the algorithm terminates once the step size falls below this value.
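max_iterations, init_step_size and min_step_size together control an alternating loop: an X update, then a Y update. The sketch below assumes quadratic loss on a fully observed A, L1 regularization on X and none on Y. Following the descriptions above, it divides the step by the number of columns. Its rule for shrinking the step is an illustrative assumption, not H2O's documented schedule: keep an update only if it lowers the objective, otherwise halve the step. The function is hypothetical.

```r
# Alternating proximal gradient sketch (hypothetical; not H2O's implementation).
# Assumes quadratic loss, fully observed A, L1 on X, no regularization on Y.
glrm_fit_sketch <- function(A, k, gamma_x = 0, max_iterations = 1000,
                            init_step_size = 1, min_step_size = 0.001, seed = 42) {
  set.seed(seed)
  X <- matrix(rnorm(nrow(A) * k), nrow(A), k)   # random initial loadings
  Y <- matrix(rnorm(k * ncol(A)), k, ncol(A))   # random initial archetypes
  objective <- function(X, Y) sum((X %*% Y - A)^2) + gamma_x * sum(abs(X))
  f <- objective(X, Y)
  step <- init_step_size
  iter <- 0
  while (iter < max_iterations && step >= min_step_size) {
    iter <- iter + 1
    alpha <- step / ncol(A)                     # step divided by number of columns
    # X update: gradient step on the loss, then the L1 prox (soft threshold)
    Xn <- X - alpha * 2 * (X %*% Y - A) %*% t(Y)
    Xn <- sign(Xn) * pmax(abs(Xn) - alpha * gamma_x, 0)
    # Y update: gradient step using the freshly updated X
    Yn <- Y - alpha * 2 * t(Xn) %*% (Xn %*% Y - A)
    fn <- objective(Xn, Yn)
    if (is.finite(fn) && fn < f) {
      X <- Xn; Y <- Yn; f <- fn                 # accept: objective decreased
    } else {
      step <- step / 2                          # reject and shrink the step
    }
  }
  list(X = X, Y = Y, objective = f, iterations = iter, step = step)
}

set.seed(1)
A <- matrix(rnorm(10 * 4), nrow = 10, ncol = 4)
fit <- glrm_fit_sketch(A, k = 2, gamma_x = 0.1, max_iterations = 200)
c(objective = fit$objective, iterations = fit$iterations, final_step = fit$step)
```

With this accept-or-halve rule the objective can only decrease. The loop stops at whichever of max_iterations or min_step_size is reached first.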

init

A character string indicating how to select the initial Y matrix. Possible values are "Random": for initialization to a random array from the standard normal distribution, "PlusPlus": for initialization using the clusters from k-means++ initialization, or "SVD": for initialization based on a singular value decomposition of the training data.
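The "PlusPlus" option refers to k-means++ initialization. Below is a sketch of standard k-means++ seeding over the rows of the training data: the first center is chosen uniformly, and each later center is drawn with probability proportional to its squared distance from the nearest chosen center. Using the chosen rows directly as the k x n starting Y is our reading of the option, an assumption rather than H2O's documented internals. The function name is hypothetical.

```r
# Standard k-means++ seeding over the rows of A (hypothetical helper).
# Using the chosen rows as the initial k x n Y is an assumption.
kmeanspp_init <- function(A, k, seed = 1) {
  set.seed(seed)
  m <- nrow(A)
  centers <- integer(k)
  centers[1] <- sample.int(m, 1)                        # first center: uniform
  d2 <- rowSums(sweep(A, 2, A[centers[1], ])^2)         # squared distance to nearest center
  for (j in seq_len(k)[-1]) {
    centers[j] <- sample.int(m, 1, prob = d2)           # draw proportional to d2
    d2 <- pmin(d2, rowSums(sweep(A, 2, A[centers[j], ])^2))
  }
  A[centers, , drop = FALSE]                            # k x n starting Y
}

A <- rbind(c(0, 0), c(1, 0), c(10, 10), c(11, 10))
Y0 <- kmeanspp_init(A, k = 2)
Y0
```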

recover_svd

A logical value indicating whether the singular values and eigenvectors should be recovered during post-processing of the generalized low rank decomposition.
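The singular values and vectors of the rank-k product XY can be recovered without forming the full m x n matrix. The approach takes thin QR factorizations of X and t(Y) and then an SVD of a small k x k matrix. The sketch below is a standard linear-algebra construction; whether H2O's post-processing follows these exact steps is an assumption. The function name is hypothetical.

```r
# Recover the SVD of XY from its factors via thin QR decompositions.
# qr() may pivot columns, so R is un-pivoted before use.
recover_svd_sketch <- function(X, Y) {
  qx <- qr(X)
  qy <- qr(t(Y))
  Rx <- qr.R(qx)[, order(qx$pivot), drop = FALSE]   # X    = Qx %*% Rx
  Ry <- qr.R(qy)[, order(qy$pivot), drop = FALSE]   # t(Y) = Qy %*% Ry
  s  <- svd(Rx %*% t(Ry))                           # small k x k SVD
  list(d = s$d,
       u = qr.Q(qx) %*% s$u,                        # m x k left singular vectors
       v = qr.Q(qy) %*% s$v)                        # n x k right singular vectors
}

set.seed(3)
X <- matrix(rnorm(8 * 2), 8, 2)
Y <- matrix(rnorm(2 * 5), 2, 5)
r <- recover_svd_sketch(X, Y)
r$d   # matches the two nonzero singular values of X %*% Y
```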

seed

(Optional) Random seed used to initialize the X and Y matrices.

Value

Returns an object of class H2ODimReductionModel.

References

M. Udell, C. Horn, R. Zadeh, S. Boyd (2014). Generalized Low Rank Models. Unpublished manuscript, Stanford Electrical Engineering Department. http://arxiv.org/abs/1410.0342

Examples


library(h2o)
localH2O <- h2o.init()  # start or connect to a local H2O cluster
ausPath <- system.file("extdata", "australia.csv", package = "h2o")
australia.hex <- h2o.uploadFile(path = ausPath)
# Rank-5 GLRM with quadratic loss and L1 regularization on X
australia.glrm <- h2o.glrm(training_frame = australia.hex, k = 5, loss = "Quadratic",
                           regularization_x = "L1", gamma_x = 0.5, gamma_y = 0,
                           max_iterations = 1000)
australia.glrm
