h2o.glrm: Generalized Low Rank Model

Description

Generalized low rank decomposition of an H2O dataset.

Usage

h2o.glrm(training_frame, x, k, destination_key, loading_key,
  transform = c("NONE", "DEMEAN", "DESCALE", "STANDARDIZE", "NORMALIZE"),
  loss = c("L2", "L1", "Huber", "Poisson", "Hinge", "Logistic"),
  regularization_x = c("L2", "L1"), regularization_y = c("L2", "L1"),
  gamma_x = 0, gamma_y = 0, max_iterations = 1000, init_step_size = 1,
  min_step_size = 0.001, init = c("Random", "PlusPlus", "SVD"),
  recover_pca = FALSE, seed)

Arguments

training_frame

An H2OFrame object containing the variables in the model.

x

(Optional) A vector containing the data columns on which the decomposition operates.

k

The rank of the resulting decomposition. This must be between 1 and the number of columns in the training frame, inclusive.
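
To make the role of the rank concrete, the sketch below builds a rank-k factorization A ≈ XY in base R. It assumes the special case of quadratic loss with no regularization, where a truncated SVD gives the best rank-k approximation. It does not call h2o.glrm and is not how H2O computes the decomposition; the matrix A and all variable names are illustrative.

```r
# Illustrative data: 6 rows, 4 columns (not an H2OFrame).
set.seed(42)
A <- matrix(rnorm(6 * 4), nrow = 6, ncol = 4)
k <- 2  # rank; must lie between 1 and ncol(A)

# Truncated SVD: keep the k largest singular values.
s <- svd(A)
X <- s$u[, 1:k, drop = FALSE] %*% diag(s$d[1:k], nrow = k)  # 6 x k
Y <- t(s$v[, 1:k, drop = FALSE])                            # k x 4

A_hat <- X %*% Y           # rank-k approximation of A
dim(X)                     # rows of A by k
dim(Y)                     # k by columns of A
sum((A - A_hat)^2)         # quadratic reconstruction error
```

A smaller k gives a more compact X and Y at the cost of a larger reconstruction error.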

destination_key

(Optional) The unique hex key assigned to the resulting model. Automatically generated if none is provided.

loading_key

(Optional) The unique hex key assigned to the loading matrix X in the XY decomposition. Automatically generated if none is provided.

transform

A character string that indicates how the training data should be transformed before the decomposition is computed. Possible values are "NONE" for no transformation, "DEMEAN" for subtracting the mean of each column, "DESCALE" for dividing by the standard deviation of each column, "STANDARDIZE", and "NORMALIZE".
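
As a rough illustration of the column-wise transforms described above, the base R sketch below applies the "DEMEAN" and "DESCALE" operations to a small matrix. It assumes the sample standard deviation as computed by R's sd(); whether H2O uses the same divisor is not stated here. The data and variable names are made up for the example.

```r
# Illustrative data with two columns on different scales.
A <- cbind(height = c(150, 160, 170, 180),
           weight = c(50, 60, 80, 90))

# "DEMEAN": subtract each column's mean.
demeaned <- sweep(A, 2, colMeans(A), "-")

# "DESCALE": divide each column by its standard deviation
# (sd() uses the n - 1 divisor; this is an assumption).
descaled <- sweep(A, 2, apply(A, 2, sd), "/")

colMeans(demeaned)       # each column mean is now zero
apply(descaled, 2, sd)   # each column standard deviation is now one
```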

loss

A character string indicating the loss function. Possible values are "L2", "L1", "Huber", "Poisson", "Hinge" and "Logistic".
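
The sketch below writes out textbook forms of three of these losses for a predicted value u and an observed value a. These are common definitions, given here as an assumption; H2O's implementation may differ in scaling or in the Huber threshold. The function names are hypothetical.

```r
# Textbook loss definitions (assumed, not taken from H2O's source).
loss_l2 <- function(u, a) (u - a)^2       # quadratic
loss_l1 <- function(u, a) abs(u - a)      # absolute
loss_huber <- function(u, a) {            # quadratic near zero, linear in the tails
  r <- abs(u - a)
  ifelse(r <= 1, r^2 / 2, r - 0.5)
}

u <- c(0.5, 1, 3)   # predictions
a <- 0              # observation
loss_l2(u, a)       # 0.25 1.00 9.00
loss_l1(u, a)       # 0.5 1.0 3.0
loss_huber(u, a)    # 0.125 0.500 2.500
```

The comparison shows why "L1" and "Huber" are less sensitive to outliers than "L2": a residual of 3 costs 9 under quadratic loss but only 3 or 2.5 under the other two.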

regularization_x

A character string indicating the regularization function for the X matrix. Possible values are "L2" and "L1".

regularization_y

A character string indicating the regularization function for the Y matrix. Possible values are "L2" and "L1".

gamma_x

The weight on the X matrix regularization term. For no X regularization, set this value to zero.

gamma_y

The weight on the Y matrix regularization term. For no Y regularization, set this value to zero.
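
To show how gamma_x and gamma_y enter the problem, the sketch below evaluates a regularized objective of the form loss(A, XY) + gamma_x * r(X) + gamma_y * r(Y). It assumes quadratic loss, takes "L2" to mean the sum of squared entries and "L1" the sum of absolute entries; these conventions and the function name glrm_objective are assumptions made for illustration, not H2O's definitions.

```r
# Hypothetical helper: regularized objective under quadratic loss.
glrm_objective <- function(A, X, Y, gamma_x = 0, gamma_y = 0,
                           reg_x = "L2", reg_y = "L2") {
  penalty <- function(M, type) {
    switch(type,
           L2 = sum(M^2),      # assumed: squared Frobenius norm
           L1 = sum(abs(M)),   # assumed: entrywise absolute sum
           stop("unknown regularization: ", type))
  }
  sum((A - X %*% Y)^2) +
    gamma_x * penalty(X, reg_x) +
    gamma_y * penalty(Y, reg_y)
}

A <- matrix(1:4, 2, 2)
X <- diag(2)   # identity, so X %*% Y reproduces A exactly
Y <- A

glrm_objective(A, X, Y)                          # 0: no loss, no penalty
glrm_objective(A, X, Y, gamma_x = 1, gamma_y = 0.5,
               reg_x = "L2", reg_y = "L1")       # 7 = 0 + 1 * 2 + 0.5 * 10
```

With both weights at zero only the reconstruction loss remains, which matches the description of gamma_x and gamma_y above.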

max_iterations

The maximum number of iterations to run the optimization loop. Each iteration consists of an update of the X matrix, followed by an update of the Y matrix.

init_step_size

The initial step size. This value is divided by the number of columns in the training frame when calculating the proximal gradient update. The algorithm begins at init_step_size and decreases the step size at each iteration until a termination condition is reached.

min_step_size

The minimum step size. The algorithm terminates once the step size reaches this value.
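
The interaction of max_iterations, init_step_size and min_step_size can be sketched in base R as follows. This is a simplified illustration under stated assumptions: quadratic loss, no regularization (so the proximal step reduces to a plain gradient step), and a schedule that halves the step size whenever an update fails to lower the objective. The exact schedule H2O uses is not given in this page, and all variable names are illustrative.

```r
set.seed(1)
A <- matrix(rnorm(20 * 5), 20, 5)   # illustrative data
k <- 2
X <- matrix(rnorm(20 * k), 20, k)   # random initial factors
Y <- matrix(rnorm(k * 5), k, 5)

objective <- function(X, Y) sum((A - X %*% Y)^2)

max_iterations <- 1000
init_step_size <- 1
min_step_size  <- 0.001

step <- init_step_size
obj <- objective(X, Y)
obj_start <- obj
iters <- 0

while (iters < max_iterations && step > min_step_size) {
  iters <- iters + 1
  alpha <- step / ncol(A)   # step size divided by the number of columns

  # Update X with Y held fixed, then Y with the new X held fixed.
  X_new <- X + alpha * 2 * (A - X %*% Y) %*% t(Y)
  Y_new <- Y + alpha * 2 * t(X_new) %*% (A - X_new %*% Y)

  obj_new <- objective(X_new, Y_new)
  if (is.finite(obj_new) && obj_new < obj) {
    X <- X_new; Y <- Y_new; obj <- obj_new   # accept the update
  } else {
    step <- step / 2                         # assumed rule: shrink on failure
  }
}

c(start = obj_start, end = obj, iterations = iters, step = step)
```

The loop stops either when the iteration budget is exhausted or when the step size has shrunk to min_step_size, whichever happens first.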

init

A character string indicating how to select the initial Y matrix. Possible values are "Random" for initialization to a random array from the standard normal distribution, "PlusPlus" for initialization using the clusters from k-means++ initialization, or "SVD".

recover_pca

A logical value indicating whether the principal components should be recovered during post-processing of the generalized low rank decomposition.

seed

(Optional) Random seed used to initialize the X and Y matrices.