soft_impute: Matrix completion via nuclear-norm regularization

Description

Convenience wrapper for softImpute() that allows supplying a grid of values for the regularization parameter. Other noteworthy differences from the original function are that the columns of the data matrix are centered internally, that some of the default values are different, and that the output is structured differently. Moreover, for discrete rating-scale data, the wrapper function can include a discretization step after fitting the algorithm to map the imputed values to the rating scale of the observed values.
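To make the idea behind the wrapper concrete, the following base-R sketch centers the columns, then repeatedly fills the missing cells with the current low-rank estimate and soft-thresholds the singular values. It is only an illustration of the Soft-Impute iteration for a single value of lambda. The function soft_impute_sketch() and the toy matrix are hypothetical and are not part of the package, and the actual implementation in softImpute() differs in many details (for instance, rank constraints and the "als" algorithm).

```r
# Illustrative sketch only: NOT the implementation used by soft_impute()
soft_impute_sketch <- function(X, lambda, maxit = 100L, thresh = 1e-05) {
  X <- as.matrix(X)
  missing <- is.na(X)
  # center the columns using the observed entries only
  centers <- colMeans(X, na.rm = TRUE)
  Xc <- sweep(X, 2, centers)
  # current low-rank estimate, starting from zero
  Z <- matrix(0, nrow(X), ncol(X))
  for (it in seq_len(maxit)) {
    # keep observed cells, fill missing cells with the current estimate
    filled <- Xc
    filled[missing] <- Z[missing]
    # soft-threshold the singular values of the filled matrix
    s <- svd(filled)
    d <- pmax(s$d - lambda, 0)
    Z_new <- s$u %*% (d * t(s$v))
    # relative change of the estimate, used as convergence criterion
    change <- sum((Z_new - Z)^2) / max(sum(Z^2), 1e-09)
    Z <- Z_new
    if (change < thresh) break
  }
  # observed cells stay as they are, missing cells take the estimate
  completed <- Xc
  completed[missing] <- Z[missing]
  # add the column centers back
  sweep(completed, 2, centers, "+")
}

# hypothetical toy matrix with missing cells
X_toy <- matrix(c(5, 3, NA, 1,
                  4, NA, 2, 1,
                  NA, 2, 4, 5), nrow = 4)
X_toy_hat <- soft_impute_sketch(X_toy, lambda = 1)
```

In this sketch the observed cells are returned unchanged and only the missing cells are replaced by fitted values.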

Usage

soft_impute(
  X,
  lambda = fraction_grid(reverse = TRUE),
  relative = TRUE,
  type = c("svd", "als"),
  rank.max = NULL,
  thresh = 1e-05,
  maxit = 100L,
  trace.it = FALSE,
  final.svd = TRUE,
  discretize = TRUE,
  values = NULL
)

Value

An object of class "soft_impute" with the following components:

lambda: a numeric vector containing the values of the regularization parameter.
lambda0: a numeric value by which the values of the regularization parameter were multiplied. If relative = TRUE, this is the value returned by lambda0() (applied to the mean-centered data matrix), otherwise 1.
svd: in case of a single value of lambda, an object returned by softImpute(). Otherwise a list of such objects.
X: in case of a single value of lambda, a numeric matrix containing the completed (i.e., imputed) data matrix. Otherwise a list of such matrices.
X_discretized: in case of a single value of lambda, a numeric matrix containing the completed (i.e., imputed) data matrix after the discretization step. Otherwise a list of such matrices. This is only returned if requested via discretize = TRUE.
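The relation between the lambda and lambda0 components can be sketched in base R as follows. The sketch assumes that the reference value is the largest singular value of the column-centered data matrix with missing cells set to zero, which is the smallest penalty that yields the zero solution in Soft-Impute. Whether lambda0() computes exactly this quantity should be checked in its own documentation. The toy matrix and the grid of fractions are hypothetical.

```r
# hypothetical toy matrix with missing cells
X_toy <- matrix(c(5, 3, NA, 1,
                  4, NA, 2, 1,
                  NA, 2, 4, 5), nrow = 4)

# center columns on the observed values, then zero-fill the missing cells
Xc <- sweep(X_toy, 2, colMeans(X_toy, na.rm = TRUE))
Xc[is.na(Xc)] <- 0

# assumed reference value: largest singular value of the zero-filled matrix
ref <- svd(Xc)$d[1]

# hypothetical relative grid (not the default of fraction_grid())
fractions <- c(1, 0.5, 0.25)

# with relative = TRUE, the fitted penalties are the fractions times the
# reference value; with relative = FALSE, the multiplier would be 1
lambda_abs <- fractions * ref
```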

The class structure is still experimental and may change in the future. The following accessor functions are available:

get_completed() to extract the completed (i.e., imputed) data matrix for a specified value of the regularization parameter,
get_lambda() to extract the values of the regularization parameter.

Arguments

X: a matrix or data frame with missing values.
lambda: a numeric vector giving values of the regularization parameter. See fraction_grid() for the default values.
relative: a logical indicating whether the values of the regularization parameter should be considered relative to a reference value computed from the data at hand. If TRUE (the default), the values of lambda are multiplied by the value returned by lambda0() (applied to the mean-centered data matrix).
type: a character string specifying the type of algorithm. Possible values are "svd" and "als". See softImpute() for details on the algorithms, but note that the default value here is "svd".
rank.max: a positive integer giving a rank constraint. See softImpute() for more details, but note that the default here is to use the minimum of the number of rows and columns minus 1 if type is "svd", and to use 2 if type is "als".
thresh, maxit, trace.it, final.svd: see softImpute().
discretize: a logical indicating whether to include a discretization step after fitting the algorithm (defaults to TRUE). In case of discrete rating-scale data, this can be used to map the imputed values to the discrete rating scale of the observed values.
values: an optional numeric vector giving the possible values of discrete ratings. This is ignored if discretize is FALSE. Currently, the possible values are assumed to be the same for all columns. If NULL, the unique values of the observed parts of X are used.
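As an illustration of what a discretization step can look like, the sketch below maps every imputed value to the closest value of the rating scale. Nearest-value mapping is an assumption made for this example; the rule actually applied by soft_impute() may differ. The function discretize_sketch() and the example matrix are hypothetical.

```r
# Illustrative sketch only: map each entry to the nearest allowed rating
discretize_sketch <- function(X_hat, values) {
  values <- sort(unique(values))
  # index of the closest allowed value for every entry
  idx <- vapply(X_hat, function(x) which.min(abs(x - values)), integer(1))
  matrix(values[idx], nrow = nrow(X_hat), ncol = ncol(X_hat))
}

# hypothetical imputed values on a 1-5 rating scale
X_hat_toy <- matrix(c(0.4, 2.6, 3.49, 5.8), nrow = 2)
X_disc_toy <- discretize_sketch(X_hat_toy, values = 1:5)
# entries become 1, 3, 3, 5 (in column-major order)
```

Values outside the range of the scale are pulled to the nearest end point, so the result always lies on the rating scale.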

Author

Andreas Alfons and Aurore Archimbaud

References

Hastie, T., Mazumder, R., Lee, J. D. and Zadeh, R. (2015) Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares. Journal of Machine Learning Research, 16(104), 3367–3402.

Mazumder, R., Hastie, T. and Tibshirani, R. (2010) Spectral Regularization Algorithms for Learning Large Incomplete Matrices. Journal of Machine Learning Research, 11(80), 2287–2322.

Examples

# toy example derived from MovieLens 100K dataset
data("MovieLensToy")
# Soft-Impute with discretization step
fit <- soft_impute(MovieLensToy)
# extract the discretized completed matrix for the fifth value
# of the regularization parameter
X_hat <- get_completed(fit, which = 5)
head(X_hat)
