soft_impute_tune: Matrix completion via nuclear-norm regularization with hyperparameter tuning

Description

Perform matrix completion via nuclear-norm regularization based on softImpute(), with the regularization parameter selected via repeated holdout validation or K-fold cross-validation. Note that this function uses the convenience wrapper soft_impute(), whose default behavior differs from that of the original softImpute().
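The idea behind nuclear-norm regularization can be illustrated with a minimal base-R sketch of the Soft-Impute iteration from Mazumder, Hastie and Tibshirani (2010). Missing cells are filled with the current low-rank estimate, and the singular values of the filled matrix are soft-thresholded by lambda. The function name soft_impute_sketch() is hypothetical. It is neither the package's soft_impute() nor softImpute::softImpute(), and it omits centering, rank restrictions and the alternating least squares variant.

```r
# Hypothetical minimal Soft-Impute sketch (after Mazumder et al., 2010).
# Not the package's implementation: no centering, no rank.max, full SVD.
soft_impute_sketch <- function(X, lambda, maxit = 100, thresh = 1e-5) {
  X <- as.matrix(X)
  observed <- !is.na(X)
  Z <- matrix(0, nrow(X), ncol(X))        # current low-rank estimate
  for (iter in seq_len(maxit)) {
    # keep observed cells, fill missing cells with the current estimate
    filled <- ifelse(observed, X, Z)
    s <- svd(filled)
    d <- pmax(s$d - lambda, 0)            # soft-threshold singular values
    Z_new <- s$u %*% (d * t(s$v))         # rebuild U diag(d) V'
    # relative squared change (Frobenius norm) as convergence criterion
    change <- sum((Z_new - Z)^2) / max(sum(Z^2), .Machine$double.eps)
    Z <- Z_new
    if (change < thresh) break
  }
  # observed values where available, low-rank estimate elsewhere
  ifelse(observed, X, Z)
}

# usage: a rank-1 matrix with two cells removed
A <- outer(1:6, 1:5)
X <- A
X[2, 3] <- NA
X[5, 1] <- NA
X_hat <- soft_impute_sketch(X, lambda = 0.1, maxit = 1000, thresh = 1e-10)
```

Larger values of lambda shrink more singular values to zero and therefore yield lower-rank completions. This is why lambda has to be tuned.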

Usage

soft_impute_tune(
  X,
  lambda = fraction_grid(reverse = TRUE),
  relative = TRUE,
  splits = holdout_control(),
  ...,
  discretize = TRUE,
  values = NULL
)

Value

An object of class "soft_impute_tuned" with the following components:

lambda: a numeric vector containing the values of the regularization parameter.
tuning_loss: a numeric vector containing the (average) values of the loss function on the validation set(s) for each value of the regularization parameter.
lambda_opt: numeric; the optimal value of the regularization parameter.
fit: an object of class "soft_impute" containing the results from the algorithm with the optimal regularization parameter on the full (observed) data matrix.
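The relation between tuning_loss and lambda_opt can be sketched as follows. This assumes that the validation losses are averaged over the splits and that the optimal value is the one with the smallest average loss. The helper select_lambda_sketch() and the loss values below are made up for illustration only.

```r
# Hypothetical helper: 'losses' has one row per validation split and one
# column per value of lambda; average over splits and take the minimum.
select_lambda_sketch <- function(lambda, losses) {
  losses <- as.matrix(losses)
  stopifnot(ncol(losses) == length(lambda))
  tuning_loss <- colMeans(losses)
  list(lambda = lambda,
       tuning_loss = tuning_loss,
       lambda_opt = lambda[which.min(tuning_loss)])
}

# toy input (made-up numbers): 3 splits, 4 values of lambda
lambda <- c(0.8, 0.4, 0.2, 0.1)
losses <- rbind(c(1.2, 0.9, 0.7, 0.8),
                c(1.1, 0.8, 0.7, 0.9),
                c(1.3, 1.0, 0.6, 0.8))
select_lambda_sketch(lambda, losses)$lambda_opt  # returns 0.2
```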

The class structure is still experimental and may change in the future. The following accessor functions are available:

get_completed() to extract the imputed data matrix (with the optimal value of the regularization parameter),
get_lambda() to extract the optimal value of the regularization parameter.

Arguments

X: a matrix or data frame with missing values.
lambda: a numeric vector giving values of the regularization parameter. See fraction_grid() for the default values.
relative: a logical indicating whether the values of the regularization parameter should be considered relative to a reference value computed from the data at hand. If TRUE (the default), the values of lambda are multiplied by the value returned by lambda0() (applied to the mean-centered data matrix).
splits: an object inheriting from class "split_control", as generated by holdout_control() for repeated holdout validation or cv_folds_control() for K-fold cross-validation, or a list of index vectors giving different validation sets of observed cells, as generated by create_splits(). When the algorithm is fitted on the training set of observed cells, the cells in the validation set are set to NA.
...: additional arguments to be passed down to soft_impute().
discretize: a logical indicating whether to include a discretization step after fitting the algorithm (defaults to TRUE). For discrete rating-scale data, this can be used to map the imputed values to the discrete rating scale of the observed values.
values: an optional numeric vector giving the possible values of discrete ratings. This is ignored if discretize is FALSE. Currently, the possible values are assumed to be the same for all columns. If NULL, the unique values of the observed parts of X are used.
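The effect of relative = TRUE can be sketched in base R. The sketch makes two assumptions: "mean-centered" means subtracting the column means of the observed values, and the reference value is the largest singular value of the centered matrix with missing cells set to 0. That singular value is the smallest lambda for which Soft-Impute returns the all-zero solution. The function lambda0_sketch() and the fractions used here are hypothetical. They are not the package's lambda0() or the defaults of fraction_grid().

```r
# Hypothetical sketch of the reference value used for relative lambdas.
# Assumptions: column-mean centering; missing cells treated as 0.
lambda0_sketch <- function(X) {
  X <- as.matrix(X)
  centered <- sweep(X, 2, colMeans(X, na.rm = TRUE))  # subtract column means
  centered[is.na(centered)] <- 0                      # zero-fill missing cells
  svd(centered, nu = 0, nv = 0)$d[1]                  # largest singular value
}

# relative values (hypothetical fractions) turned into absolute values
fractions <- c(1, 0.5, 0.25)
X <- matrix(c(1, 2, NA, 4, 5, NA, 3, 1, 2), nrow = 3)
lambda_abs <- fractions * lambda0_sketch(X)
```

Expressing lambda as a fraction of such a data-dependent value makes one grid of fractions usable across data sets on different scales.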
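The validation sets described for the splits argument consist of observed cells that are hidden (set to NA) before fitting. The following is a minimal base-R sketch of how such index vectors could be drawn for repeated holdout validation and for K-fold cross-validation. The functions make_holdout_splits(), make_cv_folds() and mask_validation(), and the holdout fraction of 20%, are hypothetical. They do not reproduce holdout_control(), cv_folds_control() or create_splits().

```r
# Hypothetical sketch: validation sets as linear indices of observed cells.
make_holdout_splits <- function(X, R = 5, fraction = 0.2) {
  observed <- which(!is.na(X))                  # observed cells
  n_val <- max(1L, round(fraction * length(observed)))
  # R independent random subsets of the observed cells
  lapply(seq_len(R), function(r) {
    observed[sample.int(length(observed), n_val)]
  })
}

make_cv_folds <- function(X, K = 5) {
  observed <- which(!is.na(X))
  # randomly assign each observed cell to one of K folds
  fold_id <- sample(rep_len(seq_len(K), length(observed)))
  unname(split(observed, fold_id))
}

mask_validation <- function(X, validation) {
  X_train <- X
  X_train[validation] <- NA                     # hide validation cells
  X_train
}

set.seed(123)
X <- matrix(c(5, 3, NA, 1, 4, NA, 2, 5, 4, 3, NA, 1), nrow = 4)
splits <- make_holdout_splits(X, R = 3)
folds <- make_cv_folds(X, K = 3)
X_train <- mask_validation(X, splits[[1]])
```

After fitting on X_train, the loss for the split would be computed by comparing the fitted values with X at the indices in splits[[1]].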
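The discretization step controlled by discretize and values can be sketched as mapping each imputed value to the nearest admissible rating. If no values are supplied, the unique observed values of X are used for all columns, as described above. The nearest-value rule (ties go to the smaller rating) is an assumption, and discretize_sketch() is a hypothetical name. The package may use a different mapping.

```r
# Hypothetical sketch: map each entry to the nearest admissible rating.
discretize_sketch <- function(X_hat, X, values = NULL) {
  if (is.null(values)) values <- unique(X[!is.na(X)])  # observed ratings
  values <- sort(values)   # ascending, so which.min breaks ties downward
  nearest <- vapply(X_hat, function(x) values[which.min(abs(values - x))],
                    numeric(1))
  matrix(nearest, nrow = nrow(X_hat), dimnames = dimnames(X_hat))
}

X <- matrix(c(5, 3, NA, 1, 4, NA), nrow = 2)       # ratings with gaps
X_hat <- matrix(c(5, 3, 2.4, 1, 4, 4.6), nrow = 2) # continuous imputations
discretize_sketch(X_hat, X)  # 2.4 becomes 3 and 4.6 becomes 5
```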

Author

Andreas Alfons

References

Hastie, T., Mazumder, R., Lee, J. D. and Zadeh, R. (2015) Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares. Journal of Machine Learning Research, 16(104), 3367--3402.

Mazumder, R., Hastie, T. and Tibshirani, R. (2010) Spectral Regularization Algorithms for Learning Large Incomplete Matrices. Journal of Machine Learning Research, 11(80), 2287--2322.

Examples


# toy example derived from MovieLens 100K dataset
data("MovieLensToy")
# Soft-Impute with discretization step and hyperparameter tuning
set.seed(20250723)
fit <- soft_impute_tune(MovieLensToy, 
                        lambda = fraction_grid(nb_lambda = 6, 
                                               reverse = TRUE),
                        splits = holdout_control(R = 5))
# extract discretized completed matrix with optimal 
# regularization parameter
X_hat <- get_completed(fit)
head(X_hat)
# extract optimal value of regularization parameter
get_lambda(fit)
