rdmc_tune: Robust discrete matrix completion with hyperparameter tuning

Description

Perform robust discrete matrix completion with a low-rank constraint on a latent continuous matrix, implemented via an ADMM algorithm. The regularization parameter is selected via repeated holdout validation or $K$-fold cross-validation.

Usage

rdmc_tune(
  X,
  values = NULL,
  lambda = fraction_grid(),
  relative = TRUE,
  splits = holdout_control(),
  loss = c("pseudo_huber", "absolute", "truncated"),
  loss_const = NULL,
  ...
)

Value

An object of class "rdmc_tuned" with the following components:

lambda: a numeric vector containing the values of the regularization parameter.
tuning_loss: a numeric vector containing the (average) values of the loss function on the validation set(s) for each value of the regularization parameter.
lambda_opt: numeric; the optimal value of the regularization parameter.
fit: an object of class "rdmc" containing the results from the algorithm with the optimal regularization parameter on the full (observed) data matrix.
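The relationship between tuning_loss and lambda_opt can be illustrated with a small base-R sketch. It assumes that the optimal value is the one that minimizes the average validation loss. The grid and the loss matrix below are hypothetical placeholder numbers, not output from rdmc_tune().

```r
# Hypothetical grid of regularization parameters (not fraction_grid() output)
lambda <- c(0.05, 0.1, 0.2, 0.4)
# Hypothetical validation losses: one row per holdout split, one column per lambda
losses <- rbind(c(0.90, 0.70, 0.75, 0.95),
                c(0.85, 0.72, 0.70, 0.90),
                c(0.88, 0.68, 0.74, 0.92))
# Average loss over the validation sets for each lambda (cf. component tuning_loss)
tuning_loss <- colMeans(losses)
# ASSUMED selection rule: the lambda with the smallest average validation loss
lambda_opt <- lambda[which.min(tuning_loss)]
lambda_opt
```

In rdmc_tune() itself, the final fit component is then obtained by refitting on the full observed data with the selected value.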

The class structure is still experimental and may change in the future. The following accessor functions are available:

get_completed() to extract the completed (i.e., imputed) data matrix with the optimal value of the regularization parameter,
get_lambda() to extract the optimal value of the regularization parameter,
get_nb_iter() to extract the number of iterations with the optimal value of the regularization parameter.

Arguments

X: a matrix or data frame of discrete ratings with missing values.
values: an optional numeric vector giving the possible values of the ratings. Currently, these are assumed to be the same for all columns. If NULL, the unique values of the observed parts of X are used.
lambda: a numeric vector giving values of the regularization parameter. See fraction_grid() for the default values.
relative: a logical indicating whether the values of the regularization parameter should be considered relative to a certain reference value computed from the data at hand. If TRUE (the default), the values of lambda are multiplied by the largest singular value of the median-centered data matrix with missing values replaced by zeros.
splits: an object inheriting from class "split_control", as generated by holdout_control() for repeated holdout validation or cv_folds_control() for $K$-fold cross-validation, or a list of index vectors giving different validation sets of observed cells as generated by create_splits(). Cells in the validation set are set to NA when fitting the algorithm on the training set of observed cells.
loss: a character string specifying the robust loss function for the loss part of the objective function. Possible values are "pseudo_huber" (the default) for the pseudo-Huber loss, "absolute" for the absolute loss, and "truncated" for the truncated absolute loss. See ‘Details’ for more information.
loss_const: tuning constant for the loss function. For the pseudo-Huber loss, the default value is the average step size between the rating categories in values. For the truncated absolute loss, the default is half the range of the rating categories in values. This is ignored for the absolute loss, which does not have a tuning parameter. See ‘Details’ for more information.
...: additional arguments to be passed down to rdmc().
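The data-dependent defaults described for values, loss_const and relative, as well as the masking of validation cells described for splits, can be sketched in base R on a hypothetical toy matrix. The sketch assumes that median-centering uses the column medians of the observed ratings, and the 20% holdout fraction is an illustrative choice, not the setting used by holdout_control(). None of the object names below are part of the package.

```r
# Toy ratings matrix on a 1-5 scale with missing values (hypothetical data)
X <- matrix(c(1, 3, NA, 5,
              2, NA, 4, 4,
              NA, 3, 5, 1), nrow = 3, byrow = TRUE)

# Default for 'values': unique observed ratings (sorted here for convenience)
values <- sort(unique(X[!is.na(X)]))

# Default loss_const for the pseudo-Huber loss: average step size between categories
loss_const_huber <- mean(diff(values))
# Default loss_const for the truncated absolute loss: half the range of the categories
loss_const_trunc <- diff(range(values)) / 2

# Reference value for relative = TRUE: largest singular value of the
# median-centered matrix with missing cells replaced by zeros.
# ASSUMPTION: centering uses the column medians of the observed ratings.
medians <- apply(X, 2, median, na.rm = TRUE)
X_centered <- sweep(X, 2, medians)
X_centered[is.na(X_centered)] <- 0
lambda_ref <- svd(X_centered, nu = 0, nv = 0)$d[1]
# With relative = TRUE, grid values are scaled by the reference value
lambda_abs <- c(0.1, 0.5) * lambda_ref

# One holdout split: a random subset of observed cells is set to NA for training.
# ASSUMPTION: 20% holdout fraction, chosen for illustration only.
set.seed(123)
observed <- which(!is.na(X))
validation <- sample(observed, size = ceiling(0.2 * length(observed)))
X_train <- X
X_train[validation] <- NA
```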

Author

Andreas Alfons

Details

For the loss part of the objective function, the pseudo-Huber loss (loss = "pseudo_huber") is given by $$\rho(x) = c^2 \left(\sqrt{1 + (x/c)^2} - 1\right),$$ where $c$ denotes the tuning constant loss_const. The absolute loss (loss = "absolute") is given by $$\rho(x) = |x|,$$ and the truncated absolute loss (loss = "truncated") is defined as $$\rho(x) = \min(|x|, c).$$
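The three loss functions above can be written directly as vectorized R functions. The helper names pseudo_huber(), absolute_loss() and truncated_loss() are hypothetical and only mirror the formulas; they are not the package's internal implementation.

```r
# Pseudo-Huber loss with tuning constant 'const' (plays the role of loss_const)
pseudo_huber <- function(x, const) const^2 * (sqrt(1 + (x / const)^2) - 1)
# Absolute loss (no tuning constant)
absolute_loss <- function(x) abs(x)
# Truncated absolute loss: absolute loss capped at 'const'
truncated_loss <- function(x, const) pmin(abs(x), const)

# Evaluate each loss on a few example residuals
residuals <- c(-3, -0.5, 0, 0.5, 3)
pseudo_huber(residuals, const = 1)
absolute_loss(residuals)
truncated_loss(residuals, const = 2)
```

For small residuals the pseudo-Huber loss behaves approximately like $x^2/2$, and for large residuals approximately like $c|x| - c^2$, so it interpolates between squared and absolute loss. The truncated absolute loss, in contrast, caps the contribution of any single cell at $c$.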

References

Archimbaud, A., Alfons, A., and Wilms, I. (2025) Robust Matrix Completion for Discrete Rating-Scale Data. arXiv:2412.20802. doi:10.48550/arXiv.2412.20802.

Examples


# toy example derived from MovieLens 100K dataset
data("MovieLensToy")
# robust discrete matrix completion with hyperparameter tuning
set.seed(20250723)
fit <- rdmc_tune(MovieLensToy, 
                 lambda = fraction_grid(nb_lambda = 6),
                 splits = holdout_control(R = 5))
# extract completed matrix with optimal regularization parameter
X_hat <- get_completed(fit)
head(X_hat)
# extract optimal value of regularization parameter
get_lambda(fit)
# extract number of iterations with optimal regularization parameter
get_nb_iter(fit)
