rdmc: Robust discrete matrix completion

Description

Perform robust discrete matrix completion with a low-rank constraint on a latent continuous matrix, implemented via an alternating direction method of multipliers (ADMM) algorithm.
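In the ADMM iterations, the penalty parameter mu grows geometrically: the documentation of the argument delta states that mu is multiplied by delta after each iteration. The following sketch only illustrates that arithmetic under this reading (the first iteration uses mu, each later one uses the previous value times delta). The helper penalty_schedule() is hypothetical and not part of rdmc.

```r
# Hypothetical helper (not part of rdmc): value of the penalty parameter mu
# in effect during iterations 1, ..., n_iter, assuming mu is multiplied by
# delta once after every iteration, as described for the argument 'delta'.
penalty_schedule <- function(mu = 0.1, delta = 1.05, n_iter = 100L) {
  mu * delta^(seq_len(n_iter) - 1L)
}

mus <- penalty_schedule()   # package defaults: mu = 0.1, delta = 1.05, max_iter = 100
head(mus, 3)                # values 0.1, 0.105, 0.11025
tail(mus, 1)                # penalty in effect during the last iteration
```

With the defaults, the penalty grows by roughly two orders of magnitude over 100 iterations, which gradually forces the discrete and continuous matrices to agree.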

Usage

rdmc(
  X,
  values = NULL,
  lambda = fraction_grid(),
  relative = TRUE,
  loss = c("pseudo_huber", "absolute", "truncated"),
  loss_const = NULL,
  type = "svd",
  svd_tol = 1e-04,
  rank_max = NULL,
  mu = 0.1,
  delta = 1.05,
  conv_tol = 1e-04,
  max_iter = 100L,
  L = NULL,
  Theta = NULL
)

Value

An object of class "rdmc" with the following components:

lambda: a numeric vector containing the values of the regularization parameter.
d_max: the numeric value by which the values of the regularization parameter were multiplied. If relative = TRUE, this is the largest singular value of the median-centered data matrix (with missing values replaced by zeros), otherwise 1.
L: in case of a single value of lambda, a numeric matrix containing the predictions of the median-centered data matrix. Otherwise a list of such matrices.
Z: in case of a single value of lambda, an ancillary continuous matrix used in the optimization algorithm. Otherwise a list of such matrices.
Theta: in case of a single value of lambda, a numeric matrix containing the discrepancy parameter, i.e., the multiplier adjusting for the discrepancy between L and Z in the optimization algorithm. Otherwise a list of such matrices.
objective: a numeric vector containing the value of the objective function for each value of the regularization parameter.
converged: a logical vector indicating whether the algorithm converged for each value of the regularization parameter.
nb_iter: an integer vector containing the number of iterations for each value of the regularization parameter.
X: in case of a single value of lambda, a numeric matrix containing the completed (i.e., imputed) data matrix. Otherwise a list of such matrices.
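The reference value d_max (used when relative = TRUE) can be sketched in base R as follows. This is an illustration under an assumption: "median-centered" is taken to mean that each column is centered by the median of its observed ratings. The helper compute_d_max() is hypothetical and not exported by rdmc.

```r
# Hypothetical helper (not exported by rdmc): reference value d_max.
# ASSUMPTION: median centering is done column-wise on the observed ratings.
compute_d_max <- function(X) {
  X <- as.matrix(X)
  medians <- apply(X, 2, median, na.rm = TRUE)  # column medians of observed entries
  X_centered <- sweep(X, 2, medians)             # subtract each column's median
  X_centered[is.na(X_centered)] <- 0             # missing values replaced by zeros
  svd(X_centered, nu = 0, nv = 0)$d[1]           # largest singular value
}

# small 4 x 3 rating matrix with missing entries
X <- matrix(c(1, 2, NA, 5,
              4, NA, 3, 2,
              5, 1, 2, NA), nrow = 4)
d_max <- compute_d_max(X)
# with relative = TRUE, relative values of lambda are scaled by d_max
lambda <- c(0.5, 0.1) * d_max
```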

The class structure is still experimental and may change in the future. The following accessor functions are available:

get_completed() to extract the completed (i.e., imputed) data matrix for a specified value of the regularization parameter,
get_lambda() to extract the values of the regularization parameter,
get_nb_iter() to extract the number of iterations for each value of the regularization parameter.

Arguments

X: a matrix or data frame of discrete ratings with missing values.
values: an optional numeric vector giving the possible values of the ratings. Currently, these are assumed to be the same for all columns. If NULL, the unique values of the observed parts of X are used.
lambda: a numeric vector giving values of the regularization parameter. See fraction_grid() for the default values.
relative: a logical indicating whether the values of the regularization parameter should be interpreted relative to a reference value computed from the data at hand. If TRUE (the default), the values of lambda are multiplied by the largest singular value of the median-centered data matrix with missing values replaced by zeros.
loss: a character string specifying the robust loss function for the loss part of the objective function. Possible values are "pseudo_huber" (the default) for the pseudo-Huber loss, "absolute" for the absolute loss, and "truncated" for the truncated absolute loss. See ‘Details’ for more information.
loss_const: tuning constant for the loss function. For the pseudo-Huber loss, the default value is the average step size between the rating categories in values. For the truncated absolute loss, the default is half the range of the rating categories in values. This is ignored for the absolute loss, which does not have a tuning parameter. See ‘Details’ for more information.
type: a character string specifying the type of algorithm for the low-rank latent continuous matrix. Currently, only "svd" (a soft-thresholded SVD step) is implemented.
svd_tol: numeric tolerance for the soft-thresholded SVD step. Only singular values larger than svd_tol are kept to construct the low-rank latent continuous matrix.
rank_max: a positive integer giving a rank constraint in the soft-thresholded SVD step for the latent continuous matrix. The default is to use the minimum of the number of rows and columns.
mu: numeric; penalty parameter for the discrepancy between the discrete rating matrix and the latent low-rank continuous matrix. It is not recommended to change the default value of 0.1.
delta: numeric; update factor for the penalty parameter mu, applied after each iteration to increase the strength of the penalty. It is not recommended to change the default value of 1.05.
conv_tol: numeric; convergence tolerance for the relative change in the objective function.
max_iter: a positive integer specifying the maximum number of iterations. In practice, the largest gains are often made in the first few iterations, with subsequent iterations yielding relatively small improvements until convergence. The default is to perform at most 100 iterations.
L, Theta: starting values for the algorithm. These are not expected to be set by the user. Instead, it is recommended to call this function with a grid of values for the regularization parameter lambda so that the implementation automatically takes advantage of warm starts.
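The roles of svd_tol and rank_max in a soft-thresholded SVD step can be illustrated with a generic singular value shrinkage operator. This is a textbook sketch, not the package's internal code: the threshold tau passed to it inside the ADMM iterations is not documented here, and soft_svd() is a hypothetical name.

```r
# Hypothetical, generic soft-thresholded SVD (singular value shrinkage).
# tau: shrinkage threshold (its exact value inside rdmc is not documented here).
soft_svd <- function(M, tau, svd_tol = 1e-4, rank_max = NULL) {
  if (is.null(rank_max)) rank_max <- min(dim(M))      # default: min(rows, columns)
  s <- svd(M)
  d <- pmax(s$d - tau, 0)                             # shrink singular values towards zero
  keep <- which(d > svd_tol)                          # keep only values larger than svd_tol
  keep <- keep[seq_len(min(length(keep), rank_max))]  # enforce the rank constraint
  if (length(keep) == 0L) return(matrix(0, nrow(M), ncol(M)))
  s$u[, keep, drop = FALSE] %*% diag(d[keep], nrow = length(keep)) %*%
    t(s$v[, keep, drop = FALSE])
}

set.seed(1)
M <- matrix(rnorm(20), nrow = 5)
Z <- soft_svd(M, tau = 1, rank_max = 2)   # low-rank continuous approximation of M
qr(Z)$rank                                # at most 2
```

Because svd() returns singular values in decreasing order, the retained components are always the leading ones, so rank_max simply truncates the list.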

Author

Andreas Alfons and Aurore Archimbaud

Details

For the loss part of the objective function, the pseudo-Huber loss (loss = "pseudo_huber") is given by $$\rho(x) = c^2 \left(\sqrt{1 + (x/c)^2} - 1\right),$$ where c denotes the tuning constant loss_const. The absolute loss (loss = "absolute") is given by $$\rho(x) = |x|,$$ and the truncated absolute loss (loss = "truncated") is defined as $$\rho(x) = \min(|x|, c).$$
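The three loss functions, together with the default tuning constants described for the argument loss_const, can be written directly in base R. This sketch assumes that the rating categories are the sorted unique observed values, which is the documented default for values; the function names are hypothetical and not exported by rdmc.

```r
# Hypothetical implementations of the losses in 'Details' (const = loss_const).
pseudo_huber   <- function(x, const) const^2 * (sqrt(1 + (x / const)^2) - 1)
absolute_loss  <- function(x) abs(x)
truncated_loss <- function(x, const) pmin(abs(x), const)

# rating categories: unique observed values (default for 'values')
X <- matrix(c(1, 3, NA, 5, 4, NA, 3, 2, 5), nrow = 3)
values <- sort(unique(X[!is.na(X)]))   # 1 2 3 4 5
c_huber <- mean(diff(values))          # pseudo-Huber default: average step size
c_trunc <- diff(range(values)) / 2     # truncated default: half the range

r <- c(-4, -1, 0, 0.5, 3)              # example residuals
rbind(pseudo_huber = pseudo_huber(r, c_huber),
      absolute     = absolute_loss(r),
      truncated    = truncated_loss(r, c_trunc))
```

The pseudo-Huber loss behaves quadratically for residuals small relative to the constant and linearly for large ones, while the truncated loss caps the influence of any single residual at the constant.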

References

Archimbaud, A., Alfons, A., and Wilms, I. (2025) Robust Matrix Completion for Discrete Rating-Scale Data. arXiv:2412.20802. doi:10.48550/arXiv.2412.20802.

Examples

library("rdmc")
# toy example derived from the MovieLens 100K dataset
data("MovieLensToy")
# robust discrete matrix completion
fit <- rdmc(MovieLensToy)
# extract completed matrix for the fifth value of the
# regularization parameter
X_hat <- get_completed(fit, which = 5)
head(X_hat)