Perform robust discrete matrix completion with a low-rank constraint on a latent continuous matrix, implemented via an ADMM algorithm.
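The low-rank constraint on the latent continuous matrix is enforced through a soft-thresholded SVD step (type = "svd"), controlled by svd_tol and rank_max. A minimal base-R sketch of such a step follows. The helper soft_svd() and its threshold argument are hypothetical: this documentation does not state how the actual algorithm derives the threshold from lambda and mu, so the sketch only shows how shrinking, the svd_tol cut-off, and the rank cap interact.

```r
# Hypothetical soft-thresholded SVD step (not the package's internal code).
# 'threshold' is the amount subtracted from every singular value.
soft_svd <- function(M, threshold, svd_tol = 1e-4, rank_max = NULL) {
  if (is.null(rank_max)) rank_max <- min(dim(M))  # default: no extra rank limit
  s <- svd(M)
  d <- pmax(s$d - threshold, 0)                   # shrink singular values towards zero
  keep <- head(which(d > svd_tol), rank_max)      # keep values above svd_tol, at most rank_max
  if (length(keep) == 0) return(matrix(0, nrow(M), ncol(M)))
  # U_k diag(d_k) V_k^T, using row-wise recycling of d[keep]
  s$u[, keep, drop = FALSE] %*% (d[keep] * t(s$v[, keep, drop = FALSE]))
}

M <- diag(c(5, 3, 1))
soft_svd(M, threshold = 2)                # shrunken singular values 3, 1, 0: rank 2
soft_svd(M, threshold = 2, rank_max = 1)  # only the largest singular value survives
```

Because singular values are returned in decreasing order, the rank cap simply keeps the leading components that survive the tolerance.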
rdmc(
X,
values = NULL,
lambda = fraction_grid(),
relative = TRUE,
loss = c("pseudo_huber", "absolute", "truncated"),
loss_const = NULL,
type = "svd",
svd_tol = 1e-04,
rank_max = NULL,
mu = 0.1,
delta = 1.05,
conv_tol = 1e-04,
max_iter = 100L,
L = NULL,
Theta = NULL
)
An object of class "rdmc" with the following components:
a numeric vector containing the values of the regularization parameter.
a numeric value by which the values of the regularization
parameter were multiplied: if relative = TRUE, the largest singular
value of the median-centered data matrix (with missing values replaced
by zeros), otherwise 1.
in case of a single value of lambda, a numeric matrix
containing the predictions of the median-centered data matrix. Otherwise a
list of such matrices.
in case of a single value of lambda, an ancillary continuous
matrix used in the optimization algorithm. Otherwise a list of such
matrices.
in case of a single value of lambda, a numeric matrix
containing the discrepancy parameter, i.e., the multiplier adjusting for the
discrepancy between L and Z in the optimization algorithm.
Otherwise a list of such matrices.
a numeric vector containing the value of the objective function for each value of the regularization parameter.
a logical vector indicating whether the algorithm converged for each value of the regularization parameter.
an integer vector containing the number of iterations for each value of the regularization parameter.
in case of a single value of lambda, a numeric matrix
containing the completed (i.e., imputed) data matrix. Otherwise a list of
such matrices.
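The scaling factor listed above (used when relative = TRUE) can be sketched in base R. This assumes centering by the overall median of the observed ratings; the documentation does not say whether the median is taken overall or per column. The toy matrix, the relative grid, and the name ref_value are made up for illustration.

```r
# Hypothetical toy rating matrix on a 1-5 scale with missing entries
X <- matrix(c(1, 3, NA, 5,
              2, NA, 4, 4,
              NA, 3, 1, 2), nrow = 3, byrow = TRUE)

# ASSUMPTION: center by the overall median of the observed ratings
X_centered <- X - median(X, na.rm = TRUE)
X_centered[is.na(X_centered)] <- 0   # missing entries replaced by zeros

# Reference value: largest singular value of the centered matrix
ref_value <- svd(X_centered, nu = 0, nv = 0)$d[1]

# Relative grid (hypothetical values) converted to absolute penalties
lambda_rel <- c(0.5, 0.25, 0.1)
lambda_abs <- lambda_rel * ref_value
lambda_abs
```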
The class structure is still experimental and may change in the future. The following accessor functions are available:
get_completed() to extract the completed (i.e.,
imputed) data matrix for a specified value of the regularization
parameter,
get_lambda() to extract the values of the
regularization parameter,
get_nb_iter() to extract the number of iterations for
each value of the regularization parameter.
X: a matrix or data frame of discrete ratings with missing values.
values: an optional numeric vector giving the possible values of the
ratings. Currently, these are assumed to be the same for all columns. If
NULL, the unique values of the observed parts of X are used.
lambda: a numeric vector giving values of the regularization
parameter. See fraction_grid() for the default values.
relative: a logical indicating whether the values of the
regularization parameter should be considered relative to a certain
reference value computed from the data at hand. If TRUE (the
default), the values of lambda are multiplied by the largest
singular value of the median-centered data matrix with missing values
replaced by zeros.
loss: a character string specifying the robust loss function for the
loss part of the objective function. Possible values are
"pseudo_huber" (the default) for the pseudo-Huber loss,
"absolute" for the absolute loss, and "truncated" for the
truncated absolute loss. See ‘Details’ for more information.
loss_const: tuning constant for the loss function. For the
pseudo-Huber loss, the default value is the average step size between the
rating categories in values. For the truncated absolute loss,
the default is half the range of the rating categories in values.
This is ignored for the absolute loss, which does not have a tuning
parameter. See ‘Details’ for more information.
type: a character string specifying the type of algorithm for the
low-rank latent continuous matrix. Currently, only "svd" is
implemented, which uses a soft-thresholded SVD step.
svd_tol: numeric tolerance for the soft-thresholded SVD step. Only
singular values larger than svd_tol are kept to construct the
low-rank latent continuous matrix.
rank_max: a positive integer giving a rank constraint in the soft-thresholded SVD step for the latent continuous matrix. The default (NULL) is to use the minimum of the number of rows and columns.
mu: numeric; penalty parameter for the discrepancy between the discrete rating matrix and the latent low-rank continuous matrix. It is not recommended to change the default value of 0.1.
delta: numeric; update factor for the penalty parameter mu, applied
after each iteration to increase the strength of the penalty. It is not
recommended to change the default value of 1.05.
conv_tol: numeric; convergence tolerance for the relative change in the objective function.
max_iter: a positive integer specifying the maximum number of iterations. In practice, large gains are often achieved in the first few iterations, with subsequent iterations yielding relatively small gains until convergence. The default (max_iter = 100L) caps the algorithm at 100 iterations.
L, Theta: starting values for the algorithm. These are not expected
to be set by the user. Instead, it is recommended to call this function
with a grid of values for the regularization parameter lambda so that
the implementation automatically takes advantage of warm starts.
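The interplay of mu, delta, and conv_tol can be illustrated in a few lines of base R. This is a sketch under two assumptions: the penalty starts at mu and is multiplied by delta after every iteration, and the relative change in the objective is measured against the previous value. The helper has_converged() is hypothetical and not part of the package.

```r
mu <- 0.1         # default penalty parameter
delta <- 1.05     # default update factor

# Penalty used in iterations 1, ..., 10 (ASSUMPTION: starts at mu and is
# multiplied by delta after each iteration)
mu_path <- mu * delta^(0:9)

# Hypothetical stopping rule: relative change in the objective function,
# measured against the previous objective value
has_converged <- function(obj_old, obj_new, tol = 1e-4) {
  abs(obj_new - obj_old) / abs(obj_old) < tol
}

has_converged(100, 99.995)  # relative change of 5e-05, below 1e-4
has_converged(100, 99)      # relative change of 0.01, above 1e-4
```

Under this schedule, the penalty after 100 iterations is more than a hundred times its starting value, which progressively strengthens the penalty on the discrepancy between the two matrices.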
Andreas Alfons and Aurore Archimbaud
For the loss part of the objective function, the pseudo-Huber loss
(loss = "pseudo_huber") is given by
$$\rho(x) = c^2 \left( \sqrt{1 + (x/c)^2} - 1 \right),$$
where $c$ denotes loss_const. The absolute loss
(loss = "absolute") is given by
$$\rho(x) = |x|,$$
and the truncated absolute loss (loss = "truncated") is defined as
$$\rho(x) = \min(|x|, c).$$
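The loss functions and the default rules for values and loss_const translate directly into base R. In this sketch, the toy matrix is made up and the helper names (rho_pseudo_huber() and so on) are hypothetical; only the formulas and the default rules come from this documentation. Sorting values is an addition for readability.

```r
# Hypothetical toy rating matrix on a 1-5 scale
X <- matrix(c(1, 3, NA, 5,
              2, NA, 4, 4,
              NA, 3, 1, 2), nrow = 3, byrow = TRUE)

# Default 'values': unique observed ratings (sorted here)
values <- sort(unique(X[!is.na(X)]))

# Default tuning constants described for 'loss_const'
const_pseudo_huber <- mean(diff(values))    # average step size between categories
const_truncated <- diff(range(values)) / 2  # half the range of the categories

# Loss functions with tuning constant 'const'
rho_pseudo_huber <- function(x, const) const^2 * (sqrt(1 + (x / const)^2) - 1)
rho_absolute <- function(x) abs(x)
rho_truncated <- function(x, const) pmin(abs(x), const)

r <- c(-4, -1, 0, 0.5, 3)
rho_pseudo_huber(r, const_pseudo_huber)
rho_absolute(r)
rho_truncated(r, const_truncated)   # contributions capped at 2
```

Near zero, the pseudo-Huber loss behaves like $x^2/2$, while for large $|x|$ it grows like $c|x| - c^2$, so large residuals are penalized linearly rather than quadratically. The truncated absolute loss caps the contribution of any single rating at $c$.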
Archimbaud, A., Alfons, A., and Wilms, I. (2025) Robust Matrix Completion for Discrete Rating-Scale Data. arXiv:2412.20802. doi:10.48550/arXiv.2412.20802.
See also rdmc_tune() and fraction_grid().
# load the package
library("rdmc")
# toy example derived from the MovieLens 100K dataset
data("MovieLensToy")
# robust discrete matrix completion
fit <- rdmc(MovieLensToy)
# extract completed matrix with fifth value of
# regularization parameter
X_hat <- get_completed(fit, which = 5)
head(X_hat)
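Because the default lambda is a grid, the example above benefits from the warm starts described for the arguments L and Theta. The idea can be sketched with soft-impute, a simpler low-rank completion algorithm used here only as a stand-in for the package's ADMM algorithm: each fit along the grid, ordered here from strong to weak regularization, starts from the previous solution. The helper soft_impute_path(), its stopping rule (relative change in the fitted matrix), and the grid values are assumptions for illustration.

```r
# Hypothetical warm-start path with soft-impute (not the rdmc algorithm)
soft_impute_path <- function(X, lambda, max_iter = 100L, conv_tol = 1e-4) {
  observed <- !is.na(X)
  Z <- matrix(0, nrow(X), ncol(X))   # cold start only for the first lambda
  fits <- vector("list", length(lambda))
  nb_iter <- integer(length(lambda))
  for (l in seq_along(lambda)) {
    for (k in seq_len(max_iter)) {
      filled <- ifelse(observed, X, Z)   # observed ratings + current predictions
      s <- svd(filled)
      d <- pmax(s$d - lambda[l], 0)      # soft-threshold singular values
      Z_new <- s$u %*% (d * t(s$v))
      change <- sum((Z_new - Z)^2) / max(sum(Z^2), 1e-12)
      Z <- Z_new
      if (change < conv_tol) break
    }
    fits[[l]] <- Z     # Z is reused as the starting value for the next lambda
    nb_iter[l] <- k
  }
  list(lambda = lambda, fits = fits, nb_iter = nb_iter)
}

X <- matrix(c(1, 3, NA, 5,
              2, NA, 4, 4,
              NA, 3, 1, 2), nrow = 3, byrow = TRUE)
X_centered <- X - median(X, na.rm = TRUE)
path <- soft_impute_path(X_centered, lambda = c(3, 2, 1, 0.5))
path$nb_iter
```

Since neighbouring solutions along the grid tend to be similar, starting from the previous fit typically saves iterations compared with restarting from zero for every value of lambda.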