Creates a matrix factorization model that is solved with Alternating Least Squares (Weighted ALS for implicit feedback). For implicit feedback see the (Hu, Koren, Volinsky) 2008 paper http://yifanhu.net/PUB/cf.pdf. For explicit feedback the model is the classic rating-matrix decomposition with MSE loss (without biases at the moment). Both algorithms are known to work well in recommender systems.
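To make the alternating scheme concrete, here is a minimal base-R sketch of ALS for the explicit (MSE) case. It is only an illustration: it assumes a small dense matrix in which every entry is treated as an observed rating, and `als_dense` and all variable names are hypothetical, not part of this package.

```r
# Toy ALS for a dense rating matrix; all names here are illustrative.
als_dense = function(ratings, rank, lambda, n_iter) {
  n_users = nrow(ratings)
  n_items = ncol(ratings)
  user_emb = matrix(rnorm(n_users * rank, sd = 0.1), n_users, rank)
  item_emb = matrix(rnorm(rank * n_items, sd = 0.1), rank, n_items)
  reg = lambda * diag(rank)
  loss = numeric(n_iter)
  for (iter in seq_len(n_iter)) {
    # Fix items, solve the ridge regression for all users at once.
    user_emb = t(solve(tcrossprod(item_emb) + reg, item_emb %*% t(ratings)))
    # Fix users, solve the ridge regression for all items at once.
    item_emb = solve(crossprod(user_emb) + reg, crossprod(user_emb, ratings))
    residual = ratings - user_emb %*% item_emb
    loss[iter] = sum(residual^2) + lambda * (sum(user_emb^2) + sum(item_emb^2))
  }
  list(user_emb = user_emb, components = item_emb, loss = loss)
}

set.seed(1)
# Made-up ratings with an underlying rank-2 structure plus noise.
ratings = matrix(rnorm(20 * 2), 20, 2) %*% matrix(rnorm(2 * 15), 2, 15) +
  matrix(rnorm(20 * 15, sd = 0.1), 20, 15)
fit = als_dense(ratings, rank = 2, lambda = 0.1, n_iter = 10)
```

Each half-step is an exact ridge-regression solve, so the regularized loss in `fit$loss` cannot increase from one iteration to the next. The weighted (implicit) variant differs in that each user and item gets its own confidence-weighted system.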
WRMF is an R6Class object.
For usage details see Methods, Arguments and Examples sections.
model = WRMF$new(rank = 10L, lambda = 0, feedback = c("implicit", "explicit"), non_negative = FALSE, solver = c("conjugate_gradient", "cholesky"), cg_steps = 3L, init = NULL)
model$fit_transform(x, n_iter = 5L, ...)
model$transform(x)
model$predict(x, k, not_recommend = x, items_exclude = NULL, ...)
model$components
model$remove_scorer(name)
$new(rank = 10L, lambda = 0, feedback = c("implicit", "explicit"), non_negative = FALSE, solver = c("conjugate_gradient", "cholesky"), cg_steps = 3L, init = NULL)
Creates a matrix factorization model with rank latent factors. If init is provided, the item embeddings are initialized with its values.
$fit_transform(x, n_iter = 5L, ...)
Fits the model to an input user-item matrix (preferably in "dgCMatrix" format). For implicit feedback, x should be a confidence matrix, which corresponds to 1 + alpha * r_ui in the original paper. Usually r_ui is the number of interactions between user u and item i. For explicit feedback, the values in x represent ratings. Returns the user factor matrix of size n_users * rank.
$transform(x, ...)
Calculates user embeddings from the given user-item matrix x. The result is an n_users * rank matrix.
$predict(x, k, not_recommend = x, ...)
Predicts the top k item indices for users x. The result additionally carries a scores attribute with the "score" value of each prediction. If the model contains item ids (the input matrix to fit_transform() had column names), the result also has an ids attribute: the item ids that correspond to the item indices. User features x should be defined the same way as in the training data: as a sparse matrix of confidence values (implicit feedback) or ratings (explicit feedback). Column names (= item ids) should be in the same order as in fit_transform().
$components
Item embeddings matrix of size rank * n_items.
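The shapes above imply how scores and top-k recommendations relate to the two factor matrices. The base-R sketch below illustrates the idea with random stand-in matrices; it is not the package's implementation, and `seen`, `scores` and `top_k` are hypothetical names.

```r
set.seed(42)
n_users = 4
n_items = 6
rank = 2

# Stand-ins for the two factor matrices (random, not a fitted model).
user_emb = matrix(rnorm(n_users * rank), n_users, rank)   # n_users * rank
item_emb = matrix(rnorm(rank * n_items), rank, n_items)   # rank * n_items

# Predicted score of every item for every user: n_users * n_items.
scores = user_emb %*% item_emb

# Hypothetical mask of already-seen items that must not be recommended.
seen = matrix(FALSE, n_users, n_items)
seen[1, c(2, 5)] = TRUE
scores[seen] = -Inf

# Top-k item indices per user, best first: n_users * k.
k = 3
top_k = t(apply(scores, 1, function(s) order(s, decreasing = TRUE)[seq_len(k)]))
```

Setting excluded scores to `-Inf` before ranking is one simple way to keep previously seen items out of the top k.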
model: A WRMF model.
x: An input sparse user-item matrix (of class dgCMatrix). For explicit feedback it should consist of ratings. For implicit feedback all positive interactions should be filled with confidence values. Missing interactions should be zeros/empty. In the simple case, confidence = 1 + alpha * x.
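The simple confidence transform can be sketched with the Matrix package as follows. The counts are made up, and `alpha = 40` is only an assumed starting value that should be tuned for real data.

```r
library(Matrix)

# Hypothetical interaction counts r_ui: 3 users x 4 items.
counts = sparseMatrix(
  i = c(1, 1, 2, 3, 3),
  j = c(1, 3, 2, 1, 4),
  x = c(2, 5, 1, 3, 1),
  dims = c(3, 4)
)

alpha = 40  # assumed scaling constant, tune for your data

# Transform only the stored (non-zero) entries so that missing
# interactions stay empty instead of becoming 1.
confidence = counts
confidence@x = 1 + alpha * confidence@x
```

Working on the `@x` slot keeps the matrix sparse. Computing `1 + alpha * counts` on the whole matrix would turn every missing interaction into 1 and produce a dense result.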
rank: integer - number of latent factors
lambda: numeric - regularization parameter
feedback: character - feedback type - one of c("implicit", "explicit")
solver: character - solver for the "implicit feedback" problem. One of c("conjugate_gradient", "cholesky"). Usually the approximate "conjugate_gradient" is significantly faster, and its solution is on par with the exact "cholesky".
cg_steps: integer > 0 - maximum number of internal steps in conjugate gradient (if the "conjugate_gradient" solver is used). cg_steps = 3 by default. Controls the precision of the linear equation solution at each ALS step. Usually there is no need to tune this parameter.
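To illustrate the trade-off between an exact solve and a few conjugate gradient steps, the base-R sketch below solves one small symmetric positive-definite system both ways. The system is a random stand-in for the normal equations of a single ALS step, and `conjugate_gradient` is a hypothetical helper written for this example, not the package's solver.

```r
set.seed(1)
rank = 10

# Symmetric positive-definite system A x = b (random stand-in data).
Y = matrix(rnorm(50 * rank), 50, rank)
A = crossprod(Y) + diag(rank)   # Y'Y + lambda * I with lambda = 1
b = rnorm(rank)

# Exact solution via the Cholesky factorization A = U'U.
U = chol(A)
x_exact = backsolve(U, forwardsolve(t(U), b))

# Approximate solution via a fixed number of conjugate gradient steps.
conjugate_gradient = function(A, b, x, n_steps) {
  r = b - A %*% x
  p = r
  rs_old = sum(r * r)
  for (i in seq_len(n_steps)) {
    Ap = A %*% p
    step = rs_old / sum(p * Ap)
    x = x + step * p
    r = r - step * Ap
    rs_new = sum(r * r)
    if (rs_new < 1e-20) break
    p = r + (rs_new / rs_old) * p
    rs_old = rs_new
  }
  drop(x)
}

x_cg3 = conjugate_gradient(A, b, x = rep(0, rank), n_steps = 3)
x_cg_full = conjugate_gradient(A, b, x = rep(0, rank), n_steps = rank)

err3 = sqrt(sum((x_cg3 - x_exact)^2))
err_full = sqrt(sum((x_cg_full - x_exact)^2))
```

Comparing `err3` with `err_full` shows how much accuracy a short run gives up. Inside ALS the starting point can be the embedding from the previous iteration rather than zeros, which is one reason a small number of steps is often enough.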
preprocess: function = identity() by default. User-specified function which is applied to the user-item interaction matrix before running matrix factorization (it is also applied at inference time before making predictions). For example, we may want to normalize each row of the user-item matrix to have unit norm, or apply log1p() to discount large counts. This essentially corresponds to the "confidence" function from the (Hu, Koren, Volinsky) 2008 paper http://yifanhu.net/PUB/cf.pdf
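The two transformations mentioned above can be written as small functions on a sparse matrix. This is a sketch using the Matrix package; `normalize_rows` and `discount_counts` are hypothetical names, the data is made up, and the row normalization assumes non-negative values with no empty rows.

```r
library(Matrix)

# Hypothetical interaction counts: 2 users x 3 items, no empty rows.
x = sparseMatrix(i = c(1, 1, 2), j = c(1, 3, 2), x = c(3, 1, 9), dims = c(2, 3))

# Option 1: scale each row so that its values sum to 1
# (assumes non-negative values; an empty row would divide by zero).
normalize_rows = function(m) Diagonal(x = 1 / rowSums(m)) %*% m

# Option 2: discount large counts; touching only the stored values
# keeps the matrix sparse.
discount_counts = function(m) {
  m@x = log1p(m@x)
  m
}

x_norm = normalize_rows(x)
x_log = discount_counts(x)
```

Either function takes a sparse matrix and returns a sparse matrix of the same shape, which is the contract a transformation of this kind needs so that it can be reused unchanged at prediction time.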
precision: one of c("double", "float"). Determines whether the embedding matrices are ordinary numeric matrices or float matrices (from the float package). The latter is usually 2x faster and consumes less RAM, but float matrices are not "base" objects. Use carefully.
not_recommend: sparse matrix or NULL - indicates which items should be excluded from recommendations for each user. By default it excludes previously seen/consumed items.
items_exclude: character = item ids or integer = item indices or NULL - items to exclude from recommendations for all users.
convergence_tol: numeric = -Inf - defines the early stopping strategy. The model stops fitting when one of the following two conditions is satisfied: (a) the number of iterations is exceeded, or (b) loss_previous_iter / loss_current_iter - 1 < convergence_tol
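The stopping rule can be checked on a made-up loss trajectory. The helper below is a hypothetical illustration of conditions (a) and (b), not code from the package.

```r
# Returns the iteration at which fitting would stop, given the loss
# recorded after each iteration (hypothetical helper, for illustration).
stop_iteration = function(loss, n_iter, convergence_tol = -Inf) {
  last = min(n_iter, length(loss))
  for (i in seq_len(last)) {
    # Condition (b): relative improvement fell below the tolerance.
    if (i > 1 && loss[i - 1] / loss[i] - 1 < convergence_tol) return(i)
  }
  # Condition (a): the iteration budget is used up.
  last
}

loss = c(10, 5, 4, 3.99, 3.989)   # made-up loss values

stop_default = stop_iteration(loss, n_iter = 5)
stop_tol = stop_iteration(loss, n_iter = 5, convergence_tol = 0.01)
stop_negative = stop_iteration(loss, n_iter = 5, convergence_tol = -1)
```

With the default `-Inf` the rule never fires and all 5 iterations run. With a tolerance of 0.01 the sketch stops at iteration 4, where the relative improvement is about 0.25%. For positive losses the ratio minus 1 is always greater than -1, so a tolerance of -1 also disables early stopping.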
...: other arguments. Not used at the moment.
# NOT RUN {
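library(rsparse)
data('movielens100k')
# hold out the last users for validation
train = movielens100k[1:900, ]
cv = movielens100k[901:nrow(movielens100k), ]
model = WRMF$new(rank = 5, lambda = 0, feedback = 'implicit')
# user embeddings: n_users * rank
user_emb = model$fit_transform(train, n_iter = 5, convergence_tol = -1)
# item embeddings: rank * n_items
item_emb = model$components
# top 10 items per held-out user, excluding items already present in cv
preds = model$predict(cv, k = 10, not_recommend = cv)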
# }