rsparse (version 0.3.3.2)

LinearFlow: Creates Linear-Flow model for one-class collaborative filtering

Description

Creates the Linear-Flow model described in Practical Linear Models for Large-Scale One-Class Collaborative Filtering. The goal is to find an item-item (or user-user) similarity matrix that is low-rank and has a small Frobenius norm. This double regularization gives better control over the generalization error of the model. The idea is somewhat similar to Sparse Linear Methods (SLIM), but the method scales much better to large datasets.
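
As a rough illustration of the idea, the base-R sketch below builds a low-rank item-item similarity matrix on a toy dense matrix. It assumes a closed-form ridge solution (project onto the top right singular vectors, then solve a small regularized least-squares problem); the toy data and all variable names are illustrative, and this is not the package's implementation.

```r
# Toy dense user-item matrix (6 users x 5 items); real inputs are sparse.
x <- matrix(c(1, 1, 0, 0, 1,
              1, 0, 0, 1, 0,
              0, 1, 1, 0, 0,
              1, 1, 0, 0, 0,
              0, 0, 1, 1, 1,
              0, 1, 0, 1, 1), nrow = 6, byrow = TRUE)
rank <- 2L
lambda <- 0.5

# Top right singular vectors of x (n_items x rank).
v <- svd(x, nu = 0, nv = rank)$v

# User embeddings (n_users x rank).
user_emb <- x %*% v

# Assumed ridge solution for the item factors (rank x n_items):
# y = (t(xv) %*% xv + lambda * I)^-1 %*% t(xv) %*% x
y <- solve(crossprod(user_emb) + lambda * diag(rank), crossprod(user_emb, x))

# Implied item-item similarity matrix; its rank is at most `rank`.
w <- v %*% y

# Predicted scores for every user-item pair.
scores <- x %*% w
```

Larger values of lambda shrink y, and therefore w, towards zero, which is the Frobenius-norm part of the regularization; the choice of rank controls the low-rank part.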

Usage

LinearFlow

Format

R6Class object.

Usage

For usage details see Methods, Arguments and Examples sections.

  model = LinearFlow$new( rank = 8L,
                          lambda = 0,
                          init = NULL,
                          preprocess = identity,
                          solve_right_singular_vectors = c("soft_impute", "svd"),
                          ...)
  model$fit_transform(x, ...)
  model$transform(x, ...)
  model$predict(x, k, not_recommend = x, ...)
  model$components
  model$v
  model$cross_validate_lambda(x, x_train, x_test, lambda = "auto@10",
                       metric = "map@10", not_recommend = x_train, ...)

Methods

$new(rank = 8L, lambda = 0, init = NULL, preprocess = identity, solve_right_singular_vectors = c("svd", "soft_impute"), ...)

creates a Linear-Flow model with rank latent factors. If init (right singular vectors of the user-item interaction matrix) is provided, the model is initialized with its values.

$fit_transform(x, ...)

fits the model to an input user-item interaction matrix. Returns the user embeddings matrix of size n_users * rank.

$transform(x, ...)

transforms a user-item interaction matrix into a user embeddings matrix.

$predict(x, k, not_recommend = x, ...)

predicts the top k item ids for users x. User features should be defined the same way as in the training data: as a sparse matrix. Column names (= item ids) should be in the same order as in the matrix passed to fit_transform().
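
The selection of top k items with exclusions can be pictured with the base-R sketch below. The score matrix, the seen matrix (playing the role of not_recommend) and all names are hypothetical; this only illustrates the idea and is not how the package computes predictions.

```r
# Hypothetical score matrix: 2 users x 4 items.
scores <- matrix(c(0.9, 0.2, 0.5, 0.7,
                   0.1, 0.8, 0.6, 0.3), nrow = 2, byrow = TRUE,
                 dimnames = list(c("u1", "u2"), c("i1", "i2", "i3", "i4")))

# Items each user has already consumed (plays the role of not_recommend).
seen <- matrix(c(1, 0, 0, 0,
                 0, 1, 0, 0), nrow = 2, byrow = TRUE)
k <- 2L

# Exclude seen items by pushing their scores to -Inf.
scores[seen > 0] <- -Inf

# For each user, take the column indices of the k highest remaining scores.
# (With k >= 2, apply() returns a k x n_users matrix, hence the transpose.)
top_k <- t(apply(scores, 1, function(s) order(s, decreasing = TRUE)[seq_len(k)]))

# Map column indices back to item ids via the column names.
top_k_ids <- matrix(colnames(scores)[top_k], nrow = nrow(scores),
                    dimnames = list(rownames(scores), NULL))
```

Because item ids are recovered from column positions, the columns at prediction time have to line up with those used for fitting.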

preprocess

function, identity() by default. User-specified function which is applied to the user-item interaction matrix before running matrix factorization (it is also applied at inference time before making predictions).
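
A preprocessing function takes the interaction matrix and returns a matrix of the same shape. The sketch below shows one hypothetical choice, binarize (not part of the package), applied to a small dense matrix for illustration.

```r
# Hypothetical preprocessing function: turn any positive interaction count
# into 1, which matches the one-class (implicit feedback) setting.
binarize <- function(x) {
  x[x > 0] <- 1
  x
}

# Small dense example; real inputs would be sparse matrices.
x <- matrix(c(3, 0, 1,
              0, 5, 0), nrow = 2, byrow = TRUE)
x_bin <- binarize(x)

# With rsparse loaded, such a function would be passed to the constructor as
#   LinearFlow$new(rank = 8L, preprocess = binarize)
```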

$cross_validate_lambda(x, x_train, x_test, lambda = "auto@10", metric = "map@10", not_recommend = x_train, ...)

performs a search for the best regularization parameter lambda:

  1. The model is trained on x data

  2. Then the model makes predictions based on x_train data

  3. Finally, these predictions are validated against x_test data using the specified metric

Note that this is implemented with "warm starts", so it is cheap: the cost is almost the same as for a single fit of the model. The only considerable additional cost is the time to predict the top k items. In most cases an automatic setting such as lambda = "auto@20" is able to find a good value of the parameter.
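
The three steps and the reuse of work across lambda values can be sketched in base R as follows. This assumes that "warm starts" means the singular vectors and cross-products are computed once and only a small rank * rank system is solved per lambda; it also uses plain precision@k as a simpler stand-in for map@k. The toy data and all names are illustrative, not the package's internals.

```r
# Toy data: x_train holds observed interactions, x_test the held-out ones.
x_train <- matrix(c(1, 1, 0, 0, 0,
                    1, 0, 0, 1, 0,
                    0, 1, 1, 0, 0,
                    1, 1, 0, 0, 0,
                    0, 0, 1, 1, 0,
                    0, 1, 0, 1, 0), nrow = 6, byrow = TRUE)
x_test <- matrix(0, nrow = 6, ncol = 5)
x_test[cbind(1:6, c(5, 2, 5, 4, 5, 5))] <- 1
x <- x_train + x_test

rank <- 2L
k <- 2L
lambdas <- c(0.01, 0.1, 1, 10)

# Computed once and reused for every lambda.
v <- svd(x, nu = 0, nv = rank)$v
xv <- x %*% v
lhs <- crossprod(xv)
rhs <- crossprod(xv, x)
emb_train <- x_train %*% v

precision_at_k <- sapply(lambdas, function(lambda) {
  # Step 1: fit on x; only this small solve depends on lambda.
  y <- solve(lhs + lambda * diag(rank), rhs)
  # Step 2: predict from x_train, excluding items already seen there.
  scores <- emb_train %*% y
  scores[x_train > 0] <- -Inf
  # Step 3: validate the top k items against x_test.
  hits <- sapply(seq_len(nrow(scores)), function(u) {
    top <- order(scores[u, ], decreasing = TRUE)[seq_len(k)]
    sum(x_test[u, top] > 0)
  })
  mean(hits / k)
})

best_lambda <- lambdas[which.max(precision_at_k)]
```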

$components

item factors matrix of size rank * n_items. In the paper this matrix is called Y

$v

right singular vectors of the user-item matrix. Size is n_items * rank. In the paper this matrix is called v.

Arguments

model

A LinearFlow model.

x

An input sparse user-item matrix (inherits from sparseMatrix)

rank

integer - number of latent factors

lambda

numeric - regularization parameter or sequence of regularization values for cross_validate_lambda method.

not_recommend

sparse matrix or NULL - indicates which items should be excluded from recommendations for a user. By default it excludes previously seen/consumed items.

metric

metric to use in evaluation of top-k recommendations. Currently only map@k and ndcg@k are supported (k can be any integer).
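
For orientation, average precision at k for a single user can be sketched as below; map@k is then the mean of this value over users. The function ap_at_k is hypothetical, and normalization conventions differ between libraries, so the package's own metric may not match it exactly.

```r
# Average precision at k for a single user.
# ranked:   recommended item ids, best first
# relevant: held-out item ids the user actually interacted with
ap_at_k <- function(ranked, relevant, k) {
  ranked <- head(ranked, k)
  hit <- ranked %in% relevant
  if (!any(hit)) return(0)
  # Precision at each position, counted only where a hit occurs.
  precision_at_i <- cumsum(hit) / seq_along(hit)
  sum(precision_at_i[hit]) / min(k, length(relevant))
}

# Hits at positions 1 and 3 out of 2 relevant items.
ap <- ap_at_k(c("i4", "i1", "i3"), relevant = c("i4", "i3"), k = 3)
```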

...

other arguments (not used at the moment)

Examples

# NOT RUN {
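library(rsparse)
data('movielens100k')
# hold out the users after row 900 for prediction
train = movielens100k[1:900, ]
cv = movielens100k[901:nrow(movielens100k), ]
model = LinearFlow$new(rank = 10L, lambda = 0, init = NULL,
                       solve_right_singular_vectors = "svd")
# user embeddings for the training users (n_users * rank)
user_emb = model$fit_transform(train)
# top 10 item ids for each held-out user
preds = model$predict(cv, k = 10)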
# }