PureSVD

Creates a matrix factorization model based on Soft-SVD.
Soft-SVD is very similar to truncated SVD, with the added ability to apply
regularization based on the nuclear norm.
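
To make the relation to truncated SVD concrete: nuclear-norm regularization with parameter `lambda` amounts to shrinking each singular value by `lambda` and clipping at zero, so `lambda = 0` recovers the ordinary truncated SVD. The sketch below illustrates this with base R's dense `svd()`. The helper `soft_svd()` is a hypothetical name introduced for illustration; it is not part of 'rsparse' and not how the package computes the factorization on large sparse inputs.

```r
# Hypothetical helper, not part of rsparse: rank-limited SVD with
# soft-thresholded singular values, computed with a dense base-R svd().
soft_svd <- function(x, rank, lambda = 0) {
  s <- svd(x, nu = rank, nv = rank)
  # Shrink the leading singular values by lambda, then clip at zero.
  d <- pmax(s$d[seq_len(rank)] - lambda, 0)
  list(u = s$u, d = d, v = s$v)
}

set.seed(1)
x <- matrix(rnorm(20 * 8), nrow = 20)

plain  <- soft_svd(x, rank = 3, lambda = 0)  # ordinary truncated SVD
shrunk <- soft_svd(x, rank = 3, lambda = 1)  # singular values reduced by 1

# Low-rank reconstruction from the shrunken factors.
x_hat <- shrunk$u %*% diag(shrunk$d, nrow = 3) %*% t(shrunk$v)
```

A larger `lambda` pushes more singular values to exactly zero, which is why `rank` acts as an upper bound on the number of latent factors rather than a fixed count.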

datasets

Implements many algorithms for statistical learning on
sparse matrices - matrix factorizations, matrix completion,
elastic net regressions, factorization machines.
'rsparse' also enhances the 'Matrix' package by providing methods for
multithreaded <sparse, dense> matrix products and native slicing of
sparse matrices in Compressed Sparse Row (CSR) format.
List of the algorithms for regression problems:
1) Elastic Net regression via Follow The Proximally-Regularized Leader (FTRL)
Stochastic Gradient Descent (SGD), as per McMahan et al. (2013, <doi:10.1145/2487575.2488200>)
2) Factorization Machines via SGD, as per Rendle (2010, <doi:10.1109/ICDM.2010.127>)
List of algorithms for matrix factorization and matrix completion:
1) Weighted Regularized Matrix Factorization (WRMF) via Alternating Least
Squares (ALS) - paper by Hu, Koren, Volinsky (2008, <doi:10.1109/ICDM.2008.22>)
2) Maximum-Margin Matrix Factorization via ALS, paper by Rennie, Srebro
(2005, <doi:10.1145/1102351.1102441>)
3) Fast Truncated Singular Value Decomposition (SVD), Soft-Thresholded SVD,
Soft-Impute matrix completion via ALS - paper by Hastie, Mazumder
et al. (2014, <arXiv:1410.2596>)
4) Linear-Flow matrix factorization, from 'Practical linear models for
large-scale one-class collaborative filtering' by Sedhain, Bui, Kawale et al
(2016, ISBN:978-1-57735-770-4)
5) GlobalVectors (GloVe) matrix factorization via SGD, paper by Pennington,
Socher, Manning (2014, <https://www.aclweb.org/anthology/D14-1162>)
The package is reasonably fast and memory efficient: it can work with large
datasets of millions of rows and millions of columns. This is particularly useful
for practitioners working on recommender systems.

Dmitriy Selivanov

rsparse

Statistical Learning on Sparse Matrices

Drew Schmidt

Wei-Chen Chen

PureSVD function

Format

For usage details see Methods, Arguments and Examples sections.<pre>
 model = PureSVD$new(rank = 10L,
 lambda = 0,
 init = NULL,
 preprocess = identity,
 ...)
 model$fit_transform(x, n_iter = 5L, ...)
 model$transform(x, ...)
 model$predict(x, k, not_recommend = x, ...)
 model$components
</pre>

Methods

<dl class="dl-horizontal">
 <dt><code>$new(rank = 10L, lambda = 0,
 init = NULL,
 preprocess = identity,
 ...
 ) </code></dt><dd>creates a matrix
 factorization model with at most <code>rank</code> latent factors. If <code>init</code> is not <code>NULL</code>, the model is initialized
 with the provided SVD solution.</dd>
 <dt><code>$fit_transform(x, n_iter = 5L, ...)</code></dt><dd>fits the model to
 an input user-item matrix.
 Returns the user factor matrix of size <code>n_users * rank</code>.</dd>
 <dt><code>$transform(x, ...)</code></dt><dd>calculates user embeddings from the given user-item matrix <code>x</code>.
 The result is an <code>n_users * rank</code> matrix.</dd>
 <dt><code>$predict(x, k, not_recommend = x, ...)</code></dt><dd>predicts the <code>top k</code>
 item ids (= column names of the matrix passed to the <code>fit_transform()</code> method) for users <code>x</code>.
 User features should be defined the same way as in the training data: as a sparse matrix
 of confidence values (implicit feedback) or ratings (explicit feedback).
 Column names (= item ids) should be in the same order as in <code>fit_transform()</code>.</dd>
 <dt><code>$components</code></dt><dd>item factors matrix of size <code>rank * n_items</code></dd>
</dl>
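
The shapes above (user embeddings of size `n_users * rank`, item components of size `rank * n_items`) suggest how top-k recommendation can work: score every item by the product of the two factor matrices, mask items that should not be recommended, and keep the `k` best per user. The sketch below shows that idea in base R. It assumes dot-product scoring, which the text above does not state explicitly, and `top_k_items()` is a hypothetical helper, not the package's `$predict()` implementation.

```r
# Hypothetical helper, not part of rsparse. Assumes items are ranked by the
# dot product between user embeddings and item components.
top_k_items <- function(user_emb, components, k, not_recommend = NULL) {
  scores <- user_emb %*% components                 # n_users x n_items
  if (!is.null(not_recommend)) {
    # Mask items the user has already seen so they are never returned.
    scores[as.matrix(not_recommend) != 0] <- -Inf
  }
  # For each user, the column indices of the k highest-scoring items.
  idx <- lapply(seq_len(nrow(scores)), function(i) {
    order(scores[i, ], decreasing = TRUE)[seq_len(k)]
  })
  do.call(rbind, idx)                               # n_users x k
}

# Toy data: 2 users, rank 2, 3 items.
user_emb   <- matrix(c(1, 0,
                       0, 1), nrow = 2, byrow = TRUE)
components <- matrix(c(0.9, 0.1, 0.5,
                       0.2, 0.8, 0.4), nrow = 2, byrow = TRUE)
seen       <- matrix(c(1, 0, 0,
                       0, 1, 0), nrow = 2, byrow = TRUE)

res <- top_k_items(user_emb, components, k = 2, not_recommend = seen)
```

In this toy case each user's best-scoring item is the one already seen, so masking changes the result: both users get item 3 first, followed by their remaining unseen item.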

Arguments

<dl class="dl-horizontal">
 <dt>model</dt><dd>A <code>PureSVD</code> model.</dd>
 <dt>x</dt><dd>An input sparse user-item matrix (of class <code>dgCMatrix</code>).</dd>
 <dt>rank</dt><dd><code>integer</code> - maximum number of latent factors</dd>
 <dt>lambda</dt><dd><code>numeric</code> - regularization parameter for nuclear norm</dd>
 <dt>preprocess</dt><dd><code>function</code> = <code>identity()</code> by default. User-specified function which is applied to the user-item interaction matrix
 before running matrix factorization (it is also applied at inference time, before making predictions). For example, we may
 want to normalize each row of the user-item matrix to have unit norm, or apply <code>log1p()</code> to discount large counts.</dd>
 <dt>not_recommend</dt><dd><code>sparse matrix</code> or <code>NULL</code> - indicates which items should be excluded from recommendations for a user.
 By default it excludes previously seen/consumed items.</dd>
 <dt>convergence_tol</dt><dd><code>numeric = -Inf</code> defines the early stopping strategy. Fitting stops
 when either of the following conditions is satisfied: (a) all iterations have been
 used, or (b) the relative change in Frobenius norm between two consecutive solutions is less than
 the provided <code>convergence_tol</code>.</dd>
 <dt>...</dt><dd>other arguments. Not used at the moment</dd>
</dl>
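
As an illustration of the two `preprocess` choices mentioned above, the sketch below applies row normalization and `log1p()` to a small sparse matrix from the 'Matrix' package. The helper `normalize_rows()` is a hypothetical name, and it assumes "unit norm" means the Euclidean (L2) norm; an L1 variant would divide by the row sums of absolute values instead.

```r
library(Matrix)

# Hypothetical helper: scale each row to unit L2 norm.
# All-zero rows are left untouched to avoid division by zero.
normalize_rows <- function(x) {
  norms <- sqrt(rowSums(x^2))
  norms[norms == 0] <- 1
  Diagonal(x = 1 / norms) %*% x
}

# Toy user-item count matrix: 3 users, 4 items.
x <- sparseMatrix(i = c(1, 1, 2, 3), j = c(1, 3, 2, 3),
                  x = c(3, 4, 10, 100), dims = c(3, 4))

x_unit <- normalize_rows(x)  # row 1 becomes (0.6, 0, 0.8, 0)
x_log  <- log1p(x)           # 100 becomes log(101); zeros stay zero
```

Either function takes a matrix and returns a matrix of the same shape, so it could be supplied as `preprocess = normalize_rows` or `preprocess = log1p` when creating the model. Both keep zero entries at zero, which preserves sparsity.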

PureSVD: Soft-SVD decomposition
