rsparse v0.4.0
Statistical Learning on Sparse Matrices
Implements many algorithms for statistical learning on
sparse matrices - matrix factorizations, matrix completion,
elastic net regressions, factorization machines.
'rsparse' also enhances the 'Matrix' package by providing methods for
multithreaded <sparse, dense> matrix products and native slicing of
the sparse matrices in Compressed Sparse Row (CSR) format.
List of the algorithms for regression problems:
1) Elastic Net regression via Follow The Proximally-Regularized Leader (FTRL)
Stochastic Gradient Descent (SGD), as per McMahan et al. (2013, <doi:10.1145/2487575.2488200>)
2) Factorization Machines via SGD, as per Rendle (2010, <doi:10.1109/ICDM.2010.127>)
List of algorithms for matrix factorization and matrix completion:
1) Weighted Regularized Matrix Factorization (WRMF) via Alternating Least
Squares (ALS) - paper by Hu, Koren, Volinsky (2008, <doi:10.1109/ICDM.2008.22>)
2) Maximum-Margin Matrix Factorization via ALS, paper by Rennie, Srebro
(2005, <doi:10.1145/1102351.1102441>)
3) Fast Truncated Singular Value Decomposition (SVD), Soft-Thresholded SVD,
Soft-Impute matrix completion via ALS - paper by Hastie, Mazumder
et al. (2014, <arXiv:1410.2596>)
4) Linear-Flow matrix factorization, from 'Practical linear models for
large-scale one-class collaborative filtering' by Sedhain, Bui, Kawale et al
(2016, ISBN:978-1-57735-770-4)
5) GlobalVectors (GloVe) matrix factorization via SGD, paper by Pennington,
Socher, Manning (2014, <https://www.aclweb.org/anthology/D14-1162>)
The package is reasonably fast and memory efficient: it can work with large
datasets with millions of rows and millions of columns. This is particularly useful
for practitioners working on recommender systems.
Readme
rsparse 
rsparse is an R package for statistical learning, primarily on sparse matrices: matrix factorizations, factorization machines, out-of-core regression. Many of the implemented algorithms are particularly useful for recommender systems and NLP.
On top of that, we provide some optimized routines for working with sparse matrices: multithreaded <sparse, dense> matrix products and native slicing of matrices in CSR format (Matrix::RsparseMatrix).
We've paid some attention to the implementation details: we try to avoid data copies, utilize multiple threads via OpenMP, and use SIMD where appropriate. The package can work on datasets with millions of rows and millions of columns.
Support
Please contact us at hello@rexy.ai if you need commercial support.
Features
Classification/Regression
- Follow the proximally-regularized leader (FTRL), which allows solving very large linear/logistic regression problems with an elastic-net penalty. The solver uses stochastic gradient descent with adaptive learning rates, so it can be used for online learning (it is not necessary to load all the data into RAM). See Ad Click Prediction: a View from the Trenches for more examples.
  - Only logistic regression is implemented at the moment
  - The native format for matrices is CSR (Matrix::RsparseMatrix). However, a common R Matrix::CsparseMatrix (dgCMatrix) will be converted automatically.
- Factorization Machines: a supervised learning algorithm which learns second-order polynomial interactions in a factorized way. We provide a highly optimized, SIMD-accelerated implementation.
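To make the FTRL item above more concrete, here is a minimal pure-Python sketch of the per-coordinate FTRL-Proximal update for logistic regression, following the update rule published by McMahan et al. It is only an illustration of the technique and not the package's implementation; the class name `FTRLProximal`, its methods, and the parameter names (`alpha`, `beta`, `l1`, `l2`) are assumptions made for this example. Each observation is a sparse row given as a `{feature_index: value}` dictionary, so the model can be updated one row at a time without holding the dataset in memory.

```python
import math


class FTRLProximal:
    """Illustrative per-coordinate FTRL-Proximal for logistic regression.

    Hypothetical sketch: names and defaults are assumptions, not rsparse's API.
    """

    def __init__(self, alpha=0.1, beta=1.0, l1=0.0, l2=0.0):
        self.alpha = alpha  # learning-rate scale
        self.beta = beta    # learning-rate smoothing term
        self.l1 = l1        # L1 penalty (drives weights to exactly zero)
        self.l2 = l2        # L2 penalty
        self.z = {}         # per-coordinate accumulated adjusted gradients
        self.n = {}         # per-coordinate accumulated squared gradients

    def weight(self, i):
        # Closed-form weight: soft-thresholding of z by the L1 penalty.
        z = self.z.get(i, 0.0)
        if abs(z) <= self.l1:
            return 0.0
        sign = 1.0 if z > 0 else -1.0
        scale = (self.beta + math.sqrt(self.n.get(i, 0.0))) / self.alpha + self.l2
        return -(z - sign * self.l1) / scale

    @staticmethod
    def _sigmoid(score):
        score = max(-35.0, min(35.0, score))  # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-score))

    def predict(self, x):
        # x is a sparse row: {feature_index: value}
        return self._sigmoid(sum(self.weight(i) * v for i, v in x.items()))

    def partial_fit(self, x, y):
        # One online update on a single sparse row with label y in {0, 1}.
        w = {i: self.weight(i) for i in x}
        p = self._sigmoid(sum(w[i] * v for i, v in x.items()))
        for i, v in x.items():
            g = (p - y) * v  # gradient of the log loss for coordinate i
            n_old = self.n.get(i, 0.0)
            # Change in the inverse per-coordinate learning rate.
            sigma = (math.sqrt(n_old + g * g) - math.sqrt(n_old)) / self.alpha
            self.z[i] = self.z.get(i, 0.0) + g - sigma * w[i]
            self.n[i] = n_old + g * g
        return p
```

Only the coordinates present in a row are touched by an update, which is why row-oriented sparse input fits this kind of solver well.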
Matrix Factorizations
- Vanilla Maximum Margin Matrix Factorization (MMMF) - the classic approach for "rating" prediction. See the WRMF class and the constructor option feedback = "explicit". The original paper that introduced MMMF is by Rennie and Srebro (2005).
- Weighted Regularized Matrix Factorization (WRMF) from Collaborative Filtering for Implicit Feedback Datasets. See the WRMF class and the constructor option feedback = "implicit". We provide 2 solvers:
  - Exact, based on Cholesky factorization
  - Approximate, based on a fixed number of Conjugate Gradient steps. See details in Applications of the Conjugate Gradient Method for Implicit Feedback Collaborative Filtering and Faster Implicit Matrix Factorization.
- Linear-Flow from Practical Linear Models for Large-Scale One-Class Collaborative Filtering. The algorithm looks for a factorized low-rank item-item similarity matrix (in some sense it is similar to SLIM).
- Fast Truncated SVD and Truncated Soft-SVD via Alternating Least Squares as described in Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares. Works for both sparse and dense matrices, and on float matrices as well. For certain problems it may be even faster than the irlba package.
- Soft-Impute via fast Alternating Least Squares as described in Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares.
  - with a solution in SVD form
- GloVe as described in GloVe: Global Vectors for Word Representation.
  - This is usually used to train word embeddings, but it is also very useful for recommender systems.
- Matrix scaling as described in EigenRec: Generalizing PureSVD for Effective and Efficient Top-N Recommendations
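The approximate WRMF solver in the list above runs a fixed number of Conjugate Gradient steps instead of an exact Cholesky solve. The sketch below shows that idea on a small dense symmetric positive-definite system in pure Python. It is an assumption-laden illustration, not the package's code: the function name `conjugate_gradient` and its arguments are hypothetical, and a real ALS solver would apply this to each user's or item's normal equations, typically warm-started from the previous factor values.

```python
def conjugate_gradient(A, b, x0, n_steps=3):
    """Approximately solve A x = b for a symmetric positive-definite A.

    Runs at most `n_steps` CG iterations starting from `x0`
    (hypothetical helper for illustration only).
    """
    def matvec(M, v):
        return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

    def dot(u, v):
        return sum(u_i * v_i for u_i, v_i in zip(u, v))

    x = list(x0)
    r = [b_i - ax_i for b_i, ax_i in zip(b, matvec(A, x))]  # residual
    p = list(r)                                             # search direction
    rs_old = dot(r, r)
    for _ in range(n_steps):
        if rs_old < 1e-20:  # already converged
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)  # exact step size along p
        x = [x_i + alpha * p_i for x_i, p_i in zip(x, p)]
        r = [r_i - alpha * ap_i for r_i, ap_i in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [r_i + (rs_new / rs_old) * p_i for r_i, p_i in zip(r, p)]
        rs_old = rs_new
    return x
```

Because a good starting point is usually available from the previous ALS iteration, a handful of steps can be enough in practice, which trades a little accuracy for a much cheaper update than a full factorization.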
Optimized matrix operations
- multithreaded %*% and tcrossprod() for <dgRMatrix, matrix>
- multithreaded %*% and crossprod() for <matrix, dgCMatrix>
- natively slice CSR matrices (Matrix::RsparseMatrix) without converting them to triplet / CSC
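To illustrate why row slicing is natural in CSR format, the following pure-Python sketch selects rows directly from the three CSR arrays (row pointers, column indices, values) without building a triplet or CSC representation. The function name `csr_slice_rows` and the plain-list representation are assumptions for this example; the package itself operates on Matrix::RsparseMatrix objects.

```python
def csr_slice_rows(indptr, indices, data, rows):
    """Select `rows` (in the given order) from a CSR matrix.

    indptr  : row pointers, length n_rows + 1
    indices : column index of each stored value
    data    : stored values
    Returns the (indptr, indices, data) triple of the sliced matrix.
    Hypothetical helper for illustration only.
    """
    new_indptr = [0]
    new_indices = []
    new_data = []
    for r in rows:
        # Row r occupies one contiguous block in `indices` and `data`.
        start, end = indptr[r], indptr[r + 1]
        new_indices.extend(indices[start:end])
        new_data.extend(data[start:end])
        new_indptr.append(len(new_indices))
    return new_indptr, new_indices, new_data


# 3 x 3 example matrix:
# [[1, 0, 2],
#  [0, 0, 3],
#  [4, 5, 0]]
indptr = [0, 2, 3, 5]
indices = [0, 2, 2, 0, 1]
data = [1, 2, 3, 4, 5]

# Take row 2 followed by row 0.
sliced = csr_slice_rows(indptr, indices, data, [2, 0])
```

Each selected row is a single contiguous copy, so the cost is proportional to the number of stored values in the requested rows rather than to the size of the whole matrix.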
Installation
Most of the algorithms benefit from OpenMP, and many of them can use a high-performance BLAS implementation. If you want to get the most out of the package, please read the section below carefully.
It is recommended to:
- Use high-performance BLAS (such as OpenBLAS, MKL, Apple Accelerate).
- Add proper compiler optimizations in your ~/.R/Makevars. For example, on recent processors (with AVX support) and a compiler with OpenMP support, the following lines could be a good option:
    CXX11FLAGS += -O3 -march=native -mavx -fopenmp -ffast-math
    CXXFLAGS += -O3 -march=native -mavx -fopenmp -ffast-math
If you are on Mac, follow the instructions here. After installing clang4, additionally put the line PKG_CXXFLAGS += -DARMA_USE_OPENMP in your ~/.R/Makevars. After that, install rsparse in the usual way.
Materials
Note that the syntax in these posts/slides is not up to date, since the package was under active development.
- Slides from DataFest Tbilisi (2017-11-16)
- Introduction to matrix factorization with Weighted-ALS algorithm - collaborative filtering for implicit feedback datasets.
- Music recommendations using LastFM-360K dataset
  - evaluation metrics for ranking
  - setting up proper cross-validation
  - possible issues with nested parallelism and thread contention
  - making recommendations for new users
  - complementary item-to-item recommendations
- Benchmark against other good implementations
Here is an example of rsparse::WRMF on the lastfm360k dataset in comparison with other good implementations: (benchmark figure)
API
We follow mlapi conventions.
Configure
Generate configure:
autoconf configure.ac > configure && chmod +x configure
Functions in rsparse
Name | Description
ScaleNormalize | Re-scales input matrix proportionally to item popularity
WRMF | Weighted Regularized Matrix Factorization for collaborative filtering
FactorizationMachine | Second order Factorization Machines
FTRL | Logistic regression model with FTRL proximal SGD solver
metrics | Ranking Metrics for Top-K Items
movielens100k | MovieLens 100K Dataset
GloVe | Global Vectors
LinearFlow | Linear-Flow model for one-class collaborative filtering
train_test_split | Creates cross-validation set from user-item interactions
detect_number_omp_threads | Detects number of OpenMP threads in the system
matmult | Multithreaded Sparse-Dense Matrix Multiplication
slice | CSR Matrices Slicing
PureSVD | PureSVD recommender model decomposition
MatrixFactorizationRecommender | Base class for matrix factorization recommenders
soft_impute | SoftImpute/SoftSVD matrix factorization
Details
Type | Package |
License | GPL (>= 2) |
Encoding | UTF-8 |
LazyData | true |
ByteCompile | true |
LinkingTo | Rcpp, RcppArmadillo (>= 0.9.100.5.0) |
StagedInstall | TRUE |
URL | https://github.com/rexyai/rsparse |
BugReports | https://github.com/rexyai/rsparse/issues |
RoxygenNote | 7.1.0 |
NeedsCompilation | yes |
Packaged | 2020-04-01 16:55:58 UTC; dselivanov |
Repository | CRAN |
Date/Publication | 2020-04-01 17:50:02 UTC |
suggests | covr , testthat |
imports | data.table (>= 1.10.0) , float (>= 0.2-2) , lgr (>= 0.2) , Matrix (>= 1.2) , Rcpp (>= 0.11) , RhpcBLASctl |
depends | methods , R (>= 3.6.0) |
linkingto | RcppArmadillo (>= 0.9.100.5.0) |
Contributors | Drew Schmidt, Wei-Chen Chen |