# Dmitriy Selivanov

#### 11 packages on CRAN

#### 1 packages on GitHub

Provides 'R6' abstract classes for building machine learning models with 'scikit-learn' like API. <http://scikit-learn.org/> is a popular module for 'Python' programming language which design became de facto a standard in industry for machine learning tasks.

Allows to easily create high-performance full featured HTTP APIs from R functions. Provides high-level classes such as 'Request', 'Response', 'Application', 'Middleware' in order to streamline server side application development. Out of the box allows to serve requests using 'Rserve' package, but flexible enough to integrate with other HTTP servers such as 'httpuv'.

This R package provides an interface to the NoSQL MongoDB database using the MongoDB C-driver version 0.8.

Implements many (sparse) matrix factorization algorithms. Focus is on applications for recommender systems. List of algorithms: 1) Weighted Regularazied Matrix Factorization with Alternating Least Squares (ALS) for implicit feedback (inculding approximate Conjugate Gradient solver). Optional non-negativity (NNMF, non-negative matrix factorization). 2) Regularazied Matrix Factorization with ALS for explicit feedback Optional non-negativity (NNMF, non-negative matrix factorization). 3) Soft-SVD via fast ALS 4) Soft-Impute via fast ALS and solution in SVD form 5) LinearFlow method which learns item-item similarity matrix from the data All algorithms work on sparse matrices. Extensively use BLAS and LAPACK and parallelized with OpenMP where applicable. Implementations are reasonably fast and nicely work with large datasets (millions of rows and millions of columns).

Implements many algorithms for statistical learning on sparse matrices - matrix factorizations, matrix completion, elastic net regressions, factorization machines. Also 'rsparse' enhances 'Matrix' package by providing methods for multithreaded <sparse, dense> matrix products and native slicing of the sparse matrices in Compressed Sparse Row (CSR) format. List of the algorithms for regression problems: 1) Elastic Net regression via Follow The Proximally-Regularized Leader (FTRL) Stochastic Gradient Descent (SGD), as per McMahan et al(, <doi:10.1145/2487575.2488200>) 2) Factorization Machines via SGD, as per Rendle (2010, <doi:10.1109/ICDM.2010.127>) List of algorithms for matrix factorization and matrix completion: 1) Weighted Regularized Matrix Factorization (WRMF) via Alternating Least Squares (ALS) - paper by Hu, Koren, Volinsky (2008, <doi:10.1109/ICDM.2008.22>) 2) Maximum-Margin Matrix Factorization via ALS, paper by Rennie, Srebro (2005, <doi:10.1145/1102351.1102441>) 3) Fast Truncated Singular Value Decomposition (SVD), Soft-Thresholded SVD, Soft-Impute matrix completion via ALS - paper by Hastie, Mazumder et al. (2014, <arXiv:1410.2596>) 4) Linear-Flow matrix factorization, from 'Practical linear models for large-scale one-class collaborative filtering' by Sedhain, Bui, Kawale et al (2016, ISBN:978-1-57735-770-4) 5) GlobalVectors (GloVe) matrix factorization via SGD, paper by Pennington, Socher, Manning (2014, <https://www.aclweb.org/anthology/D14-1162>) Package is reasonably fast and memory efficient - it allows to work with large datasets - millions of rows and millions of columns. This is particularly useful for practitioners working on recommender systems.

Provides interface to 'sparsepp' - fast, memory efficient hash map. It is derived from Google's excellent 'sparsehash' implementation. We believe 'sparsepp' provides an unparalleled combination of performance and memory usage, and will outperform your compiler's unordered_map on both counts. Only Google's 'dense_hash_map' is consistently faster, at the cost of much greater memory usage (especially when the final size of the map is not known in advance).

Fast 'SVMlight' reader and writer. 'SVMlight' is most commonly used format for storing sparse matrices (possibly with some target variable) on disk. For additional information about 'SVMlight' format see <http://svmlight.joachims.org/>.

Fast and memory-friendly tools for text vectorization, topic modeling (LDA, LSA), word embeddings (GloVe), similarities. This package provides a source-agnostic streaming API, which allows researchers to perform analysis of collections of documents which are larger than available RAM. All core functions are parallelized to benefit from multicore machines.

R comes with a suite of utilities for linear algebra with "numeric" (double precision) vectors/matrices. However, sometimes single precision (or less!) is more than enough for a particular task. This package extends R's linear algebra facilities to include 32-bit float (single precision) data. Float vectors/matrices have half the precision of their "numeric"-type counterparts but are generally faster to numerically operate on, for a performance vs accuracy trade-off. The internal representation is an S4 class, which allows us to keep the syntax identical to that of base R's. Interaction between floats and base types for binary operators is generally possible; in these cases, type promotion always defaults to the higher precision. The package ships with copies of the single precision 'BLAS' and 'LAPACK', which are automatically built in the event they are not available on the system.

'Hnswlib' is a C++ library for Approximate Nearest Neighbors. This package provides a minimal R interface by relying on the 'Rcpp' package. See <https://github.com/nmslib/hnswlib> for more on 'hnswlib'. 'hnswlib' is released under Version 2.0 of the Apache License.

Convert natural language text into tokens. Includes tokenizers for shingled n-grams, skip n-grams, words, word stems, sentences, paragraphs, characters, shingled characters, lines, tweets, Penn Treebank, regular expressions, as well as functions for counting characters, words, and sentences, and a function for splitting longer texts into separate documents, each with the same number of words. The tokenizers have a consistent interface, and the package is built on the 'stringi' and 'Rcpp' packages for fast yet correct tokenization in 'UTF-8'.

Provides functionality to define and train neural networks similar to 'PyTorch' by Paszke et al (2019) <arXiv:1912.01703> but written entirely in R using the 'libtorch' library. Also supports low-level tensor operations and 'GPU' acceleration.