Drew Schmidt

24 packages on CRAN

2 packages on GitHub

getPass

cran
93rd

Percentile

A micro-package for reading "passwords", i.e. reading user input with masking, so that the input is not displayed as it is typed. Currently we have support for 'RStudio', the command line (every OS), and any platform where 'tcltk' is present.

argon2

cran
90th

Percentile

Utilities for secure password hashing via the argon2 algorithm. It is a relatively new hashing algorithm and is believed to be very secure. The 'argon2' implementation included in the package is the reference implementation. The package also includes some utilities that should be useful for digest authentication, including a wrapper of 'blake2b'. For similar R packages, see 'sodium' and 'bcrypt'. See <https://en.wikipedia.org/wiki/Argon2> or <https://eprint.iacr.org/2015/430.pdf> for more information.

ngram

cran
89th

Percentile

An n-gram is a sequence of n "words" taken, in order, from a body of text. This is a collection of utilities for creating, displaying, summarizing, and "babbling" n-grams. The 'tokenization' and "babbling" are handled by very efficient C code, which can even be built as its own standalone library. The babbler is a simple Markov chain. The package also offers a vignette with complete example 'workflows' and information about the utilities offered in the package.
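
To make the idea concrete, here is a minimal Python sketch of n-gram extraction and a Markov-chain "babbler". It illustrates the concept only and is not the package's R interface or its C implementation; the function names `build_ngrams`, `build_chain`, and `babble` are hypothetical, and "words" are assumed to be whitespace-separated tokens.

```python
import random
from collections import defaultdict


def build_ngrams(text, n):
    """Return every run of n consecutive words in text, in order."""
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def build_chain(text, n):
    """Map each n-gram to the list of words seen immediately after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - n):
        chain[tuple(words[i:i + n])].append(words[i + n])
    return chain


def babble(chain, length, seed=0):
    """Walk the chain: pick a start n-gram, then repeatedly sample a follower."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:
            break  # dead end: this n-gram was never followed by anything
        out.append(rng.choice(followers))
        state = tuple(out[-len(state):])
    return " ".join(out)


text = "the cat sat on the mat and the cat ran"
print(build_ngrams(text, 2)[:3])
print(babble(build_chain(text, 2), length=8))
```

Because followers are stored with repetition, sampling uniformly from the list reproduces the observed transition frequencies.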

memuse

cran
89th

Percentile

How much RAM do you need to store a 100,000 by 100,000 matrix? How much RAM is your current R session using? How much RAM do you even have? Learn the scintillating answers to these and many more such questions with the 'memuse' package.
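
The first question can be answered with simple arithmetic. The Python sketch below assumes each element is an 8-byte double (as with R's "numeric" type) and ignores any per-object overhead; `matrix_bytes` is a hypothetical helper, not part of the package.

```python
def matrix_bytes(nrow, ncol, bytes_per_element=8):
    """Bytes needed for a dense nrow x ncol matrix, ignoring object overhead."""
    return nrow * ncol * bytes_per_element


size = matrix_bytes(100_000, 100_000)
print(size / 1e9, "GB")               # decimal gigabytes: 80.0
print(round(size / 2**30, 1), "GiB")  # binary gibibytes: 74.5
```

The two figures differ only in the unit: 10^9 bytes per GB versus 2^30 bytes per GiB.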

pbdDMAT

cran
84th

Percentile

A set of classes for managing distributed matrices, and a collection of methods for computing linear algebra and statistics. Computation is handled mostly by routines from the 'pbdBASE' package, which itself relies on the 'ScaLAPACK' and 'PBLAS' numerical libraries for distributed computing.

float

cran
80th

Percentile

R comes with a suite of utilities for linear algebra with "numeric" (double precision) vectors/matrices. However, sometimes single precision (or less!) is more than enough for a particular task. This package extends R's linear algebra facilities to include 32-bit float (single precision) data. Float vectors/matrices have half the precision of their "numeric"-type counterparts but are generally faster to numerically operate on, for a performance vs accuracy trade-off. The internal representation is an S4 class, which allows us to keep the syntax identical to that of base R's. Interaction between floats and base types for binary operators is generally possible; in these cases, type promotion always defaults to the higher precision. The package ships with copies of the single precision 'BLAS' and 'LAPACK', which are automatically built in the event they are not available on the system.
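
The precision trade-off can be seen with a small Python sketch that rounds a double to the nearest 32-bit float using the standard `struct` module. This is only an illustration of single versus double precision, not of the package's S4 class; `to_float32` is a hypothetical helper.

```python
import struct


def to_float32(x):
    """Round a double-precision value to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Storage per element: single precision uses half the bytes of double.
FLOAT_BYTES = struct.calcsize("f")
DOUBLE_BYTES = struct.calcsize("d")

x = 0.1
x32 = to_float32(x)
rounding_error = abs(x32 - x)  # small but nonzero: 0.1 is not exact in either type

# Python has no native 32-bit float, so x32 is held in a double here; adding it
# to a double is therefore done in double precision, which mirrors promotion
# to the higher-precision type.
mixed = x32 + x

print(FLOAT_BYTES, DOUBLE_BYTES)
print(rounding_error)
```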

pbdBASE

cran
75th

Percentile

An interface to and extensions for the 'PBLAS' and 'ScaLAPACK' numerical libraries. This enables R to utilize distributed linear algebra for codes written in the 'SPMD' fashion. This interface is deliberately low-level and mimics the style of the native libraries it wraps. For a much higher level way of managing distributed matrices, see the 'pbdDMAT' package.

coop

cran
55th

Percentile

Fast implementations of the co-operations: covariance, correlation, and cosine similarity. The implementations are fast and memory-efficient and their use is resolved automatically based on the input data, handled by R's S3 methods. Full descriptions of the algorithms and benchmarks are available in the package vignettes.
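
For reference, the three operations can be written out directly. The Python sketch below uses the textbook definitions on plain lists (sample covariance with an n - 1 denominator is assumed); it shows what is being computed, not the package's optimized algorithms, and the function names are hypothetical.

```python
import math


def mean(x):
    return sum(x) / len(x)


def covariance(x, y):
    """Sample covariance with an n - 1 denominator."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def correlation(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


def cosine(x, y):
    """Cosine similarity: like correlation, but without centering the data."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


x, y = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(covariance(x, y), correlation(x, y), cosine(x, y))
```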

remoter

cran
51st

Percentile

A set of utilities for client/server computing with R, controlling a remote R session (the server) from a local one (the client). Simply set up a server (see package vignette for more details) and connect to it from your local R session ('RStudio', terminal, etc). The client/server framework is a custom 'REPL' and runs entirely in your R session without the need for installing a custom environment on your system. Network communication is handled by the 'ZeroMQ' library by way of the 'pbdZMQ' package.

dequer

cran
40th

Percentile

Queues, stacks, and 'deques' are list-like, abstract data types. These are meant to be very cheap to "grow", or insert new objects into. A typical use case involves storing data in a list in a streaming fashion, when you do not necessarily know how many elements need to be stored. Unlike R's lists, the new data structures provided here are not necessarily stored contiguously, making insertions and deletions at the front/end of the structure much faster. The underlying implementation is new and uses a head/tail doubly linked list; thus, we do not rely on R's environments or hashing. To avoid unnecessary data copying, most operations on these data structures are performed via side-effects.
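
A head/tail doubly linked list can be sketched in a few lines of Python. This is a generic illustration of the data structure, with hypothetical class and method names, and it is not the package's R interface. Note that every operation mutates the deque in place, which is the "side-effects" style the description mentions.

```python
class _Node:
    __slots__ = ("value", "prev", "next")

    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None


class Deque:
    """Double-ended queue with O(1) insertion and removal at both ends."""

    def __init__(self):
        self.head = None
        self.tail = None
        self.size = 0

    def push_front(self, value):
        node = _Node(value)
        if self.head is None:
            self.head = self.tail = node
        else:
            node.next = self.head
            self.head.prev = node
            self.head = node
        self.size += 1

    def push_back(self, value):
        node = _Node(value)
        if self.tail is None:
            self.head = self.tail = node
        else:
            node.prev = self.tail
            self.tail.next = node
            self.tail = node
        self.size += 1

    def pop_front(self):
        if self.head is None:
            raise IndexError("pop from empty deque")
        node = self.head
        self.head = node.next
        if self.head is None:
            self.tail = None
        else:
            self.head.prev = None
        self.size -= 1
        return node.value

    def pop_back(self):
        if self.tail is None:
            raise IndexError("pop from empty deque")
        node = self.tail
        self.tail = node.prev
        if self.tail is None:
            self.head = None
        else:
            self.tail.next = None
        self.size -= 1
        return node.value

    def to_list(self):
        out, node = [], self.head
        while node is not None:
            out.append(node.value)
            node = node.next
        return out
```

Using only `push_back` and `pop_front` gives a queue; using only `push_back` and `pop_back` gives a stack.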

kazaam

cran
22nd

Percentile

Many data science problems reduce to operations on very tall, skinny matrices. However, sometimes these matrices can be so tall that they are difficult to work with, or do not even fit into main memory. One strategy to deal with such objects is to distribute their rows across several processors. To this end, we offer an 'S4' class for tall, skinny, distributed matrices, called the 'shaq'. We also provide many useful numerical methods and statistics operations for operating on these distributed objects. The naming is a bit "tongue-in-cheek", with the class a play on the fact that Shaquille O'Neal ('Shaq') is very tall, and he starred in the film 'Kazaam'.
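
Row distribution works well for tall, skinny matrices because many quantities decompose into small per-block results. As a hedged illustration, the Python sketch below computes the cross product X^T X by summing the cross products of row blocks; each block's result is only ncol by ncol, however tall the matrix is. It simulates the blocks in a single process, whereas a real distributed setting would place each block on a different processor and combine the partial results with a reduction. All function names are hypothetical.

```python
def crossprod(rows, ncol):
    """X^T X for one block of rows; the result is ncol x ncol."""
    out = [[0.0] * ncol for _ in range(ncol)]
    for row in rows:
        for i in range(ncol):
            for j in range(ncol):
                out[i][j] += row[i] * row[j]
    return out


def split_rows(rows, nparts):
    """Split the rows into at most nparts contiguous blocks."""
    size = max(1, -(-len(rows) // nparts))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def distributed_crossprod(rows, nparts):
    """Sum the per-block cross products (stands in for a reduction step)."""
    ncol = len(rows[0])
    partials = [crossprod(block, ncol) for block in split_rows(rows, nparts)]
    total = [[0.0] * ncol for _ in range(ncol)]
    for partial in partials:
        for i in range(ncol):
            for j in range(ncol):
                total[i][j] += partial[i][j]
    return total


X = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(distributed_crossprod(X, nparts=2))
```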

pbdDEMO

cran
21st

Percentile

A set of demos of 'pbdR' packages, together with a useful, unifying vignette.

meanr

cran
20th

Percentile

A popular technique in text analysis today is sentiment analysis, or trying to determine the overall emotional attitude of a piece of text (positive or negative). We provide a new, basic implementation of a common method for computing sentiment, whereby words are scored as positive or negative according to a "dictionary", and then an average of those scores for the document is produced. The package uses the 'Hu' and 'Liu' sentiment dictionary for assigning sentiment.
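
The dictionary method described above can be sketched as follows in Python. The tiny word lists are hypothetical stand-ins, not the Hu and Liu dictionary, and the sketch assumes the average is taken over all words in the document; the package may normalize differently.

```python
# Hypothetical toy dictionary for illustration only.
POSITIVE = {"good", "great", "happy"}
NEGATIVE = {"bad", "awful", "sad"}


def sentiment_score(text):
    """Average per-word score: +1 for positive, -1 for negative, 0 otherwise."""
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    total = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    return total / len(words)


print(sentiment_score("A great film, but an awful ending."))
print(sentiment_score("good good bad day"))
```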

Rdym

github
13th

Percentile

Most search engines have a "did you mean?" feature, where suggestions are given in the presence of likely typos. We are able to somewhat replicate this functionality with ancient spellchecker techniques. When R detects that a function or object listed in the user's input is not found, the package finds the minimum 'Levenshtein' distance between the "un-found" token and all symbols in the user's global environment plus all loaded namespaces. The word with minimum 'Levenshtein' distance (in the event of ties, the first such detected) is then suggested as an alternative to the missing symbol. To use, simply load the package from an interactive R session and start making some errors. There is also an explicit interface for starting and stopping "did you mean?" behavior.
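
The suggestion step can be sketched in Python with the standard dynamic-programming Levenshtein distance. The names `levenshtein` and `did_you_mean` are hypothetical, and the candidate list stands in for the symbols R would gather from the global environment and loaded namespaces.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute, or match
        prev = cur
    return prev[-1]


def did_you_mean(token, symbols):
    """Return the closest symbol; on ties, the first one encountered wins."""
    best, best_dist = None, None
    for symbol in symbols:
        dist = levenshtein(token, symbol)
        if best_dist is None or dist < best_dist:
            best, best_dist = symbol, dist
    return best


print(did_you_mean("pritn", ["paste", "print", "plot"]))
```

The strict `<` comparison is what keeps the first candidate when several share the minimum distance.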

triebeard

cran
96th

Percentile

'Radix trees', or 'tries', are key-value data structures optimised for efficient lookups, similar in purpose to hash tables. 'triebeard' provides an implementation of 'radix trees' for use in R programming and in developing packages with 'Rcpp'.
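
As a rough illustration of the data structure, here is a plain (uncompressed) trie in Python with exact and longest-prefix lookups. A radix tree additionally merges chains of single-child nodes, which this sketch omits for brevity. The class and method names are hypothetical and do not reflect the package's API.

```python
class Trie:
    """Character trie mapping string keys to values."""

    def __init__(self):
        self.children = {}
        self.value = None
        self.has_value = False

    def insert(self, key, value):
        node = self
        for ch in key:
            node = node.children.setdefault(ch, Trie())
        node.value = value
        node.has_value = True

    def get(self, key, default=None):
        """Exact-match lookup."""
        node = self
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return default
        return node.value if node.has_value else default

    def longest_prefix(self, text, default=None):
        """Value of the longest stored key that is a prefix of text."""
        node, best = self, default
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                break
            if node.has_value:
                best = node.value
        return best


t = Trie()
t.insert("AF", "Africa")
t.insert("AFG", "Afghanistan")
print(t.get("AF"), t.longest_prefix("AFGHAN"))
```

Lookup cost depends on the key length rather than on the number of stored keys, which is what makes the structure attractive for prefix matching.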

urltools

cran
96th

Percentile

A toolkit for all URL-handling needs, including encoding and decoding, parsing, parameter extraction and modification. All functions are designed to be both fast and entirely vectorised. It is intended to be useful for people dealing with web-related datasets, such as server-side logs, although it may be useful for other situations involving large sets of URLs.
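
The kinds of operations listed above (parsing, parameter extraction and modification, encoding and decoding) look like this with Python's standard `urllib.parse` module. This is an analogy rather than the package's R interface; `set_param` is a hypothetical helper, the example URLs are made up, and the list comprehension only approximates vectorised behaviour.

```python
from urllib.parse import (parse_qs, quote, unquote, urlencode, urlsplit,
                          urlunsplit)


def set_param(url, key, value):
    """Return url with query parameter key set to value."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    params[key] = [value]
    return urlunsplit(parts._replace(query=urlencode(params, doseq=True)))


urls = [
    "https://en.wikipedia.org/wiki/R?lang=en",
    "http://example.com/a?x=1",
]

# Parsing: split each URL into scheme, host, path, and query.
hosts = [urlsplit(u).netloc for u in urls]

# Parameter extraction and modification.
lang = parse_qs(urlsplit(urls[0]).query)["lang"][0]
updated = set_param(urls[1], "y", "2")

# Encoding and decoding.
encoded = quote("a b/c", safe="")
decoded = unquote(encoded)

print(hosts, lang, updated, encoded, decoded)
```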

pbdZMQ

cran
96th

Percentile

'ZeroMQ' is a well-known library for high-performance asynchronous messaging in scalable, distributed applications. This package provides high level R wrapper functions to easily utilize 'ZeroMQ'. We mainly focus on interactive client/server programming frameworks. For convenience, a minimal 'ZeroMQ' library (4.2.2) is shipped with 'pbdZMQ', which can be used if no system installation of 'ZeroMQ' is available. A few wrapper functions compatible with 'rzmq' are also provided.

rexpokit

cran
85th

Percentile

Wraps some of the matrix exponentiation utilities from EXPOKIT (<http://www.maths.uq.edu.au/expokit/>), a FORTRAN library that is widely recommended for matrix exponentiation (Sidje RB, 1998. "Expokit: A Software Package for Computing Matrix Exponentials." ACM Trans. Math. Softw. 24(1): 130-156). EXPOKIT includes functions for exponentiating both small, dense matrices, and large, sparse matrices (in sparse matrices, most of the cells have value 0). Rapid matrix exponentiation is useful in phylogenetics when we have a large number of states (as we do when we are inferring the history of transitions between the possible geographic ranges of a species), but is probably useful in other ways as well.

rgeolocate

cran
80th

Percentile

Connectors to online and offline sources for taking IP addresses and geolocating them to country, city, timezone and other geographic ranges. For individual connectors, see the package index.

pbdMPI

cran
64th

Percentile

An efficient interface to MPI by utilizing S4 classes and methods with a focus on Single Program/Multiple Data ('SPMD') parallel programming style, which is intended for batch parallel execution.

cubfits

cran
60th

Percentile

Estimating mutation and selection coefficients on synonymous codon usage bias based on models of ribosome overhead cost (ROC). Multinomial logistic regression and Markov Chain Monte Carlo are used to estimate and predict protein production rates with/without the presence of expressions and measurement errors. Workflows with examples for simulation, estimation and prediction processes are also provided with parallelization speedup. The whole framework is tested with yeast genome and gene expression data of Yassour (2009).

pbdSLAP

cran
51st

Percentile

Utilizing scalable linear algebra packages mainly including 'BLACS', 'PBLAS', and 'ScaLAPACK' in double precision via 'pbdMPI' based on 'ScaLAPACK' version 2.0.2.

pbdRPC

cran
49th

Percentile

A very light yet secure implementation of remote procedure calls, with a unified interface via ssh (OpenSSH) or plink/plink.exe (PuTTY).

pbdNCDF4

cran
37th

Percentile

This package adds collective parallel read and write capability to the R package ncdf4 version 1.8. Typical use is as a parallel NetCDF4 file reader in SPMD style programming. Each R process reads and writes its own data in a synchronized collective mode, resulting in faster parallel performance. Performance improvement is conditional on a parallel file system.

pbdPROF

cran
24th

Percentile

MPI profiling tools.

rsparse

github
13th

Percentile

Implements several algorithms for supervised learning on sparse data and many matrix factorizations of sparse matrices (with a focus on applications for recommender systems). All algorithms work on sparse matrices. They also make extensive use of BLAS and LAPACK and are parallelized with OpenMP. Implementations are reasonably fast and work well with large datasets (millions of rows and millions of columns).

Algorithms for supervised learning:
1) Elastic net regression via the Follow The Proximally-Regularized Leader algorithm.
2) Second-order Factorization Machines via stochastic gradient descent with adaptive learning rates. Allows model parameters to be learned out-of-core. Fast: asynchronous parallel, SIMD accelerated.

Algorithms for matrix factorization:
1) Weighted Regularized Matrix Factorization with Alternating Least Squares (ALS) for implicit feedback (including an approximate Conjugate Gradient solver). Optional non-negativity (NNMF, non-negative matrix factorization).
2) Regularized Matrix Factorization with ALS for explicit feedback. Optional non-negativity (NNMF, non-negative matrix factorization).
3) Fast Truncated SVD and Soft-SVD via ALS.
4) Soft-Impute via fast ALS and solution in SVD form.
5) LinearFlow method, which learns an item-item similarity matrix from the data.
6) GloVe (Global Vectors) embeddings.

Clustering:
1) kmeans from the Armadillo library, which provides smart (similar to kmeans++) cluster initializations.

Misc utils/methods:
1) multithreaded `%*%` and `tcrossprod()` for `<dgRMatrix, matrix>`
2) multithreaded `%*%` and `crossprod()` for `<matrix, dgCMatrix>`