Wei-Chen Chen

22 packages on CRAN

1 package on GitHub

pbdZMQ

CRAN, 96th percentile

'ZeroMQ' is a well-known library for high-performance asynchronous messaging in scalable, distributed applications. This package provides high-level R wrapper functions to easily utilize 'ZeroMQ'. We mainly focus on interactive client/server programming frameworks. For convenience, a minimal 'ZeroMQ' library (4.2.2) is shipped with 'pbdZMQ', which can be used if no system installation of 'ZeroMQ' is available. A few wrapper functions compatible with 'rzmq' are also provided.

EMCluster

CRAN, 82nd percentile

EM algorithms and several efficient initialization methods for model-based clustering using finite Gaussian mixtures with unstructured dispersion, for both unsupervised and semi-supervised learning.
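The following Python sketch makes the EM iteration concrete. It is not EMCluster's R interface and it simplifies heavily: the data are univariate, so "unstructured dispersion" reduces to one variance per component. The quantile-based starting values are a hypothetical choice made only for illustration and are not one of EMCluster's initialization methods.

```python
import math
import random


def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, k, max_iter=200, tol=1e-8):
    """Fit a k-component univariate Gaussian mixture by EM.

    Returns (mixing proportions, means, variances, log-likelihood).
    """
    n = len(data)
    xs = sorted(data)
    mean = sum(xs) / n
    # Hypothetical initialization (illustration only): means at evenly spaced
    # sample quantiles, the pooled variance for every component, equal weights.
    mus = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [sum((x - mean) ** 2 for x in xs) / n] * k
    pis = [1.0 / k] * k
    prev_ll = -math.inf
    ll = prev_ll
    for _ in range(max_iter):
        # E-step: posterior membership probabilities ("responsibilities").
        resp, ll = [], 0.0
        for x in data:
            w = [pis[j] * normal_pdf(x, mus[j], variances[j]) for j in range(k)]
            total = sum(w)
            ll += math.log(total)
            resp.append([wj / total for wj in w])
        # M-step: responsibility-weighted proportions, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            pis[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nj
        # Stop once the log-likelihood from the E-step stops improving.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pis, mus, variances, ll


random.seed(1)
sample = [random.gauss(-5, 1) for _ in range(500)] + [random.gauss(5, 1) for _ in range(500)]
pis, mus, variances, ll = em_gmm_1d(sample, k=2)
print(sorted(mus))  # close to -5 and 5 for this well-separated sample
```

EM is sensitive to its starting values, which is why the package provides several initialization methods. With two well-separated clusters, as in this example, even a crude start converges to the right answer.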

phyclust

CRAN, 81st percentile

Phylogenetic clustering (phyloclustering) is an evolutionary Continuous Time Markov Chain model-based approach to identify population structure from molecular data without assuming linkage equilibrium. The package phyclust (Chen 2011) provides a convenient implementation of phyloclustering for DNA and SNP data, capable of clustering individuals into subpopulations and identifying molecular sequences representative of those subpopulations. It is written in C for performance, interfaced with R for visualization, and incorporates other popular open-source programs, including ms (Hudson 2002) <doi:10.1093/bioinformatics/18.2.337>, seq-gen (Rambaut and Grassly 1997) <doi:10.1093/bioinformatics/13.3.235>, Hap-Clustering (Tzeng 2005) <doi:10.1002/gepi.20063> and PAML baseml (Yang 1997, 2007) <doi:10.1093/bioinformatics/13.5.555>, <doi:10.1093/molbev/msm088>, for simulating data, additional analyses, and searching for the best tree. See the phyclust website for more information, documentation and examples.

MixSim

CRAN, 70th percentile

This package is designed for simulating mixtures of Gaussian distributions with different levels of overlap between mixture components. Pairwise overlap is defined as the sum of two misclassification probabilities. It measures the degree of interaction between components and can be readily used to control the clustering complexity of datasets simulated from mixtures. These datasets can then be used for systematic performance investigation of clustering and finite mixture modeling algorithms. Other capabilities of 'MixSim' include computing the exact overlap for Gaussian mixtures, simulating Gaussian and non-Gaussian data, simulating outliers and noise variables, calculating various measures of agreement between two partitionings, and constructing parallel distribution plots for the graphical display of finite mixture models.
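The next Python sketch shows what "the sum of two misclassification probabilities" means in the simplest setting. It assumes two univariate Gaussians with a shared standard deviation, so the Bayes decision boundary has a closed form. MixSim itself computes exact overlaps for general multivariate components, and the function name here is hypothetical. The misclassification probability of component i into component j is the chance that a draw from component i falls on component j's side of the boundary where the weighted densities are equal. The overlap is the sum of the two directions.

```python
import math


def phi_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pairwise_overlap_1d(mu1, mu2, sigma, pi1=0.5, pi2=0.5):
    """Overlap = P(misclassify 1 as 2) + P(misclassify 2 as 1) for two
    univariate Gaussians sharing the standard deviation sigma
    (a simplifying assumption made for this sketch)."""
    if mu1 == mu2:
        raise ValueError("components must have distinct means")
    if mu1 > mu2:  # reorder so that component 1 lies on the left
        mu1, mu2, pi1, pi2 = mu2, mu1, pi2, pi1
    # Bayes boundary where pi1 * N(x; mu1, sigma) == pi2 * N(x; mu2, sigma)
    x_star = (mu1 + mu2) / 2 + sigma ** 2 * math.log(pi1 / pi2) / (mu2 - mu1)
    # A draw from component 1 that lands right of the boundary is misclassified.
    w_2_given_1 = 1.0 - phi_cdf((x_star - mu1) / sigma)
    # A draw from component 2 that lands left of the boundary is misclassified.
    w_1_given_2 = phi_cdf((x_star - mu2) / sigma)
    return w_1_given_2 + w_2_given_1


print(pairwise_overlap_1d(0.0, 2.0, 1.0))  # about 0.317
print(pairwise_overlap_1d(0.0, 4.0, 1.0))  # smaller: the means are further apart
```

Moving the means apart, or shrinking sigma, lowers the overlap. A simulator can therefore tune these quantities until a target overlap, and with it a target clustering difficulty, is reached.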

QZ

CRAN, 67th percentile

Generalized eigenvalues and QZ decomposition (generalized Schur form) for an N-by-N non-symmetric matrix A or paired matrices (A,B), with an eigenvalue reordering mechanism. The package is mainly based on the complex*16 and double precision routines of the LAPACK library (version 3.4.2).
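Generalized eigenvalues of a pair (A, B) are the values lambda for which det(A - lambda * B) = 0. For a 2-by-2 pair, that determinant is a quadratic in lambda, so the small Python sketch below solves it directly. This is deliberately not the QZ algorithm. QZ, as implemented in LAPACK drivers such as DGGES and ZGGES, reduces (A, B) to generalized Schur form using unitary transformations. That approach is numerically stable and returns each eigenvalue as a ratio alpha/beta, so an infinite eigenvalue (singular B) can still be represented. The toy formula below simply raises an error in that case. The function name is hypothetical.

```python
import cmath


def generalized_eigvals_2x2(A, B):
    """Roots of det(A - lambda * B) = 0 for 2x2 matrices A and B.

    Expanding the determinant gives c2*lambda^2 + c1*lambda + c0 = 0 with
    c2 = det(B), c0 = det(A), and c1 collecting the mixed terms.
    """
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    c2 = b11 * b22 - b12 * b21
    c1 = -(a11 * b22 + a22 * b11 - a12 * b21 - a21 * b12)
    c0 = a11 * a22 - a12 * a21
    if c2 == 0:
        # QZ would report this as a pair (alpha, beta) with beta = 0;
        # the direct formula cannot represent an infinite eigenvalue.
        raise ValueError("B is singular: at least one eigenvalue is infinite")
    disc = cmath.sqrt(c1 * c1 - 4 * c2 * c0)  # complex sqrt covers complex pairs
    return (-c1 + disc) / (2 * c2), (-c1 - disc) / (2 * c2)


print(generalized_eigvals_2x2([[2, 0], [0, 3]], [[1, 0], [0, 2]]))  # 2 and 1.5, as complex numbers
```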

pbdMPI

CRAN, 64th percentile

An efficient interface to MPI by utilizing S4 classes and methods with a focus on Single Program/Multiple Data ('SPMD') parallel programming style, which is intended for batch parallel execution.

cubfits

CRAN, 60th percentile

Estimating mutation and selection coefficients of synonymous codon usage bias based on models of ribosome overhead cost (ROC). Multinomial logistic regression and Markov Chain Monte Carlo are used to estimate and predict protein production rates with or without observed expression data and measurement errors. Workflows with examples for simulation, estimation and prediction processes are also provided, with parallelization speedup. The whole framework is tested with the yeast genome and the gene expression data of Yassour (2009).

pbdSLAP

CRAN, 51st percentile

Utilizes scalable linear algebra packages, mainly 'BLACS', 'PBLAS', and 'ScaLAPACK', in double precision via 'pbdMPI', based on 'ScaLAPACK' version 2.0.2.

pmclust

CRAN, 51st percentile

Aims to utilize model-based clustering (unsupervised) for high-dimensional and ultra-large data, especially in a distributed manner. The code employs 'pbdMPI' to perform an expectation-gathering-maximization algorithm for finite mixture Gaussian models. Unstructured dispersion matrices are assumed in the Gaussian models. By default, the implementation follows the single program, multiple data (SPMD) programming model. The code can be executed through 'pbdMPI' and 'MPI' implementations such as 'OpenMPI' and 'MPICH'. See the High Performance Statistical Computing website <https://snoweye.github.io/hpsc/> for more information, documentation and examples.

pbdRPC

CRAN, 49th percentile

A very lightweight yet secure implementation of remote procedure calls with a unified interface via ssh (OpenSSH) or plink/plink.exe (PuTTY).

MixfMRI

CRAN, 24th percentile

Utilizing model-based clustering (unsupervised) for functional magnetic resonance imaging (fMRI) data. The developed methods (Chen and Maitra (2018, manuscript)) include 2D and 3D clustering analyses (for p-values with voxel locations) and segmentation analyses (for p-values alone) for fMRI data, where p-values indicate the significance level of activation in response to stimuli of interest. The analyses mainly identify active voxels/signals associated with normal brain behaviors. Analysis pipelines (R scripts) utilizing this package (see examples in 'inst/workflow/') are also implemented with high-performance techniques.

pbdPROF

CRAN, 24th percentile

MPI profiling tools.

getPass

CRAN, 93rd percentile

A micro-package for reading "passwords", i.e. reading user input with masking, so that the input is not displayed as it is typed. Currently we have support for 'RStudio', the command line (every OS), and any platform where 'tcltk' is present.

memuse

CRAN, 89th percentile

How much RAM do you need to store a 100,000 by 100,000 matrix? How much RAM is your current R session using? How much RAM do you even have? Learn the scintillating answers to these and many more such questions with the 'memuse' package.
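The first question has a short answer that can be worked out by hand. A dense matrix of double-precision values uses 8 bytes per element, so 100,000 x 100,000 = 10^10 elements need 8 x 10^10 bytes. That is 80 GB in decimal units, or about 74.5 GiB in binary units. The Python arithmetic below assumes a dense matrix of doubles and ignores R's small per-object header overhead. It is an illustration only and does not use the memuse package, and the helper name is hypothetical.

```python
def matrix_bytes(nrow, ncol, bytes_per_element=8):
    """Memory for a dense matrix; 8 bytes per double-precision element."""
    return nrow * ncol * bytes_per_element


size = matrix_bytes(100_000, 100_000)
print(size)           # 80000000000 bytes
print(size / 1e9)     # 80.0 decimal gigabytes
print(size / 2**30)   # about 74.5 GiB
```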

pbdDMAT

CRAN, 84th percentile

A set of classes for managing distributed matrices, and a collection of methods for computing linear algebra and statistics. Computation is handled mostly by routines from the 'pbdBASE' package, which itself relies on the 'ScaLAPACK' and 'PBLAS' numerical libraries for distributed computing.

float

CRAN, 80th percentile

R comes with a suite of utilities for linear algebra with "numeric" (double precision) vectors/matrices. However, sometimes single precision (or less!) is more than enough for a particular task. This package extends R's linear algebra facilities to include 32-bit float (single precision) data. Float vectors/matrices have half the precision of their "numeric"-type counterparts but are generally faster to operate on numerically, for a performance vs. accuracy trade-off. The internal representation is an S4 class, which allows us to keep the syntax identical to that of base R. Interaction between floats and base types for binary operators is generally possible; in these cases, type promotion always defaults to the higher precision. The package ships with copies of the single precision 'BLAS' and 'LAPACK', which are automatically built in the event they are not available on the system.
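The trade-off can be seen without R. Python has no native single-precision scalar, so the sketch below uses the standard `struct` module to round a double to the nearest IEEE 754 32-bit float and back. Storage drops from 8 bytes to 4. The significand shrinks from 53 bits to 24, which means roughly 7 instead of roughly 16 significant decimal digits. The sketch only illustrates the precision loss and is not the float package's R interface. The helper name is hypothetical.

```python
import struct


def to_float32(x):
    """Round a Python float (an IEEE double) to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", x))[0]


print(struct.calcsize("f"), struct.calcsize("d"))  # 4 8
print(to_float32(0.1))                             # 0.10000000149011612
print(abs(to_float32(0.1) - 0.1))                  # rounding error on the order of 1e-9
```

Values that are exactly representable in binary, such as 0.5, survive the round trip unchanged. Only their neighbours are spaced further apart.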

pbdBASE

CRAN, 75th percentile

An interface to and extensions for the 'PBLAS' and 'ScaLAPACK' numerical libraries. This enables R to utilize distributed linear algebra for codes written in the 'SPMD' fashion. This interface is deliberately low-level and mimics the style of the native libraries it wraps. For a much higher level way of managing distributed matrices, see the 'pbdDMAT' package.

remoter

CRAN, 51st percentile

A set of utilities for client/server computing with R, controlling a remote R session (the server) from a local one (the client). Simply set up a server (see package vignette for more details) and connect to it from your local R session ('RStudio', terminal, etc). The client/server framework is a custom 'REPL' and runs entirely in your R session without the need for installing a custom environment on your system. Network communication is handled by the 'ZeroMQ' library by way of the 'pbdZMQ' package.

JuniperKernel

CRAN, 45th percentile

Provides a full implementation of the 'Jupyter' <http://jupyter.org/> messaging protocol in C++ by leveraging 'Rcpp' and 'Xeus' <https://github.com/QuantStack/xeus>. 'Jupyter' supplies an interactive computing environment and a messaging protocol defined over 'ZeroMQ' for multiple programming languages. This package implements the 'Jupyter' kernel interface so that 'R' is exposed to this interactive computing environment. 'ZeroMQ' functionality is provided by the 'pbdZMQ' package. 'Xeus' is a C++ library that facilitates the implementation of kernels for 'Jupyter'. Additionally, 'Xeus' provides an interface to libraries that exist in the 'Jupyter' ecosystem for building widgets, plotting, and more <https://blog.jupyter.org/interactive-workflows-for-c-with-jupyter-fe9b54227d92>. 'JuniperKernel' uses 'Xeus' as a library for the 'Jupyter' messaging protocol.

pbdNCDF4

CRAN, 37th percentile

This package adds collective parallel read and write capability to the R package ncdf4 version 1.8. Typical use is as a parallel NetCDF4 file reader in SPMD style programming. Each R process reads and writes its own data in a synchronized collective mode, resulting in faster parallel performance. Performance improvement is conditional on a parallel file system.

kazaam

CRAN, 22nd percentile

Many data science problems reduce to operations on very tall, skinny matrices. However, sometimes these matrices can be so tall that they are difficult to work with, or do not even fit into main memory. One strategy to deal with such objects is to distribute their rows across several processors. To this end, we offer an 'S4' class for tall, skinny, distributed matrices, called the 'shaq'. We also provide many useful numerical methods and statistics operations for operating on these distributed objects. The naming is a bit "tongue-in-cheek", with the class a play on the fact that Shaquille O'Neal ('Shaq') is very tall, and he starred in the film 'Kazaam'.
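Row distribution works well for tall, skinny matrices because many key quantities decompose over row blocks. For example, the Gram matrix satisfies X^T X = sum over processes r of X_r^T X_r. Each process computes a small p x p result from its own rows, and a single reduction combines them. In a real SPMD run that reduction would be an MPI allreduce (for instance through pbdMPI). In the Python sketch below, a plain list of row blocks stands in for the processes. The helper names are hypothetical and are not kazaam's API.

```python
def local_crossprod(rows):
    """X_r^T X_r for the block of rows held by one process (assumed non-empty)."""
    p = len(rows[0])
    out = [[0.0] * p for _ in range(p)]
    for row in rows:
        for i in range(p):
            for j in range(p):
                out[i][j] += row[i] * row[j]
    return out


def allreduce_sum(mats):
    """Stand-in for an MPI allreduce: elementwise sum of every process's p x p matrix."""
    p = len(mats[0])
    return [[sum(m[i][j] for m in mats) for j in range(p)] for i in range(p)]


# A tall, skinny 6 x 2 matrix split by rows across 3 hypothetical "processes".
X = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]]
blocks = [X[0:2], X[2:4], X[4:6]]
gram = allreduce_sum([local_crossprod(b) for b in blocks])
print(gram)  # [[286.0, 322.0], [322.0, 364.0]]
```

Only the p x p partial results travel between processes, never the rows themselves. This is what keeps the communication cost small when the number of rows is huge.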

pbdDEMO

CRAN, 21st percentile

A set of demos of 'pbdR' packages, together with a useful, unifying vignette.

rsparse

GitHub, 13th percentile

Implements several algorithms for supervised learning on sparse data and many matrix factorizations of sparse matrices (with a focus on applications for recommender systems). All algorithms work on sparse matrices. They also make extensive use of BLAS and LAPACK and are parallelized with OpenMP. Implementations are reasonably fast and work well with large datasets (millions of rows and millions of columns).

List of algorithms for supervised learning:
1) Elastic net regression via the Follow The Proximally-Regularized Leader (FTRL) algorithm.
2) Second-order Factorization Machines via stochastic gradient descent with adaptive learning rates. Allows learning model parameters out-of-core. Fast: asynchronous, parallel and SIMD-accelerated.

List of algorithms for matrix factorization:
1) Weighted Regularized Matrix Factorization with Alternating Least Squares (ALS) for implicit feedback (including an approximate Conjugate Gradient solver). Optional non-negativity (NNMF, non-negative matrix factorization).
2) Regularized Matrix Factorization with ALS for explicit feedback. Optional non-negativity (NNMF, non-negative matrix factorization).
3) Fast Truncated SVD and Soft-SVD via ALS.
4) Soft-Impute via fast ALS, with the solution in SVD form.
5) LinearFlow method, which learns an item-item similarity matrix from the data.
6) GloVe (Global Vectors) embeddings.

Clustering:
1) kmeans from the Armadillo library, which provides smart (similar to kmeans++) cluster initializations.

Misc utils/methods:
1) multithreaded `%*%` and `tcrossprod()` for `<dgRMatrix, matrix>`
2) multithreaded `%*%` and `crossprod()` for `<matrix, dgCMatrix>`