chickn

The R package {chickn} implements the Chromatogram Hierarchical Compressive K-means with Nystrom approximation clustering approach. It processes large collections of high-resolution mass spectrometry signals (chromatographic profiles) by working on a compressed version of the data (a data sketch). The Filebacked Big Matrix (FBM) class from the bigstatsr package is used to store and manipulate matrices that cannot be loaded into memory.
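To make the FBM idea concrete, here is a minimal sketch of creating and reading a file-backed matrix with bigstatsr. The toy matrix `X` and the name `X_fbm` are made up for illustration and are not part of {chickn}; with no `backingfile` argument, bigstatsr writes the data to a temporary file.

```r
library(bigstatsr)

# Hypothetical toy matrix, used only to illustrate the FBM class.
set.seed(1)
X <- matrix(rnorm(20), nrow = 5, ncol = 4)

# Store the values on disk instead of keeping them in an R matrix.
X_fbm <- FBM(nrow = nrow(X), ncol = ncol(X), init = X)

dim(X_fbm)      # dimensions are known without reading the data
X_fbm[, 1:2]    # only the requested columns are read into memory
```

Subsetting with `[` returns an ordinary R matrix, so only the accessed block needs to fit in memory.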

Pre-requisites

R version >= 3.5, bigstatsr >= 1.2.3

OS tested: Ubuntu

chickn R package dependencies: foreach, doRNG, parallel, doParallel, Rcpp, RcppArmadillo, RcppParallel, nloptr, pracma, stats, MASS, zipfR, mvnfast, rmio, Rdpack
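The version requirements above can be checked from an R session before installing. The following is a small sketch using base R only; the variable names `r_ok` and `bigstatsr_ok` are illustrative.

```r
# Check the running R version against the stated minimum (3.5).
r_ok <- getRversion() >= "3.5.0"

# Check bigstatsr only if it is installed; requireNamespace() returns
# FALSE when the package is missing, so packageVersion() is never
# called on an absent package.
bigstatsr_ok <- requireNamespace("bigstatsr", quietly = TRUE) &&
  utils::packageVersion("bigstatsr") >= "1.2.3"

message("R >= 3.5: ", r_ok)
message("bigstatsr >= 1.2.3: ", bigstatsr_ok)
```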

Installation

You can install {chickn} from GitLab with:

devtools::install_gitlab(repo = "Olga.Permiakova/chickn")

Typical installation error

If the installation of the Rdpack package fails with a non-zero exit status, first install the development version of Rdpack from GitHub using:

devtools::install_github("GeoBosh/Rdpack")

then proceed with the {chickn} package installation.

Example

This is a basic example which demonstrates how the complete {chickn} pipeline can be applied to a small part of the UPS2 data (toy dataset provided in the package):

library(chickn)
data(UPS2)
n <- nrow(UPS2)
N <- ncol(UPS2)
## Convert the data to FBM format; a .bk backing file is created in the working directory
UPS2_FBM <- bigstatsr::FBM(init = UPS2, ncol = N, nrow = n, backingfile = "UPS2_FBM")$save()
## Run the main chickn function
output <- CHICKN_W1(Data = UPS2_FBM, K = 2, k_total = 32,
                    max_neighbors = 10, DIR_output = getwd(), DIR_tmp = tempfile(),
                    ncores = 4, N0 = N, DoPreimage = TRUE)
## NB: The generated output files (NystromKernel, Centroids_out, Cluster_assign_out) are stored in the DIR_output directory.
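The example requests four cores through `ncores`. On machines with fewer cores, one possible approach is to cap the value at what the system reports. This is a sketch, not part of the package; the variable name `available` is illustrative.

```r
# Use at most 4 workers, and never more than the machine reports.
# parallel::detectCores() can return NA on some platforms, so fall back to 1.
available <- parallel::detectCores()
ncores <- if (is.na(available)) 1L else min(4L, available)
ncores
```

The resulting value can then be passed to the call as `ncores = ncores`.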

Citing Our work

The published article of the project will be linked here.

Install

install.packages('chickn')

Monthly Downloads: 47
Version: 1.2.2
License: GPL (>= 2)
Maintainer: Olga Permiakova
Last Published: November 12th, 2020

Functions in chickn (1.2.2)

DBindex_BIG
GenerateFrequencies: Frequency vector construction
CHICKN_W1: Chromatogram Hierarchical Compressive K-means with Nystrom approximation
E_parallel: Euclidean distance
DrawFreq: Draw frequency vectors
EstimSigma: Data variance estimation
COMPR: Compressive Orthogonal Matching Pursuit with Replacement
G_fun_cpp
DB_funR
DBindex
Sketch
hcc_parallel
Nystrom_kernel: Nystrom kernel approximation
gamma_estimation: Kernel parameter estimation
IntraDist: Computation of the Wasserstein distances between cluster centroid and signals in the cluster
InterDist: Computation of the Wasserstein distances between cluster centroids
W1_cl_centr: Parallel computation of the Wasserstein distances between cluster centroid and points in the cluster
ObjFun_OMP_cpp: Objective function
W1_cl_centr_BIG
Gradient_OMP_cpp: Gradient OMP
UPS2: UPS2 dataset
W1_parallel: Wasserstein-1 distance
split_index
chickn: chickn-package
Gradient_cpp: Gradient and objective function
cumsum_Mat
split_foreach
big_parallelize
cumsum_parallel: Cumulative sum computation
cumsum_sug
W1_centr_centr_BIG: Parallel computation for big clusters of the Wasserstein distances between cluster centroids
W1_centr_centr: Parallel computation of the Wasserstein distances between cluster centroids
Preimage
RandIndex
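Several of the functions above compute Wasserstein-1 distances between signals. For readers unfamiliar with the metric, the sketch below shows the textbook one-dimensional case: for two non-negative signals on the same unit-spaced grid, each normalized to sum to one, the distance is the summed absolute difference of their cumulative sums. The function `w1_sketch` is hypothetical and is not the package's implementation; whether {chickn} normalizes signals in this way is an assumption.

```r
# Wasserstein-1 distance between two non-negative signals sampled on the
# same unit-spaced grid (illustrative helper, not part of chickn).
w1_sketch <- function(x, y) {
  stopifnot(length(x) == length(y),
            all(x >= 0), all(y >= 0),
            sum(x) > 0, sum(y) > 0)
  p <- x / sum(x)   # normalize to unit mass
  q <- y / sum(y)
  sum(abs(cumsum(p) - cumsum(q)))
}

a <- c(1, 0, 0)
b <- c(0, 0, 1)
w1_sketch(a, b)  # all mass moves two grid steps, so the result is 2
w1_sketch(a, a)  # identical signals give 0
```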