#### 42 packages on CRAN

The main function biclust() provides several algorithms to find biclusters in two-dimensional data: Cheng and Church (2000, ISBN:1-57735-115-0), spectral (2003) <doi:10.1101/gr.648603>, plaid model (2005) <doi:10.1016/j.csda.2004.02.003>, xmotifs (2003) <doi:10.1142/9789812776303_0008> and bimax (2006) <doi:10.1093/bioinformatics/btl060>. In addition, the package provides methods for data preprocessing (normalization and discretisation), visualisation, and validation of bicluster solutions.

cghseg is an R package dedicated to the analysis of CGH profiles using segmentation models.

PCIC's implementation of Climdex routines for computation of extreme climate indices.

Various data sets used in examples and exercises in the book Maindonald, J.H. and Braun, W.J. (2003, 2007, 2010) "Data Analysis and Graphics Using R".

Useful when reading the above-mentioned book, which the documentation refers to as 'the book'.

Provides an implementation of a family of Lasso variants, including Dantzig Selector, LAD Lasso, SQRT Lasso and Lq Lasso, for estimating high dimensional sparse linear models. We adopt the alternating direction method of multipliers and convert the original optimization problem into a sequential L1 penalized least squares minimization problem, which can be efficiently solved by a linearization algorithm. A multi-stage screening approach is adopted for further acceleration. Besides sparse linear model estimation, we also provide extensions of these Lasso variants to sparse Gaussian graphical model estimation, including TIGER and CLIME, using either an L1 or an adaptive penalty. Missing values can be tolerated for Dantzig selector and CLIME. The computation is memory-optimized using sparse matrix output.

This package proposes a model-based clustering algorithm for multivariate functional data. The parametric mixture model, based on the assumption of normality of the principal components resulting from a multivariate functional PCA, is estimated by an EM-like algorithm. The main advantage of the proposed algorithm is its ability to take into account the dependence among curves.

Quantification of the effect of geographic versus environmental isolation on genetic differentiation.

Graph estimation in Gaussian Graphical Models. The main functions return the adjacency matrix of an undirected graph estimated from a data matrix.

This package provides the minimal functionality necessary to apply Gaussian processes in R. It provides a selection of the functionality of the GPML Matlab library.

The package implements efficient ways to evaluate and solve equations of the form Ax=b, where A is a Kronecker product of matrices. Functions to solve least squares problems of this type are also included.
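Why a Kronecker structure helps can be sketched in a few lines. The example below is not taken from the package; it is a minimal pure-Python illustration, and all function names in it are hypothetical. It assumes A and B are square and invertible, and relies on the standard identity kron(A, B) vec(X) = vec(B X A^T), where vec stacks columns, so neither the product nor the solve ever forms the large matrix (the explicit `kron` helper exists only for checking).

```python
from fractions import Fraction


def matmul(X, Y):
    """Plain matrix product of two lists-of-rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def kron(A, B):
    """Explicit Kronecker product; used here only to check the fast routines."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def vec(X):
    """Stack the columns of X into one flat list (column-major order)."""
    return [X[i][j] for j in range(len(X[0])) for i in range(len(X))]


def unvec(v, nrow, ncol):
    """Inverse of vec: rebuild an nrow x ncol matrix from a column-major list."""
    return [[v[j * nrow + i] for j in range(ncol)] for i in range(nrow)]


def kron_matvec(A, B, x):
    """Evaluate kron(A, B) x as vec(B X A^T) without forming kron(A, B)."""
    X = unvec(x, len(B[0]), len(A[0]))
    return vec(matmul(matmul(B, X), transpose(A)))


def solve(A, R):
    """Solve A Z = R for square A by Gauss-Jordan elimination (exact fractions)."""
    n = len(A)
    aug = [[Fraction(v) for v in A[i]] + [Fraction(v) for v in R[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if aug[p][c] == 0:
            raise ValueError("matrix is singular")
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def kron_solve(A, B, b):
    """Solve kron(A, B) x = b using two small solves instead of one large one."""
    M = unvec(b, len(B), len(A))
    Y = solve(B, M)             # Y = X A^T
    Z = solve(A, transpose(Y))  # Z = X^T
    return vec(transpose(Z))


A = [[2, 1], [1, 3]]
B = [[1, 2], [3, 4]]
x = [1, 2, 3, 4]
b = kron_matvec(A, B, x)
x_recovered = kron_solve(A, B, b)
```

For an n x n matrix A and a p x p matrix B, the two small solves replace a single solve of size np x np, which is where the savings come from.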

Implements the largeVis algorithm (see Tang, et al. (2016) <DOI:10.1145/2872427.2883041>) for visualizing very large high-dimensional datasets. Also provides very fast search for approximate nearest neighbors, outlier detection, optimized implementations of the HDBSCAN*, DBSCAN and OPTICS clustering algorithms, and plotting functions for visualizing the above.

Estimates the mean of a Gaussian vector by choosing among a large collection of estimators. In particular, it solves the problem of variable selection by choosing the best predictor among predictors produced by different methods such as lasso, elastic-net, adaptive lasso, pls and randomForest. Moreover, it can be applied for choosing the tuning parameter in a Gauss-lasso procedure.

Lp_solve is freely available (under LGPL 2) software for solving linear, integer and mixed integer programs. In this implementation we supply a "wrapper" function in C and some R functions that solve general linear/integer problems, assignment problems, and transportation problems. This version calls lp_solve version 5.5.

The lpSolveAPI package provides an R interface to 'lp_solve', a Mixed Integer Linear Programming (MILP) solver with support for pure linear, (mixed) integer/binary, semi-continuous and special ordered sets (SOS) models.

Finds the maximum likelihood estimate of the mean vector and variance-covariance matrix for multivariate normal data with missing values.

Fitting possibly high dimensional penalized regression models. The penalty structure can be any combination of an L1 penalty (lasso and fused lasso), an L2 penalty (ridge) and a positivity constraint on the regression coefficients. The supported regression models are linear, logistic and Poisson regression and the Cox Proportional Hazards model. Cross-validation routines allow optimization of the tuning parameters.
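To make the penalty structure concrete, here is a minimal pure-Python sketch for the linear model only. It is not the package's algorithm or API: the function names, the coordinate descent method and the scaling of the objective, 0.5 * RSS + lambda1 * sum(|beta|) + 0.5 * lambda2 * sum(beta^2), are all assumptions made for illustration, and the fused lasso, logistic, Poisson and Cox cases are not covered.

```python
def soft_threshold(z, t):
    """Shrink z towards zero by t; this is the effect of the L1 penalty."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def penalized_ls(X, y, lambda1=0.0, lambda2=0.0, positive=False, sweeps=200):
    """Coordinate descent for least squares with L1, L2 and positivity options.

    Minimizes 0.5 * RSS + lambda1 * sum(|beta|) + 0.5 * lambda2 * sum(beta^2),
    optionally subject to beta >= 0. The scaling is an assumption of this sketch.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # equals y - X beta, since beta starts at zero
    for _ in range(sweeps):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            norm2 = sum(c * c for c in col)
            denom = norm2 + lambda2
            if denom == 0:
                continue  # all-zero column and no ridge term: leave beta[j] at 0
            # Inner product of column j with the residual that ignores beta[j]
            rho = sum(c * r for c, r in zip(col, resid)) + norm2 * beta[j]
            new = soft_threshold(rho, lambda1)
            if positive:
                new = max(new, 0.0)
            new /= denom
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for r, c in zip(resid, col)]
                beta[j] = new
    return beta


X = [[1.0, 0.0], [0.0, 1.0]]
y = [3.0, -1.0]
lasso_fit = penalized_ls(X, y, lambda1=1.0)
ridge_lasso_fit = penalized_ls(X, y, lambda1=1.0, lambda2=1.0)
positive_fit = penalized_ls(X, y, lambda1=0.5, positive=True)
```

In this sketch the L1 penalty sets small coefficients exactly to zero, the L2 penalty shrinks the remaining ones through the denominator, and the positivity constraint clips negative updates at zero.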

A collection of tools to explore the phylogenetic signal in univariate and multivariate data. The package provides functions to plot traits data against a phylogenetic tree, different measures and tests for the phylogenetic signal, methods to describe where the signal is located and a phylogenetic clustering method.

Performs Penalized Multivariate Analysis: a penalized matrix decomposition, sparse principal components analysis, and sparse canonical correlation analysis, described in the following papers: (1) Witten, Tibshirani and Hastie (2009) A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10(3):515-534. (2) Witten and Tibshirani (2009) Extensions of sparse canonical correlation analysis, with applications to genomic data. Statistical Applications in Genetics and Molecular Biology 8(1): Article 28.

Contains linear and nonlinear regression methods based on Partial Least Squares and Penalization Techniques. Model parameters are selected via cross-validation, and confidence intervals and tests for the regression coefficients can be conducted via jackknifing.

Method for protein quantification based on identified and quantified peptides. protiq can be used for absolute and relative protein quantification. Input peptide abundance scores can come from various sources, including SRM transition areas and intensities or spectral counts derived from shotgun experiments. The package is still being extended to also include the model for protein identification, MIPGEM, presented in Gerster, S., Qeli, E., Ahrens, C.H. and Buehlmann, P. (2010). Protein and gene model inference based on statistical modeling in k-partite graphs. Proceedings of the National Academy of Sciences 107(27):12101-12106.

Finds the k nearest neighbours for every point in a given dataset in O(N log N) time using Arya and Mount's ANN library (v1.1.3). There is support for approximate as well as exact searches, fixed radius searches and 'bd' as well as 'kd' trees. The distance is computed using the L1 (Manhattan, taxicab) metric. Please see package 'RANN' for the same functionality using the L2 (Euclidean) metric.
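As a point of reference for what such a search returns, the following is a naive brute-force sketch in pure Python. It does not use the ANN library or any tree structure, so it runs in O(N^2) rather than O(N log N) time. The function names are hypothetical, and the choice to exclude each point from its own neighbour list is an assumption of this sketch, not a statement about the package's behaviour.

```python
def manhattan(p, q):
    """L1 (Manhattan, taxicab) distance between two points of equal dimension."""
    return sum(abs(a - b) for a, b in zip(p, q))


def knn_l1(points, k):
    """For every point, return the indices of its k nearest other points.

    Exact brute-force search under the L1 metric; ties are broken by index.
    """
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted(
            (manhattan(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


points = [(0, 0), (1, 0), (0, 3), (5, 5)]
nearest_two = knn_l1(points, 2)
```

A kd-tree or bd-tree replaces the inner scan over all points with a search that visits only a small part of the data, which is what makes the per-query cost logarithmic on typical inputs.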

Functions to prepare files needed for running BUGS in batch-mode, and running BUGS from R. Support for Linux and Windows systems with OpenBugs is emphasized.

Shrunken Centroids Regularized Discriminant Analysis for classification of high dimensional data.

Provides functions for linking and de-duplicating data sets. Methods based on a stochastic approach are implemented as well as classification algorithms from the machine learning domain.

Functions to fit Gaussian linear models by maximising the residual log likelihood where the covariance structure can be written as a linear combination of known matrices. Can be used for multivariate models and random effects models. Offers a straightforward way to specify random effects models, including random interactions. The code is now optimised to use Sherman Morrison Woodbury identities for matrix inversion in random effects models. We've added the ability to fit models using any kernel, as well as a function to return the mean and covariance of random effects conditional on the data (BLUPs).

Constrained clustering, transfer functions, and other methods for analysing Quaternary science data.

An EM algorithm to fit Mallows' Models to full or partial rankings, with or without ties.

Sensitivity indices with dependent correlated inputs, using a method based on PLS regression.

Contains, as a main contribution, a function to fit a regression model with possibly right, left or interval censored observations and with the error distribution expressed as a mixture of G-splines. Core part of the computation is done in compiled C++ written using the Scythe Statistical Library Version 0.3.

The package implements the model-based kernel machine method for detecting gene-centric gene-gene interactions of Li and Cui (2012).

Ten distributions supplementing those built into R. Inverse Gauss, Kruskal-Wallis, Kendall's Tau, Friedman's chi squared, Spearman's rho, maximum F ratio, the Pearson product moment correlation coefficient, Johnson distributions, normal scores and generalized hypergeometric distributions. In addition two random number generators of George Marsaglia are included.

Many approaches for both reading and creating XML (and HTML) documents (including DTDs), both local and accessible via HTTP or FTP. Also offers access to an 'XPath' "interpreter".

This package contains a database of city, state, latitude, and longitude information for U.S. ZIP codes from the CivicSpace Database (August 2004) augmented by Daniel Coven's federalgovernmentzipcodes.us web site (updated January 22, 2012). Previous versions of this package (before 1.0) were based solely on the CivicSpace data, so an original version of the CivicSpace database is also included.