# Martin Maechler

#### 59 packages on CRAN

#### 1 package on Bioconductor

Modelling with sparse and dense 'Matrix' matrices, using modular prediction and response module classes.

Compute Hartigan's dip test statistic for unimodality / multimodality and provide a test with simulation-based p-values, where the original public code has been corrected.

Methods for cluster analysis. Much extended from the original by Peter Rousseeuw, Anja Struyf and Mia Hubert, based on Kaufman and Rousseeuw (1990) "Finding Groups in Data".

Maximum likelihood estimation of the parameters of a fractionally differenced ARIMA(p,d,q) model (Haslett and Raftery, Applied Statistics, 1989).
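The "fractionally differenced" part of such a model is the operator (1 - B)^d with non-integer d, where B is the backshift operator. The following is a minimal Python sketch of that operator only, not of the package's maximum likelihood estimation; the function names `frac_diff_weights` and `frac_diff` are hypothetical, and truncating the filter at the start of the series is a simplifying assumption.

```python
def frac_diff_weights(d, n):
    """Return the first n coefficients of the binomial expansion of (1 - B)^d."""
    w = [1.0]
    for k in range(1, n):
        # Ratio of successive coefficients of (-1)^k * binom(d, k): (k - 1 - d) / k
        w.append(w[-1] * (k - 1 - d) / k)
    return w


def frac_diff(x, d):
    """Apply (1 - B)^d to the series x, truncating the filter at the series start."""
    w = frac_diff_weights(d, len(x))
    return [sum(w[k] * x[t - k] for k in range(t + 1)) for t in range(len(x))]
```

With d = 1 the weights reduce to ordinary first differencing (1, -1, 0, 0, ...); for 0 < d < 0.5 they decay slowly, which is what produces long memory.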

A rich hierarchy of matrix classes, including triangular, symmetric, and diagonal matrices, both dense and sparse and with pattern, logical and numeric entries. Numerous methods for and operations on these matrices, using 'LAPACK' and 'SuiteSparse' libraries.

"Essential" Robust Statistics. Tools for analyzing data with robust methods. This covers regression methodology, including model selection, and multivariate statistics, where we strive to cover the book "Robust Statistics, Theory and Methods" by 'Maronna, Martin and Yohai'; Wiley 2006.

Density, Probability and Quantile functions, and random number generation for (skew) stable distributions, using the parametrizations of Nolan.

Useful utilities ['goodies'] from Seminar fuer Statistik ETH Zurich, some of which were ported from S-plus in the 1990s. For graphics, it provides pretty (log-scale) axes, an enhanced Tukey-Anscombe plot, a combined histogram and boxplot, 2d-residual plots, a 'tachoPlot()', pretty arrows, etc. For robustness, it provides a robust F test and a robust range(). For system support, notably on Linux, it provides 'Sys.*()' functions with more access to system and CPU information. Finally, it includes miscellaneous utilities such as simple, efficient prime number computation, integer codes, Duplicated(), toLatex.numeric() and is.whole().
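A classic way to compute all primes up to n simply and efficiently is the sieve of Eratosthenes. The Python sketch below illustrates that technique; it is not a claim about how the package's own prime function is implemented, and the name `primes` is only borrowed for illustration.

```python
def primes(n):
    """Return all primes <= n using the sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    p = 2
    while p * p <= n:
        if is_prime[p]:
            # Cross out the multiples of p, starting at p * p
            is_prime[p * p : n + 1 : p] = bytes(len(range(p * p, n + 1, p)))
        p += 1
    return [i for i, flag in enumerate(is_prime) if flag]
```

Starting the crossing-out at p * p is safe because every smaller multiple of p has a smaller prime factor and was already removed.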

Arithmetic (via S4 classes and methods) for arbitrary precision floating point numbers, including transcendental ("special") functions. To this end, the package interfaces to the 'LGPL' licensed 'MPFR' (Multiple Precision Floating-Point Reliable) Library which itself is based on the 'GMP' (GNU Multiple Precision) Library.

Computations for Bessel functions for complex, real and partly 'mpfr' (arbitrary precision) numbers; notably interfacing TOMS 644; approximations for large arguments, experiments, etc.

One-dimensional Normal (i.e., Gaussian) Mixture Model Classes, e.g., for density estimation or clustering algorithm research and teaching; provides the widely used Marron-Wand densities. Efficient random number generation and graphics. Fitting to data by efficient ML (Maximum Likelihood) or traditional EM estimation.
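A one-dimensional normal mixture has density f(x) = sum_j w_j phi((x - mu_j) / sigma_j) / sigma_j, and random numbers are drawn by first picking a component with probability w_j. The Python sketch below shows just these two operations (fitting by ML or EM is not shown); the function names are hypothetical. The example parameters are the Marron-Wand "claw" density as we understand its definition, 0.5 N(0, 1) + sum over l = 0..4 of 0.1 N(l/2 - 1, 0.1^2).

```python
import math
import random


def dnorm(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def dnormmix(x, w, mu, sigma):
    """Density of the mixture sum_j w[j] * N(mu[j], sigma[j]^2) at x."""
    if abs(sum(w) - 1.0) > 1e-12:
        raise ValueError("mixture weights must sum to 1")
    return sum(wj * dnorm(x, m, s) for wj, m, s in zip(w, mu, sigma))


def rnormmix(n, w, mu, sigma, rng):
    """Draw n values: pick a component by its weight, then sample from it."""
    out = []
    for _ in range(n):
        j = rng.choices(range(len(w)), weights=w)[0]
        out.append(rng.gauss(mu[j], sigma[j]))
    return out


# Example parameters: the "claw" density described in the lead-in
claw_w = [0.5] + [0.1] * 5
claw_mu = [0.0] + [l / 2 - 1 for l in range(5)]
claw_sigma = [1.0] + [0.1] * 5
print(dnormmix(0.0, claw_w, claw_mu, claw_sigma))
print(rnormmix(5, claw_w, claw_mu, claw_sigma, random.Random(42)))
```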

Datasets and Functionality from 'Jan Beran' (1994). Statistics for Long-Memory Processes; Chapman & Hall. Estimation of Hurst (and more) parameters for fractional Gaussian noise, 'fARIMA' and 'FEXP' models.

Implements the 'Markowitz' Critical Line Algorithm ('CLA') for classical mean-variance portfolio optimization; see Markowitz (1952) <doi:10.2307/2975974>. Care has been taken to ensure correctness in light of previous buggy implementations.

Construct directed graphs of S4 class hierarchies and visualize them. These graphs are typically DAGs (directed acyclic graphs), and in practice often simple trees.

Robustness -- 'eXperimental', 'eXtraneous', or 'eXtraordinary' Functionality for Robust Statistics. In other words, methods which are not yet well established, often related to methods in package 'robustbase'.

Functions, Classes & Methods for estimation, prediction, and simulation (bootstrap) of Variable Length Markov Chain ('VLMC') Models.

Methodology for supervised grouping aka "clustering" of potentially many predictor variables, such as genes etc.

Simple Component Analysis (SCA) often provides much more interpretable components than Principal Component Analysis (PCA) while still representing much of the variability in the data.

Qualitatively Constrained (Regression) Smoothing Splines via Linear Programming and Sparse Matrices.

Functions to Accompany J. Fox and S. Weisberg, An R Companion to Applied Regression, Third Edition, Sage, in press.

Fit linear and generalized linear mixed-effects models. The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the 'Eigen' C++ library for numerical linear algebra and 'RcppEigen' "glue".

Binning and plotting functions for hexagonal bins. Now uses and relies on grid graphics and formal (S4) classes and methods.

Computes multivariate normal and t probabilities, quantiles, random deviates and densities.

The 'timeDate' class fulfils the conventions of the ISO 8601 standard as well as of the ANSI C and POSIX standards. Beyond these standards, it provides the "Financial Center" concept, which allows one to handle data records collected in different time zones and combine them so that they always carry the proper time stamps with respect to your personal financial center, or alternatively to the GMT reference time. It can thus also handle time stamps from historical data records from the same time zone, even if the financial centers changed daylight saving times at different calendar dates.

A collection of functions to implement a class for univariate polynomial manipulations.
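To illustrate what such a class involves, here is a minimal Python sketch of a univariate polynomial type with addition, multiplication, Horner evaluation and differentiation. Storing coefficients in increasing order of degree is our assumption for this sketch, and the class name `Polynomial` is hypothetical; this is not the package's R interface.

```python
class Polynomial:
    """Univariate polynomial with coefficients in increasing order of degree."""

    def __init__(self, coef):
        coef = list(coef)
        # Drop trailing zero coefficients so the degree is well defined
        while len(coef) > 1 and coef[-1] == 0:
            coef.pop()
        self.coef = coef or [0]

    def __add__(self, other):
        n = max(len(self.coef), len(other.coef))
        a = self.coef + [0] * (n - len(self.coef))
        b = other.coef + [0] * (n - len(other.coef))
        return Polynomial([x + y for x, y in zip(a, b)])

    def __mul__(self, other):
        # Coefficients of the product are the convolution of the inputs
        out = [0] * (len(self.coef) + len(other.coef) - 1)
        for i, a in enumerate(self.coef):
            for j, b in enumerate(other.coef):
                out[i + j] += a * b
        return Polynomial(out)

    def __call__(self, x):
        # Horner's scheme: evaluate from the highest coefficient down
        result = 0
        for c in reversed(self.coef):
            result = result * x + c
        return result

    def deriv(self):
        """Return the derivative: d/dx sum c_k x^k = sum k c_k x^(k-1)."""
        return Polynomial([k * c for k, c in enumerate(self.coef)][1:] or [0])


p = Polynomial([1, 1])  # 1 + x
print((p * p).coef)     # coefficients of 1 + 2x + x^2
```

Normalizing away trailing zeros in the constructor keeps results such as (1 + 2x) + (-2x) at degree 0 rather than carrying a spurious zero leading term.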

Various R programming tools for plotting data, including:

- calculating and plotting locally smoothed summary functions ('bandplot', 'wapply'),
- enhanced versions of standard plots ('barplot2', 'boxplot2', 'heatmap.2', 'smartlegend'),
- manipulating colors ('col2hex', 'colorpanel', 'redgreen', 'greenred', 'bluered', 'redblue', 'rich.colors'),
- calculating and plotting two-dimensional data summaries ('ci2d', 'hist2d'),
- enhanced regression diagnostic plots ('lmplot2', 'residplot'),
- a formula-enabled interface to the 'stats::lowess' function ('lowess'),
- displaying textual data in plots ('textplot', 'sinkplot'),
- plotting a matrix where each cell contains a dot whose size reflects the relative magnitude of the elements ('balloonplot'),
- plotting "Venn" diagrams ('venn'),
- displaying Open-Office style plots ('ooplot'),
- plotting multiple data on the same region, with separate axes ('overplot'),
- plotting means and confidence intervals ('plotCI', 'plotmeans'),
- spacing points in an x-y plot so they don't overlap ('space').

Differential Evolution (DE) stochastic algorithms for global optimization of problems with and without constraints. The aim is to curate a collection of its state-of-the-art variants that (1) do not sacrifice simplicity of design, (2) are essentially tuning-free, and (3) can be efficiently implemented directly in the R language. Currently, it only provides an implementation of the 'jDE' algorithm by Brest et al. (2006) <doi:10.1109/TEVC.2006.872133>.
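The distinguishing idea of jDE is self-adaptation: every population member carries its own scale factor F and crossover rate CR, which are occasionally resampled (with probabilities tau1 and tau2) and survive only when they produce a successful trial vector. The Python sketch below applies this to DE/rand/1/bin on a box-constrained problem. It is an illustration, not the package's R code; the function name is hypothetical, the settings tau1 = tau2 = 0.1, F in [0.1, 1.0] and initial F = 0.5, CR = 0.9 follow our reading of Brest et al. (2006), and the in-place population update and clipping to the bounds are our own simplifications.

```python
import random


def jde_minimize(f, bounds, pop_size=20, generations=300, seed=0):
    """Minimize f over a box with a jDE-style self-adaptive DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    tau1, tau2, f_lower, f_upper = 0.1, 0.1, 0.1, 0.9
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    F = [0.5] * pop_size   # per-individual scale factors
    CR = [0.9] * pop_size  # per-individual crossover rates
    for _ in range(generations):
        for i in range(pop_size):
            # Self-adaptation: occasionally resample this individual's F and CR
            Fi = f_lower + rng.random() * f_upper if rng.random() < tau1 else F[i]
            CRi = rng.random() if rng.random() < tau2 else CR[i]
            # rand/1 mutation uses three distinct members other than i
            r1, r2, r3 = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = []
            for j in range(dim):
                if rng.random() < CRi or j == j_rand:
                    v = pop[r1][j] + Fi * (pop[r2][j] - pop[r3][j])
                else:
                    v = pop[i][j]
                lo, hi = bounds[j]
                trial.append(min(max(v, lo), hi))  # simplification: clip to the box
            ft = f(trial)
            if ft <= fit[i]:
                # Greedy selection; the successful F and CR are kept (in-place update)
                pop[i], fit[i], F[i], CR[i] = trial, ft, Fi, CRi
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


x_best, f_best = jde_minimize(lambda v: sum(t * t for t in v), [(-5.0, 5.0)] * 3)
print(x_best, f_best)
```

Because F and CR are inherited only through successful trials, the population drifts toward parameter values that work for the problem at hand, which is what makes the method essentially tuning-free.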

A collection of miscellaneous basic statistic functions and convenience wrappers for efficiently describing data. The author's intention was to create a toolbox that facilitates the (notoriously time-consuming) first descriptive tasks in data analysis, consisting of calculating descriptive statistics, drawing graphical summaries and reporting the results. The package furthermore contains functions to produce documents using MS Word (or PowerPoint) and functions to import data from Excel. Many of the included functions can be found scattered in other packages and other sources written partly by Titans of R. The reason for collecting them here was primarily to have them consolidated in ONE instead of dozens of packages (which themselves might depend on other packages which are not needed at all), and to provide a common and consistent interface as far as function and argument naming, NA handling, recycling rules etc. are concerned. Google style guides were used as naming rules (in the absence of convincing alternatives). The 'camel style' was consistently applied to functions borrowed from contributed R packages as well.

A platform-independent basic-statistics GUI (graphical user interface) for R, based on the tcltk package.

Multiple Precision Arithmetic (big integers and rationals, prime number tests, matrix computation), "arithmetic without limitations" using the C library GMP (GNU Multiple Precision Arithmetic).

With this tool, a user should be able to quickly implement complex random effect models through simple C++ templates. The package combines 'CppAD' (C++ automatic differentiation), 'Eigen' (templated matrix-vector library) and 'CHOLMOD' (sparse matrix routines available from R) to obtain an efficient implementation of the applied Laplace approximation with exact derivatives. Key features are automatic sparseness detection, parallelism through 'BLAS', and parallel user templates.
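The Laplace approximation replaces an integral over a random effect u by a Gaussian integral around the mode: the integral of exp(g(u)) du is approximately exp(g(u_hat)) * sqrt(2 pi / -g''(u_hat)), where u_hat maximizes g. The one-dimensional Python sketch below shows that formula only. It uses Newton's method with central finite differences, whereas the package obtains exact derivatives by automatic differentiation and works in many dimensions with sparse Hessians; the function name and default step sizes are hypothetical.

```python
import math


def laplace_log_integral(g, u0=0.0, h=1e-4, tol=1e-10, max_iter=100):
    """Approximate the log of the integral of exp(g(u)) du over the real line.

    The mode u_hat is found by Newton's method; derivatives use central
    finite differences (a stand-in for automatic differentiation).
    """
    u = u0
    for _ in range(max_iter):
        d1 = (g(u + h) - g(u - h)) / (2 * h)
        d2 = (g(u + h) - 2 * g(u) + g(u - h)) / (h * h)
        step = d1 / d2
        u -= step
        if abs(step) < tol:
            break
    d2 = (g(u + h) - 2 * g(u) + g(u - h)) / (h * h)
    if d2 >= 0:
        raise ValueError("g must have a strict maximum at the mode")
    # log[ exp(g(u_hat)) * sqrt(2 pi / -g''(u_hat)) ]
    return g(u) + 0.5 * math.log(2 * math.pi / -d2)


# For a Gaussian integrand the approximation is exact: log(sqrt(2 pi))
print(laplace_log_integral(lambda u: -(u - 1) ** 2 / 2))
```

Working on the log scale avoids overflow when g(u_hat) is large, which is the usual situation when g is a log-likelihood.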

Fit linear and generalized linear mixed models with various extensions, including zero-inflation. The models are fitted using maximum likelihood estimation via 'TMB' (Template Model Builder). Random effects are assumed to be Gaussian on the scale of the linear predictor and are integrated out using the Laplace approximation. Gradients are calculated using automatic differentiation.

Estimates previously compiled regression models using the 'rstan' package, which provides the R interface to the Stan C++ library for Bayesian estimation. Users specify models via the customary R syntax with a formula and data.frame plus some additional arguments for priors.

Methods for robust statistics, representing the state of the art in the early 2000s, notably for robust regression and robust multivariate analysis.

Several cubic spline interpolation methods of H. Akima for irregular and regular gridded data are available through this package, both for the bivariate case (irregular data: ACM 761, regular data: ACM 760) and the univariate case (ACM 433 and ACM 697). Linear interpolation of irregular gridded data is also covered by reusing D. J. Renka's triangulation code, which is part of Akima's Fortran code. A bilinear interpolator for regular grids was also added for comparison with the bicubic interpolator on regular grids.
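Bilinear interpolation, the baseline mentioned last, takes the grid cell containing the query point and forms a weighted average of its four corner values, linear in each coordinate separately. The Python sketch below illustrates the method in general and is not the package's Fortran or R code; the function name `bilinear` and the convention z[i][j] = f(xs[i], ys[j]) are our assumptions.

```python
import bisect


def bilinear(xs, ys, z, x, y):
    """Bilinearly interpolate z[i][j] = f(xs[i], ys[j]) at the point (x, y).

    xs and ys must be strictly increasing, and (x, y) must lie inside the grid.
    """
    if not (xs[0] <= x <= xs[-1] and ys[0] <= y <= ys[-1]):
        raise ValueError("point outside the grid")
    # Locate the cell [xs[i], xs[i+1]] x [ys[j], ys[j+1]]; clamp at the upper edge
    i = min(bisect.bisect_right(xs, x) - 1, len(xs) - 2)
    j = min(bisect.bisect_right(ys, y) - 1, len(ys) - 2)
    tx = (x - xs[i]) / (xs[i + 1] - xs[i])
    ty = (y - ys[j]) / (ys[j + 1] - ys[j])
    # Weighted average of the four corner values
    return ((1 - tx) * (1 - ty) * z[i][j] + tx * (1 - ty) * z[i + 1][j]
            + (1 - tx) * ty * z[i][j + 1] + tx * ty * z[i + 1][j + 1])


xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 2.0, 3.0]
z = [[xv + 2 * yv for yv in ys] for xv in xs]
print(bilinear(xs, ys, z, 0.5, 1.25))
```

Bilinear interpolation reproduces any function of the form a + b x + c y + d x y exactly, but its gradient jumps across cell boundaries, which is why smooth bicubic methods are preferred when derivatives matter.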

Functions to specify and fit generalized nonlinear models, including models with multiplicative interaction terms such as the UNIDIFF model from sociology and the AMMI model from crop science, and many others. Over-parameterized representations of models are used throughout; functions are provided for inference on estimable parameter combinations, as well as standard methods for diagnostics etc.

Functions for causal structure learning and causal inference using graphical models. The main algorithms for causal structure learning are PC (for observational data without hidden variables), FCI and RFCI (for observational data with hidden variables), and GIES (for a mix of data from observational studies (i.e. observational data) and data from experiments involving interventions (i.e. interventional data) without hidden variables). For causal inference the IDA algorithm, the Generalized Backdoor Criterion (GBC), the Generalized Adjustment Criterion (GAC) and some related functions are implemented. Functions for incorporating background knowledge are provided.

Big data statistical analysis for high-dimensional models is made possible by modifying lasso.proj() in the 'hdi' package, replacing its nodewise regression with sparse precision matrix computation using 'BigQUIC'.

Functions to generate plots and tables for comparing independently sampled populations. Companion package to "A Primer on Visualizations for Comparing Populations, Including the Issue of Overlapping Confidence Intervals" by Wright, Klein, and Wieczorek (2017, in press).

Implements semiparametric transformation model two-phase estimation using calibration weights. The method is described in Fong and Gilbert (2015), "Calibration weighted estimation of semiparametric transformation models for two-phase sampling", Statistics in Medicine <DOI:10.1002/sim.6439>.

Functions for the implementation of Independent Multiple-sample Greedy Equivalence Search (IMaGES), a causal inference algorithm for creating aggregate graphs and structural equation modeling data for one or more datasets. This package is useful for time series data with specific regions of interest. This implementation is inspired by the paper "Six problems for causal inference from fMRI" by Ramsey, Hanson, Hanson, Halchenko, Poldrack, and Glymour (2010) <DOI:10.1016/j.neuroimage.2009.08.065>. The IMaGES algorithm uses a modified BIC score to compute goodness of fit of edge additions, subtractions, and turns across all datasets and returns a representative graph, along with structural equation modeling data for the global graph and individual datasets, means, and standard errors. Functions for plotting the resulting graph(s) are provided. This package is built upon the 'pcalg' package.

Companion package for the book: "Robust Statistics: Theory and Methods, second edition", <http://www.wiley.com/go/maronna/robust>. This package contains code that implements the robust estimators discussed in the recent second edition of the book above, as well as the scripts reproducing all the examples in the book.

Tools for setting up ("design"), conducting, and evaluating large-scale simulation studies with graphics and tables, including parallel computations.

Data sets used for copula modeling in addition to those in the package 'copula'. These include a random subsample from the US National Education Longitudinal Study (NELS) of 1988 and nursing home data from Wisconsin.

Routines and documentation for solving regression problems while imposing an L1 constraint on the estimates, based on the algorithm of Osborne et al. (1998).

Data sets and sample analyses from Pinheiro and Bates, "Mixed-effects Models in S and S-PLUS" (Springer, 2000).

Data and examples from a multilevel modelling software review as well as other well-known data sets from the multilevel modelling literature.

Data sets and sample lmer analyses corresponding to the examples in Littell, Milliken, Stroup and Wolfinger (1996), "SAS System for Mixed Models", SAS Institute.