# Matthias Kohl

#### 27 packages on CRAN

#### 3 packages on Bioconductor

Performance measures and scores for statistical classification such as accuracy, sensitivity, specificity, recall, similarity coefficients, AUC, GINI index, Brier score and many more. Calculation of optimal cut-offs and decision stumps (Iba and Langley (1991), <doi:10.1016/B978-1-55860-247-2.50035-8>) for all implemented performance measures. Hosmer-Lemeshow goodness of fit tests (Lemeshow and Hosmer (1982), <doi:10.1093/oxfordjournals.aje.a113284>; Hosmer et al (1997), <doi:10.1002/(SICI)1097-0258(19970515)16:9%3C965::AID-SIM509%3E3.0.CO;2-O>). Statistical and epidemiological risk measures such as relative risk, odds ratio, number needed to treat (Porta (2014), <doi:10.1093%2Facref%2F9780199976720.001.0001>).

Computation of standardized interquartile range (IQR), Huber-type skipped mean (Hampel (1985), <doi:10.2307/1268758>), robust coefficient of variation (CV) (Arachchige et al. (2019), <arXiv:1907.01110>), robust signal to noise ratio (SNR), z-score, standardized mean difference (SMD), as well as functions that support graphical visualization such as boxplots based on quartiles (not hinges), negative logarithms and generalized logarithms for 'ggplot2' (Wickham (2016), ISBN:978-3-319-24277-4).

Computation of various confidence intervals (Altman et al. (2000), ISBN:978-0-727-91375-3; Hedderich and Sachs (2018), ISBN:978-3-662-56657-2) including bootstrapped versions (Davison and Hinkley (1997), ISBN:978-0-511-80284-3) as well as Hsu (Hedderich and Sachs (2018), ISBN:978-3-662-56657-2), permutation (Janssen (1997), <doi:10.1016/S0167-7152(97)00043-6>), bootstrap (Davison and Hinkley (1997), ISBN:978-0-511-80284-3) and multiple imputation (Barnard and Rubin (1999), <doi:10.1093/biomet/86.4.948>) t-test. Graphical visualization by volcano plots.

Contains several functions for statistical data analysis; e.g. for sample size and power calculations, computation of confidence intervals and tests, and generation of similarity matrices.

Similarity plots based on correlation and median absolute deviation (MAD); adjusting colors for heatmaps; aggregate technical replicates; calculate pairwise fold-changes and log fold-changes; compute one- and two-way ANOVA; simplified interface to package 'limma' (Ritchie et al. (2015), <doi:10.1093/nar/gkv007> ) for moderated t-test and one-way ANOVA; Hamming and Levenshtein (edit) distance of strings as well as optimal alignment scores for global (Needleman-Wunsch) and local (Smith-Waterman) alignments with constant gap penalties (Merkl and Waack (2009), ISBN:978-3-527-32594-8).

Power analysis and sample size calculation for Welch and Hsu (Hedderich and Sachs (2018), ISBN:978-3-662-56657-2) t-tests including Monte-Carlo simulations of empirical power and type-I-error. Power and sample size calculation for Wilcoxon rank sum and signed rank tests via Monte-Carlo simulations. Power and sample size required for the evaluation of a diagnostic test(-system) (Flahault et al. (2005), <doi:10.1016/j.jclinepi.2004.12.009>; Dobbin and Simon (2007), <doi:10.1093/biostatistics/kxj036>) as well as for a single proportion (Fleiss et al. (2003), ISBN:978-0-471-52629-2) and comparing two negative binomial rates (Zhu and Lakkis (2014), <doi:10.1002/sim.5947>).

RFLPtools provides functions to analyse DNA fragment samples (i.e. derived from RFLP-analysis) and standalone BLAST report files (i.e. DNA sequence analysis).

Functions for the determination of optimally robust influence curves and estimators in case of normal location and/or scale.

Functions for the determination of optimally robust influence curves and estimators for preprocessing omics data, in particular gene expression data.

Functions for the determination of optimally robust influence curves in case of linear regression with unknown scale and standard normal distributed errors where the regressor is random.

Optimally robust estimation in general smoothly parameterized models using S4 classes and methods.

Optimally robust estimation using S4 classes and methods. Old version still needed for current versions of ROptRegTS and RobRex.

A collection of miscellaneous basic statistic functions and convenience wrappers for efficiently describing data. The author's intention was to create a toolbox, which facilitates the (notoriously time consuming) first descriptive tasks in data analysis, consisting of calculating descriptive statistics, drawing graphical summaries and reporting the results. The package contains furthermore functions to produce documents using MS Word (or PowerPoint) and functions to import data from Excel. Many of the included functions can be found scattered in other packages and other sources written partly by Titans of R. The reason for collecting them here, was primarily to have them consolidated in ONE instead of dozens of packages (which themselves might depend on other packages which are not needed at all), and to provide a common and consistent interface as far as function and arguments naming, NA handling, recycling rules etc. are concerned. Google style guides were used as naming rules (in absence of convincing alternatives). The 'BigCamelCase' style was consequently applied to functions borrowed from contributed R packages as well.

An R6 object oriented distributions package. Unified interface for 42 probability distributions and 11 kernels including functionality for multiple scientific types. Additionally functionality for composite distributions and numerical imputation. Design patterns including wrappers and decorators are described in Gamma et al. (1994, ISBN:0-201-63361-2). For quick reference of probability distributions including d/p/q/r functions and results we refer to McLaughlin, M. P. (2001). Additionally Devroye (1986, ISBN:0-387-96305-7) for sampling the Dirichlet distribution, Gentle (2009) <doi:10.1007/978-0-387-98144-4> for sampling the Multivariate Normal distribution and Michael et al. (1976) <doi:10.2307/2683801> for sampling the Wald distribution.

Provides documentation in form of a common vignette to packages 'distr', 'distrEx', 'distrMod', 'distrSim', 'distrTEst', 'distrTeach', and 'distrEllipse'.

S4-classes for setting up a coherent framework for simulation within the distr family of packages.

Provides flexible examples of LLN and CLT for teaching purposes in secondary school.

Evaluation (S4-)classes based on package distr for evaluating procedures (estimators/tests) at data/simulation in a unified way.

Functions for calculating sample size and power for clinical trials with multiple (co-)primary endpoints.

Includes 'sysdata.rda' file for packages of the 'RobASt' - family of packages; is currently used by package 'RobExtremes' only.

Optimally robust estimation for extreme value distributions using S4 classes and methods (based on packages 'distr', 'distrEx', 'distrMod', 'RobAStBase', and 'ROptEst').

Functions for the selection of optimal reference genes and the normalisation of real-time quantitative PCR data.

The package provides functions to read raw RT-qPCR data of different platforms.