# Numero v1.2.0

## Statistical Framework to Define Subgroups in Complex Datasets

High-dimensional datasets that do not exhibit a clear intrinsic clustered structure pose a challenge to conventional clustering algorithms. For this reason, we developed an unsupervised framework that helps scientists subgroup their datasets based on visual cues; see Gao S, Mutter S, Casey A, Makinen V-P (2018) Numero: a statistical framework to define multivariable subgroups in complex population-based datasets, Int J Epidemiology, dyy113, <doi:10.1093/ije/dyy113>. The framework includes the functions needed to construct a self-organizing map of the data, to evaluate the statistical significance of the observed data patterns, and to visualize the results.
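The two analytical steps named above (placing samples on a self-organizing map, then testing whether a variable varies across the map more than chance would allow) can be illustrated with a minimal base-R sketch. It does not call any Numero function. The toy data, the 3x3 grid, the training schedule, the between-district sum-of-squares statistic in `layout_stat()` and the `perm_z()` helper are all hypothetical choices for illustration, and the package's own permutation statistic may differ.

```r
# Minimal illustration (NOT Numero's implementation): a tiny self-organizing
# map trained with the classic online Kohonen rule, followed by a generic
# permutation test of whether a variable differs across map districts.
set.seed(1)

# Toy data: 200 points along a noisy curve, i.e. a contiguous cloud
# without intrinsic clusters.
n <- 200
u <- runif(n)
x <- cbind(u + rnorm(n, sd = 0.05), u^2 + rnorm(n, sd = 0.05))

# Hypothetical 3x3 grid of map districts; prototypes seeded from data rows.
grid <- as.matrix(expand.grid(col = 1:3, row = 1:3))
k <- nrow(grid)
proto <- x[sample(n, k), , drop = FALSE]

# Online training: learning rate and neighbourhood radius shrink linearly.
n_iter <- 2000
for (it in seq_len(n_iter)) {
  frac <- it / n_iter
  alpha <- 0.5 * (1 - frac) + 0.01
  radius <- 1.5 * (1 - frac) + 0.3
  p <- x[sample(n, 1), ]
  bmu <- which.min(colSums((t(proto) - p)^2))   # best-matching unit
  d2 <- colSums((t(grid) - grid[bmu, ])^2)      # squared grid distances to BMU
  h <- exp(-d2 / (2 * radius^2))                # Gaussian neighbourhood weights
  # Pull each prototype towards the sample, weighted by its neighbourhood value.
  proto <- proto + alpha * h * (matrix(p, k, 2, byrow = TRUE) - proto)
}

# Assign every sample to its best-matching district.
district <- apply(x, 1, function(p) which.min(colSums((t(proto) - p)^2)))

# Hypothetical layout statistic: between-district sum of squares.
layout_stat <- function(v, district) {
  means <- tapply(v, district, mean)
  sizes <- tapply(v, district, length)
  sum(sizes * (means - mean(v))^2)
}

# Z-score of the observed statistic against a null distribution built by
# shuffling the variable across samples (breaking its link to the map).
perm_z <- function(v, district, n_perm = 500) {
  obs <- layout_stat(v, district)
  null <- replicate(n_perm, layout_stat(sample(v), district))
  (obs - mean(null)) / sd(null)
}

y <- u + rnorm(n, sd = 0.1)  # follows the position along the curve
z <- rnorm(n)                # unrelated noise
perm_z(y, district)
perm_z(z, district)
```

A large positive z-score is expected for `y`, which tracks the position on the curve, and a much smaller one for `z`. Permuting the variable while keeping the district assignments fixed is what makes the null distribution represent "no association with the map layout".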

# Numero

## Overview

In textbook examples, multivariable datasets are clustered into distinct subgroups that can be clearly identified by a set of optimal mathematical criteria. However, many real-world datasets arise from the synergistic effects of multiple factors, contain noisy and partly redundant measurements, and may represent a continuous spectrum of the different phases of a phenomenon. In medicine, complex diseases associated with ageing are typical examples. We postulate that population-based biomedical datasets (and many other real-world examples) do not contain an intrinsic clustered structure that would give rise to mathematically well-defined subgroups. From a modeling point of view, the lack of intrinsic structure means that the data points inhabit a contiguous cloud in high-dimensional space without abrupt changes in density to indicate subgroup boundaries; hence, mathematical criteria cannot reliably segment the cloud based on its internal structure. Yet we need data-driven classification and subgrouping to aid decision-making and to facilitate the development of testable hypotheses. For this reason, we developed the Numero package, which provides a more flexible and transparent process that allows human observers to create usable multivariable subgroups even when conventional clustering frameworks struggle.

## Installation

```r
# Install Numero from the CRAN repository:
install.packages("Numero")
```


## Usage

The vignette of the package contains a practical real-life example of how to use the Numero R functions to define subgroups within a biomedical dataset.

```r
library(Numero)
browseVignettes(package = "Numero")
```


## Functions in Numero

| Name | Description |
|------|-------------|
| nroPreprocess | Data cleaning and standardization |
| nroTrain | Train self-organizing map |
| numero.clean | Clean datasets |
| nroPermute | Permutation analysis of map layout |
| nroPlot | Plot a self-organizing map |
| numero.summary | Summarize subgroup statistics |
| nroPostprocess | Standardization using existing parameters |
| numero.evaluate | Self-organizing map statistics |
| numero.create | Create a self-organizing map |
| numero.quality | Self-organizing map statistics |
| numero.subgroup | Interactive subgroup assignment |
| nroSummary | Estimate subgroup statistics |
| numero.plot | Plot results from SOM analysis |
| nroRcppMatrix | Safety check for Rcpp calls |
| numero.prepare | Prepare datasets for analysis |
| nroPrune | Reduce collinearity within a dataset |
| nroLabel | Label pruning |
| nroDestratify | Mitigate data stratification |
| nroPair | Match similar rows |
| nroKmeans | K-means clustering |
| nroAggregate | Regional averages on a self-organizing map |
| nroMatch | Best-matching districts |
| nroKohonen | Self-organizing map |
| nroImpute | Impute missing values |
| nroColorize | Assign colors based on value |

## Details

| Field | Value |
|-------|-------|
| Type | Package |
| Date | 2019-06-12 |
| License | GPL (>= 2) |
| LinkingTo | Rcpp |
| VignetteBuilder | knitr |
| NeedsCompilation | yes |
| Repository | CRAN |
| SystemRequirements | C++11 |
| Encoding | UTF-8 |
| LazyData | true |
| Packaged | 2019-06-12 05:19:50 UTC; vipmak |
| Date/Publication | 2019-06-12 13:30:08 UTC |
| Suggests | knitr, rmarkdown |
| Imports | Rcpp (>= 0.11.4) |
| Contributors | Ville-Petteri Makinen, Song Gao, Stefan Mutter, Aaron Casey |