# Numero v1.2.0

## Statistical Framework to Define Subgroups in Complex Datasets

High-dimensional datasets that do not exhibit a clear intrinsic clustered structure pose a challenge to conventional clustering algorithms. For this reason, we developed an unsupervised framework that helps scientists to better subgroup their datasets based on visual cues. For details, see Gao S, Mutter S, Casey A, Makinen V-P (2018) Numero: a statistical framework to define multivariable subgroups in complex population-based datasets, Int J Epidemiology, dyy113, <doi:10.1093/ije/dyy113>. The framework includes the necessary functions to construct a self-organizing map of the data, to evaluate the statistical significance of the observed data patterns, and to visualize the results.

## Readme

# Numero

## Overview

In textbook examples, multivariable datasets are clustered into distinct subgroups that can be clearly identified by a set of optimal mathematical criteria. However, many real-world datasets arise from the synergistic consequences of multiple effects, contain noisy and partly redundant measurements, and may represent a continuous spectrum of the different phases of a phenomenon. In medicine, complex diseases associated with ageing are typical examples. We postulate that population-based biomedical datasets (and many other real-world examples) do not contain an intrinsic clustered structure that would give rise to mathematically well-defined subgroups. From a modeling point of view, the lack of intrinsic structure means that the data points inhabit a contiguous cloud in high-dimensional space without abrupt changes in density to indicate subgroup boundaries; hence, mathematical criteria cannot reliably segment the cloud by its internal structure. Yet we need data-driven classification and subgrouping to aid decision-making and to facilitate the development of testable hypotheses. For this reason, we developed the Numero package, which provides a more flexible and transparent process that allows human observers to create usable multivariable subgroups even when conventional clustering frameworks struggle.

## Installation

```
# Install Numero from the CRAN repository:
install.packages("Numero")
```

## Usage

The vignette of the package contains a practical real-life example of how to use the Numero R functions to define subgroups within a biomedical dataset.

```
library(Numero)
browseVignettes(package = "Numero")
```
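
The description above mentions evaluating the statistical significance of observed data patterns. As a rough illustration of that idea only, the sketch below runs a simple permutation test in base R: it keeps the district assignments fixed, shuffles a variable across data points, and checks how often the shuffled district means vary as much as the observed ones. The helper `permute_districts`, the toy data, and the choice of test statistic are all assumptions made for this example; this is not Numero's implementation, and the package's own `nroPermute` function may work differently.

```
# Hypothetical helper (not part of Numero): permutation test for how strongly
# a variable varies across a fixed set of map districts.
permute_districts <- function(values, districts, n = 1000) {
  # Test statistic: variance of the district means.
  stat <- function(v) var(as.vector(tapply(v, districts, mean)))
  observed <- stat(values)
  # Null distribution: shuffle values across data points, keep districts fixed.
  null <- replicate(n, stat(sample(values)))
  # Add-one estimate so the P-value is never exactly zero.
  p <- (1 + sum(null >= observed)) / (1 + n)
  list(observed = observed, p.value = p)
}

# Toy data (assumed for illustration): 10 districts with 30 points each.
set.seed(1)
districts <- rep(1:10, each = 30)
signal <- rnorm(300, mean = districts / 5)  # mean drifts across districts
noise <- rnorm(300)                         # unrelated to districts

res_signal <- permute_districts(signal, districts)
res_noise <- permute_districts(noise, districts)
print(c(signal = res_signal$p.value, noise = res_noise$p.value))
```

A variable whose regional pattern is stronger than nearly all shuffled versions gets a small P-value, which is one way to judge whether a pattern seen on a map is likely to be more than noise.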

## Functions in Numero

| Name | Description |
| --- | --- |
| nroPreprocess | Data cleaning and standardization |
| nroTrain | Train self-organizing map |
| numero.clean | Clean datasets |
| nroPermute | Permutation analysis of map layout |
| nroPlot | Plot a self-organizing map |
| numero.summary | Summarize subgroup statistics |
| nroPostprocess | Standardization using existing parameters |
| numero.evaluate | Self-organizing map statistics |
| numero.create | Create a self-organizing map |
| numero.quality | Self-organizing map statistics |
| numero.subgroup | Interactive subgroup assignment |
| nroSummary | Estimate subgroup statistics |
| numero.plot | Plot results from SOM analysis |
| nroRcppMatrix | Safety check for Rcpp calls |
| numero.prepare | Prepare datasets for analysis |
| nroPrune | Reduce collinearity within a dataset |
| nroLabel | Label pruning |
| nroDestratify | Mitigate data stratification |
| nroPair | Match similar rows |
| nroKmeans | K-means clustering |
| nroAggregate | Regional averages on a self-organizing map |
| nroMatch | Best-matching districts |
| nroKohonen | Self-organizing map |
| nroImpute | Impute missing values |
| nroColorize | Assign colors based on value |

## Vignettes of Numero

| Name |
| --- |
| intro.rmd |

## Details

| Field | Value |
| --- | --- |
| Type | Package |
| Date | 2019-06-12 |
| License | GPL (>= 2) |
| LinkingTo | Rcpp |
| VignetteBuilder | knitr |
| NeedsCompilation | yes |
| Repository | CRAN |
| SystemRequirements | C++11 |
| Encoding | UTF-8 |
| LazyData | true |
| Packaged | 2019-06-12 05:19:50 UTC; vipmak |
| Date/Publication | 2019-06-12 13:30:08 UTC |
| Suggests | knitr, rmarkdown |
| Imports | Rcpp (>= 0.11.4) |
| Contributors | Ville-Petteri Makinen, Song Gao, Stefan Mutter, Aaron Casey |
