Numero (version 1.2.0)

nroLabel: Label pruning

Description

Optimize the selection of labels on map districts.

Usage

nroLabel(topology, values, gap = 2.3)

Arguments

topology

A data frame with K rows and six columns; see Details.

values

A vector of K values or a K x N data frame, where K is the number of map districts and N is the number of variables.

gap

Minimum distance between map districts with non-empty labels.

Value

A data frame with K rows and N columns that contains labels for the map districts for each of the columns in values.

Details

The function assigns non-empty labels for districts based on the absolute deviations from the average district value. The most extreme districts are picked first, and then the remaining districts are prioritized based on their value and distance to the other districts already labeled. Columns that are listed in the attribute 'binary' in 'values' are given percentage labels.

Topology can be either the output from nroKohonen() or a data frame in the same format as the element topology within the aforementioned output list.
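
The selection rule described above can be illustrated with a minimal sketch in base R. This is not the Numero implementation: the helper name greedy_labels and its arguments x and y (district coordinates given as plain numeric vectors) are hypothetical. The sketch also simplifies the prioritization step into a hard cut-off, so that a district is skipped whenever it lies closer than gap to a district that is already labeled.

```r
# Hypothetical helper, NOT part of Numero: a simplified greedy label pruning.
# x, y   : assumed district coordinates (numeric vectors of length K)
# values : district values (numeric vector of length K)
# gap    : minimum distance between districts with non-empty labels
greedy_labels <- function(x, y, values, gap = 2.3) {
  # Rank districts by absolute deviation from the average district value.
  dev <- abs(values - mean(values, na.rm = TRUE))
  ord <- order(dev, decreasing = TRUE, na.last = NA)

  picked <- integer(0)
  for (k in ord) {
    if (length(picked) > 0) {
      # Euclidean distance to every district that already has a label.
      d <- sqrt((x[picked] - x[k])^2 + (y[picked] - y[k])^2)
      if (min(d) < gap) next
    }
    picked <- c(picked, k)
  }

  # Empty labels everywhere except for the selected districts.
  labels <- rep("", length(values))
  labels[picked] <- as.character(signif(values[picked], 3))
  labels
}

# Toy example: five districts on a line, one unit apart.
toy <- greedy_labels(x = 1:5, y = rep(0, 5),
                     values = c(10, 1, 5, 5, 0), gap = 2)
print(toy)
```

In this sketch, a larger gap leaves fewer districts with labels, which matches the role of the gap argument in the documentation above.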

References

Gao S, Mutter S, Casey AE, Mäkinen V-P (2018) Numero: a statistical framework to define multivariable subgroups in complex population-based datasets, Int J Epidemiology, https://doi.org/10.1093/ije/dyy113

Examples

# NOT RUN {
# Import data.
fname <- system.file("extdata", "finndiane.txt", package = "Numero")
dataset <- read.delim(file = fname)

# Prepare training data.
trvars <- c("CHOL", "HDL2C", "TG", "CREAT", "uALB")
trdata <- scale.default(dataset[, trvars])

# K-means clustering.
km <- nroKmeans(data = trdata)

# Self-organizing map.
sm <- nroKohonen(seeds = km)
sm <- nroTrain(som = sm, data = trdata)

# Assign data points into districts.
matches <- nroMatch(centroids = sm, data = trdata)

# District averages for all variables.
planes <- nroAggregate(topology = sm, districts = matches, data = dataset)

# District labels for cholesterol.
chol <- nroLabel(topology = sm, values = planes$CHOL)
print(head(chol))

# District labels for all variables.
colrs <- nroLabel(topology = sm, values = planes)
print(head(colrs))
# }
