nroAggregate: Regional averages on a self-organizing map

Description

Estimate smoothed district averages from the map locations assigned to each data point.

Usage

nroAggregate(topology, districts, data = NULL)

Arguments

topology

A data frame with K rows and six columns; see Details.

districts

An integer vector of M best-matching districts, one per data point.

data

A vector of M elements or an M x N matrix of data values.

Value

If the input argument data is empty (the default is NULL), the function returns the histogram of the data points on the map: a K x 1 vector of estimated counts after smoothing.

If data is supplied, the function returns a data frame of K rows and N columns that contains the average district values after smoothing. The data frame has the attribute "histogram", which contains the data point counts for each data column. Column names and the attribute "binary" are copied from the input data.

If the output is a single column, it is converted to a vector.
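
To make the two return values concrete, here is a minimal base R sketch of the unsmoothed case. It is an illustration only and not the package's implementation: nroAggregate() also smooths the counts and averages across the map, which this sketch leaves out. The toy inputs K, districts and values are hypothetical.

```r
# Toy inputs (hypothetical): K = 3 districts, M = 6 data points.
K <- 3
districts <- c(1L, 1L, 2L, 3L, 3L, 3L)
values <- c(4.0, 6.0, 5.0, 1.0, 2.0, 3.0)

# Unsmoothed histogram: number of data points in each district.
counts <- tabulate(districts, nbins = K)

# Unsmoothed averages: per-district sum divided by per-district count.
# Assumption: no smoothing across neighboring districts is applied here.
sums <- tapply(values, factor(districts, levels = seq_len(K)), sum)
averages <- as.vector(sums) / counts

print(counts)
print(averages)
```

In this sketch, a district without any data points would get a missing value. Smoothing over the map, as described above, is what allows estimates to be formed from nearby districts as well.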

Details

The argument topology can be either the output from nroKohonen() or a data frame in the same format as the element topology within that output list.

The input argument districts is expected to be the output from nroMatch().

References

Gao S, Mutter S, Casey AE, Mäkinen V-P (2018) Numero: a statistical framework to define multivariable subgroups in complex population-based datasets, Int J Epidemiology, https://doi.org/10.1093/ije/dyy113

Examples

# Import data.
fname <- system.file("extdata", "finndiane.txt", package = "Numero")
dataset <- read.delim(file = fname)

# Prepare training data.
trvars <- c("CHOL", "HDL2C", "TG", "CREAT", "uALB")
trdata <- scale.default(dataset[, trvars])

# K-means clustering.
km <- nroKmeans(data = trdata)

# Self-organizing map.
sm <- nroKohonen(seeds = km)
sm <- nroTrain(som = sm, data = trdata)

# Assign data points into districts.
matches <- nroMatch(centroids = sm, data = trdata)

# District averages for one variable.
chol <- nroAggregate(topology = sm, districts = matches,
                     data = dataset$CHOL)
print(chol)

# District averages for all variables.
planes <- nroAggregate(topology = sm, districts = matches, data = dataset)
print(head(planes))