mix.heatmap: Heatmap for data with variables of mixed types

Description

Produces a heatmap visualizing all samples and variables of a dataset. Both samples and variables are clustered using methods suitable for mixed-type data. Different types of variables are indicated by different color schemes.

Usage

mix.heatmap(data, D.subjects, D.variables, dend.subjects, dend.variables, varweights,
dist.variables.method = c("associationMeasures", "distcor", "ClustOfVar"), 
linkage="ward.D2", associationFun = association, rowlab, rowmar = 3, lab.cex = 1.5, 
ColSideColors, RowSideColors, 
col.cont = marray::maPalette(low = "lightblue", high = "darkblue", k = 50), 
cont.fixed.range = FALSE, cont.range,
col.ord = list(low = "lightgreen", high = "darkgreen"), 
col.cat = c("indianred1","darkred","orangered","orange","palevioletred1",
          "violetred4","red3","indianred4"), 
legend.colbar, legend.rowbar, legend.mat = FALSE, legend.cex = 1, legend.srt = 0)

Arguments

data

data frame where columns are variables (of different data types) and rows are observations (subjects, samples)

D.subjects

A previously calculated distance matrix (of class dissimilarity) for subjects can be given. If missing, it is calculated by dist.subjects. If set to NULL, no clustering is done and the original order in data is preserved.

D.variables

A previously calculated distance matrix (of class dissimilarity) for variables can be given. If missing, it is calculated by dist.variables. If set to NULL, no clustering is done and original order in data will be preserved.

dend.subjects

A dendrogram for subjects can be given; then no distances between subjects will be calculated and D.subjects will be ignored.

dend.variables

A dendrogram for variables can be given; then no distances between variables will be calculated and D.variables will be ignored.

varweights

optional vector of variable weights, used for calculating Gower's distances between subjects

dist.variables.method

If "associationMeasures", similarities between variables are assessed by a combination of appropriate measures of association for the different pairs of data types. If "distcor", distances between variables are calculated based on distance correlation. In both of these cases, a dendrogram is then derived by standard hierarchical clustering (hclust). If "ClustOfVar", variables are clustered by hclustvar from the ClustOfVar package.

linkage

agglomeration method used for hierarchical clustering; corresponds to parameter method of hclust

associationFun

By default, appropriate association measures are chosen for each pair of variables; see association for details. Alternatively, the user can supply a function that calculates a similarity measure for any two variables. Ignored if dist.variables.method is "ClustOfVar" or "distcor".

rowlab

row (variable) labels; if missing, column names of data are used

rowmar

margin for row (variable) labels

lab.cex

size of row (variable) labels

ColSideColors

vector of length nrow(data) specifying colors for a color bar added on top of the heatmap

RowSideColors

vector of length ncol(data) specifying colors for a color bar added to the left of the heatmap

col.cont

color palette for continuous variables; defaults to a blue color palette running from "lightblue" to "darkblue" (50 shades)

cont.fixed.range

If FALSE, the color range of each continuous variable is defined by that variable's own range. If TRUE, all continuous variables are assumed to have a similar range and therefore share the same color range; the "extreme colors" then correspond to extreme values over all continuous variables and are applied to all of them equally. In either case, to prevent outliers from dominating the color scale, the "extreme colors" are restricted to the 2.5% and 97.5% quantiles. Defaults to FALSE.

cont.range

if cont.fixed.range=TRUE, extreme value limits for coloring continuous variables can be specified; if missing, extreme values are taken from the data; ignored if cont.fixed.range=FALSE

col.ord

List with names of colors for the lowest and highest categories of ordinal variables. A color palette will be created correspondingly based on the number of categories. Defaults to a green color palette

col.cat

vector of colors for categorical variables

legend.colbar

class labels for subject groups defined by ColSideColors

legend.rowbar

class labels for variable groups defined by RowSideColors

legend.mat

logical; should a legend matrix for the heatmap be shown?

legend.cex

size of legend text

legend.srt

rotation of the legend matrix labels in degrees; e.g. legend.srt = 90 produces vertical labels
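
The color-related arguments (col.cont, cont.fixed.range, col.ord) can be illustrated with a small base R sketch. This is not the package's internal code: the helper map_to_palette is hypothetical, and colorRampPalette is used here in place of marray::maPalette, on the assumption that the two produce comparable gradients. It only shows the idea of restricting the extreme colors to the 2.5% and 97.5% quantiles and of deriving an ordinal palette from a low and a high color.

```r
# Hypothetical helper (not part of the package): map a continuous variable
# to palette entries, clipping at the 2.5% and 97.5% quantiles so that
# outliers do not stretch the color scale.
map_to_palette <- function(x, palette,
                           limits = unname(quantile(x, c(0.025, 0.975),
                                                    na.rm = TRUE))) {
  x_clipped <- pmin(pmax(x, limits[1]), limits[2])           # clip to limits
  scaled <- (x_clipped - limits[1]) / (limits[2] - limits[1]) # rescale to [0, 1]
  idx <- 1 + floor(scaled * (length(palette) - 1))            # palette index
  palette[idx]
}

# Continuous palette with 50 shades, analogous to the col.cont default
pal_cont <- colorRampPalette(c("lightblue", "darkblue"))(50)

# Toy variable with one extreme outlier
x <- c(1:99, 1000)
cols <- map_to_palette(x, pal_cont)

# Ordinal palette for a variable with 4 categories, analogous to col.ord
pal_ord <- colorRampPalette(c("lightgreen", "darkgreen"))(4)
```

In this toy example the outlier 1000 receives the same (darkest) color as the other values above the 97.5% quantile, so the remaining values still use the full palette. Passing an explicit limits vector to the helper plays the role that cont.range plays when cont.fixed.range = TRUE.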

Value

A mixed-data heatmap with dendrograms and annotation

Details

If no dendrograms or distance matrices are given, subjects and/or variables are clustered with methods for mixed-type data. Similarities between subjects are measured by Gower's general similarity coefficient with Podani's extension for ordinal variables, see gowdis. Similarities between variables can be assessed by a combination of appropriate measures of association for the different pairs of data types, see association, or based on distance correlation. Standard hierarchical clustering is then applied, by default with Ward's minimum variance method. Alternatively, variables can be clustered by the ClustOfVar approach.

Variables are shown as rows of the heatmap, samples as columns.
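
The subject clustering described above can be sketched in base R. The function gower_sketch below is a hypothetical, simplified stand-in for gowdis: it handles only numeric and nominal variables, omits Podani's extension for ordinal variables, and does not handle missing values. It is meant to show how variable weights enter a Gower-type dissimilarity and how the result feeds into hclust with linkage "ward.D2"; it should not be read as the package's implementation.

```r
# Simplified weighted Gower dissimilarity (assumption: numeric and nominal
# variables only, no missing values, no ordinal extension).
gower_sketch <- function(df, w = rep(1, ncol(df))) {
  n <- nrow(df)
  D <- matrix(0, n, n)
  for (k in seq_along(df)) {
    x <- df[[k]]
    if (is.numeric(x)) {
      # range-scaled absolute difference, values in [0, 1]
      d <- abs(outer(x, x, "-")) / diff(range(x))
    } else {
      # simple mismatch: 0 if categories are equal, 1 otherwise
      d <- outer(as.character(x), as.character(x), "!=") * 1
    }
    D <- D + w[k] * d   # weighted sum over variables
  }
  as.dist(D / sum(w))   # weighted average, returned as a "dist" object
}

# Toy mixed-type data: one continuous and one categorical variable
toy <- data.frame(
  height = c(1.0, 2.0, 4.0, 5.0),
  group  = factor(c("a", "a", "b", "b"))
)

# The categorical variable gets twice the weight of the continuous one
D_toy <- gower_sketch(toy, w = c(1, 2))

# Hierarchical clustering with Ward's minimum variance method
hc <- hclust(D_toy, method = "ward.D2")
dend <- as.dendrogram(hc)
clusters <- cutree(hc, k = 2)
```

A dendrogram obtained this way is the kind of object that dend.subjects expects; in practice the distances would normally come from dist.subjects rather than from a hand-written function.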

References

Hummel M, Edelmann D, Kopp-Schneider A (2017). Clustering of samples and variables with mixed-type data. PLOS ONE, 12(11):e0188274.

Gower J (1971). A general coefficient of similarity and some of its properties. Biometrics, 27:857-871.

Chavent M, Kuentz-Simonet V, Liquet B, Saracco J (2012). ClustOfVar: An R Package for the Clustering of Variables. Journal of Statistical Software, 50:1-16.

Szekely GJ, Rizzo ML, Bakirov NK (2007). Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6):2769-2794.

Lyons R (2013). Distance covariance in metric spaces. The Annals of Probability, 41(5):3284-3305.

Examples

# NOT RUN {
data(mixdata)

mix.heatmap(mixdata, rowmar=7, legend.mat=TRUE)

## with distance correlation
mix.heatmap(mixdata, dist.variables.method="distcor", rowmar=7, legend.mat=TRUE)

## with (arbitrary) color bars; one color per subject and per variable
colbar <- rep(5:7, length.out = nrow(mixdata))
rowbar <- rep(c("darkorange","grey"), length.out = ncol(mixdata))
mix.heatmap(mixdata, ColSideColors=colbar, RowSideColors=rowbar,
            legend.colbar=c("1","2","3"), legend.rowbar=c("a","b"), rowmar=7)

## example with variable weights
w <- rep(1:2, each=5)
mix.heatmap(mixdata, varweights=w, rowmar=7)
# }
