cormap_filt splits (cuts) the dendrogram at a given threshold, dividing it into larger or
smaller "sub-clusters". Correlation P-Values (see eset_cor) are converted into a sub-cluster-wise
signal metric of significance, which is used for filtering. Optionally, up to 3 plots are produced,
the third one being a heatmap filtered by significance and tree height cutting.
cormap_filt(
x,
na.frac = 0.1,
method = "ward.D",
do.abs = TRUE,
main = "correlation map",
postfix = NULL,
p.thr = 0.01,
cex = 0.2,
cex.clust = cex,
cex.filt = cex,
cut.thr = NULL,
cor.thr = NULL,
cor.cluster = 1,
cor.window = NULL,
do.plots = c("dend", "full.heat", "filt.heat"),
genes2highl = NULL,
order.list = TRUE,
convert = TRUE,
biomart = FALSE,
add.sig = FALSE,
verbose = FALSE
)
x: (ExpressionSet, data.frame or numeric). A numeric data frame, matrix or an ExpressionSet object.
na.frac: (numeric). Fraction of missing values allowed per row of the input matrix. Defaults to 0.1,
which means that less than 10 per cent of the values in one row are allowed to be NAs.
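The row filter described for na.frac can be sketched in base R as follows. This is a minimal illustration, not the package's code: the helper filter_na_rows is hypothetical, and it assumes the strict "less than" comparison stated above.

```r
# Hypothetical helper: keep rows whose fraction of NAs is strictly below na.frac
filter_na_rows <- function(m, na.frac = 0.1) {
  m <- as.matrix(m)
  na_share <- rowMeans(is.na(m))  # fraction of missing values per row
  m[na_share < na.frac, , drop = FALSE]
}

m <- rbind(a = c(1, 2, 3, 4, 5),
           b = c(NA, 2, 3, 4, 5),  # 1 of 5 values missing, i.e. 20 per cent
           c = c(5, 4, 3, 2, 1))
filter_na_rows(m, na.frac = 0.1)   # row "b" is dropped
filter_na_rows(m, na.frac = 0.25)  # all three rows are kept
```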
method: (character). The agglomeration method used for clustering. See help for hclust.
Defaults to "ward.D".
do.abs: (logical). Should the distances for clustering be calculated based on the absolute correlation values?
In other words, should the sign of the correlation be ignored in favor of its strength? Defaults to TRUE.
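A minimal sketch of how do.abs can change which variables are grouped together. It assumes the common transformation of correlations into distances as 1 - |r| (sign ignored) or 1 - r (sign kept); the exact transformation used by cormap_filt is not stated here, and the helper cor_hclust is hypothetical.

```r
# Hypothetical helper: hierarchical clustering of rows based on their correlation
cor_hclust <- function(m, do.abs = TRUE, method = "ward.D") {
  r <- cor(t(m), use = "pairwise.complete.obs")  # row-by-row correlation matrix
  # Assumed distance: 1 - |r| ignores the sign, 1 - r keeps it
  d <- if (do.abs) as.dist(1 - abs(r)) else as.dist(1 - r)
  hclust(d, method = method)
}

set.seed(1)
base <- rnorm(10)
m <- rbind(g1 = base,
           g2 = -base + rnorm(10, sd = 0.1),  # strongly anti-correlated with g1
           g3 = rnorm(10))                    # unrelated
hc_abs <- cor_hclust(m, do.abs = TRUE)   # first merge joins g1 and g2
hc_sgn <- cor_hclust(m, do.abs = FALSE)  # g1 and g2 are now far apart
```

With the sign ignored, the anti-correlated pair g1/g2 is the closest pair; with the sign kept, their distance is close to 2 and g3 takes part in the first merge instead.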
main: (character). The main title of the plot. Defaults to "correlation map".
postfix: (character or logical). A plot sub-title. Will be printed below the main title. Defaults to NULL.
p.thr: (numeric). P-Value threshold for filtering sub-clusters with significant correlations. Defaults to 0.01.
cex: (numeric). Font size for the heatmap of the unfiltered correlation matrix. Defaults to 0.2.
cex.clust: (numeric). Font size for the dendrogram plot of the unfiltered correlation matrix clusters.
Defaults to cex.
cex.filt: (numeric). Font size for the heatmap of the filtered correlation matrix. Defaults to cex.
cut.thr: (numeric). Threshold at which dendrogram branches are to be cut. Passed on to argument h
in cut.dendrogram. Defaults to NULL, meaning no cutting.
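Base R's cut method for dendrograms returns a list whose lower component holds the branches below height h. The following minimal sketch turns those branches into per-sub-cluster label lists, using random toy data and an assumed 1 - |r| distance. It only illustrates the mechanism and is not taken from cormap_filt itself.

```r
# Build a small dendrogram from a toy correlation-based distance
set.seed(42)
m <- matrix(rnorm(60), nrow = 6, dimnames = list(paste0("g", 1:6), NULL))
hc <- hclust(as.dist(1 - abs(cor(t(m)))), method = "ward.D")
dend <- as.dendrogram(hc)

h <- 0.5 * max(hc$height)              # example cut height, playing the role of cut.thr
branches <- cut(dend, h = h)$lower     # sub-trees lying below the cut
clusters <- lapply(branches, labels)   # leaf labels of each sub-cluster
clusters
```

Because h lies below the root height, the tree is split into at least two sub-clusters, and together they contain every leaf exactly once.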
cor.thr: (numeric). Correlation threshold to filter the correlation matrix for plotting. Defaults to NULL,
meaning no filtering. Note that this value will be applied to margin cor.mar of the values per row.
cor.cluster: (numeric). The correlation cluster along the diagonal 'line' in the heatmap that should be
zoomed into. A sliding window of size cor.window will be moved along the diagonal of the correlation
matrix to find the cluster with the most correlation values meeting cor.thr. Defaults to 1.
cor.window: (numeric). The size of the sliding window (see cor.cluster). Defaults to NULL.
Note that this works only for positive correlations.
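The sliding-window search described for cor.cluster and cor.window might work roughly as sketched below. This is a hypothetical illustration: the helper find_cor_window is not part of the package. It assumes that windows are square blocks on the diagonal of the ordered correlation matrix and that a value "meets" cor.thr when it is greater than or equal to it, which is consistent with the note that only positive correlations are handled. How cor.cluster values above 1 rank the windows is not specified here.

```r
# Hypothetical helper: slide a square window along the diagonal of an
# (ordered) correlation matrix and count entries meeting a threshold
find_cor_window <- function(cormat, cor.window, cor.thr) {
  n <- nrow(cormat)
  starts <- seq_len(n - cor.window + 1)  # possible window start positions
  counts <- vapply(starts, function(i) {
    idx <- i:(i + cor.window - 1)
    sum(cormat[idx, idx] >= cor.thr, na.rm = TRUE)  # values meeting the threshold
  }, integer(1))
  list(counts = counts, best_start = starts[which.max(counts)])
}

# Toy matrix with a block of high correlations in rows/columns 4 to 6
cm <- diag(6)
cm[4:6, 4:6] <- 0.9
diag(cm) <- 1
res <- find_cor_window(cm, cor.window = 3, cor.thr = 0.8)
res$counts      # 3 3 5 9
res$best_start  # 4
```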
do.plots: (character). The plots to be produced. A character vector containing one or more of "dend"
to produce the dendrogram plot, "full.heat" to produce the heatmap of the unfiltered correlation matrix, and
"filt.heat" to produce the heatmap of the filtered correlation matrix. Defaults to all three plots.
genes2highl: (character). Vector of gene symbols (or whatever labels are used) to be highlighted.
If not NULL, a semi-transparent rectangle is drawn around the matching labels and the corresponding
rows or columns in the heatmap. Defaults to NULL.
order.list: (logical). Should the order of the correlation matrix, i.e., the 'list' of labels, be reversed?
This is meaningful if the order of the input variables should be preserved, because image rotates the
input matrix when plotting it. Defaults to TRUE.
convert: (logical). Should an attempt be made to convert IDs provided as row names of the input or in lab?
Defaults to TRUE. Conversion will be done using BioMart or an annotation package, depending on biomart.
biomart: (logical). Should BioMart (or an annotation package) be used to convert IDs? If TRUE, the todisp2
function in package convertid attempts to access the BioMart API to convert ENSG IDs to gene symbols.
Defaults to FALSE, which uses the traditional AnnotationDbi Bimap interface instead.
add.sig: (logical). Should significance asterisks be drawn? If TRUE, P-Values for correlation significance
are calculated and encoded as asterisks. See 'Details'. Defaults to FALSE.
verbose: (logical). Should verbose output be written to the console? Defaults to FALSE.
A list. If the dendrogram is being cut, i.e., cut.thr is not NULL, a list of
clusters: the list of cluster labels from the lower component of the cut.dendrogram output, which
is a list of the branches obtained from cutting the tree
filt: the index of the cluster labels passing the signal metric threshold
filt_cluster: the list of the filtered cluster labels
h: the cut threshold
p.thr: the P-Value threshold for filtering sub-clusters
metric: the signal metrics for all sub-clusters
cormat: the clustered (ordered) correlation matrix
hclust: a list of hierarchical clustering metrics (output of hclust)
pvalues: the correlation P-Value matrix
If no tree cutting is applied, a list of
cormat: the clustered (ordered) correlation matrix
hclust: a list of hierarchical clustering metrics (output of hclust)
pvalues: the correlation P-Value matrix
P-Values are calculated from the t statistic of the correlation coefficient, \(t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}\),
where r is the correlation coefficient and n is the number of samples with no missing values for each gene
(row-wise ncol(eset) minus the number of columns that have an NA). P-Values are then calculated using pt and
corrected to account for the two-tailed nature of the test, i.e., the possibility of positive as well as negative correlation.
The approach to calculate correlation significance was adopted from Miles, J., & Banyard, P. (2007) on
"Calculating the exact significance of a Pearson correlation in MS Excel".
To obtain a suitable metric for isolating significant sub-clusters, P-Values are represented as \(-\log_{10}(\mathrm{median}(pval))\),
where pval is the parallel maximum (pmax) of all P-Values belonging to the sub-cluster and 1e-38. The floor of
1e-38 avoids P-Values of zero (0), whose logarithm would be infinite.
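The P-Value calculation and the sub-cluster metric described in this section can be sketched in base R as follows. This is a minimal illustration, not the package's code. The helpers cor_pvalue and subcluster_metric are hypothetical. The two-tailed correction is written as 2 * pt(-|t|, df = n - 2) and checked against cor.test. The rule that a sub-cluster passes when its metric is at least -log10(p.thr) is an assumption.

```r
# Two-tailed P-Value of a Pearson correlation from its t statistic
cor_pvalue <- function(r, n) {
  tstat <- r * sqrt(n - 2) / sqrt(1 - r^2)  # t statistic with n - 2 degrees of freedom
  2 * pt(-abs(tstat), df = n - 2)           # two-tailed: positive and negative r both count
}

set.seed(7)
x <- rnorm(20)
y <- 0.5 * x + rnorm(20)
p_manual <- cor_pvalue(cor(x, y), n = length(x))
p_ref <- cor.test(x, y)$p.value             # base R reference value

# Sub-cluster signal metric: -log10 of the median P-Value, floored at 1e-38
subcluster_metric <- function(pvals) {
  -log10(median(pmax(pvals, 1e-38)))        # pmax avoids log10(0) = -Inf
}

pv <- c(1e-3, 1e-5, 0)                      # toy P-Values of one sub-cluster
subcluster_metric(pv)                       # median of c(1e-3, 1e-5, 1e-38) is 1e-5, so 5

# Assumed filtering rule: keep sub-clusters whose metric is at least -log10(p.thr)
p.thr <- 0.01
metrics <- c(c1 = subcluster_metric(pv), c2 = subcluster_metric(c(0.2, 0.5)))
which(metrics >= -log10(p.thr))             # only c1 passes
```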