"clusterPlots"( scores.list, plot.ord = 1:length(scores.list), plot.type = c("heatmap", "line", "by.cluster"), heat.bg.col = "black", summarize = c("mean", "median"), symm.scale = FALSE, cols = NULL, t.name = NULL, verbose = TRUE, ...) "clusterPlots"(scores.list, scale = function(x) x, cap.q = 0.95, cap.type = c("sep", "all"), all.mappable = FALSE, n.clusters = NULL, plot.ord = 1:length(scores.list), expr = NULL, expr.name = NULL, sort.data = NULL, sort.name = NULL, plot.type = c("heatmap", "line", "by.cluster"), summarize = c("mean", "median"), cols = NULL, t.name = NULL, verbose = TRUE, ...)
"sep"
, then the cap quantile is calculated and applied
to each coverage matrix separately. If "all"
, then one cap
quantile is calculated based on all of the matrices combined.sort.data
plot.plot
...: Parameters that, when a heatmap plot is drawn, influence how the points of the expression and sort data plots will look. If the lineplot is being drawn, parameters to influence the line styles.
If called with a ScoresList, then a ClusteredScoresList is returned. If called with a ClusteredScoresList, then nothing is returned.
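The difference between the two cap.type settings can be illustrated with a small base R sketch. This is not the code used inside clusterPlots; the helper name cap.matrices and the toy matrices are made up for illustration, and only the idea of a per-matrix versus a shared quantile cap is taken from the description above.

```r
# Toy coverage matrices (features x positions); the values are made up.
set.seed(1)
cov.a <- matrix(rpois(40, lambda = 5), nrow = 8)
cov.b <- matrix(rpois(40, lambda = 50), nrow = 8)
mats <- list(a = cov.a, b = cov.b)

# Hypothetical helper (not part of any package): cap values at a quantile.
cap.matrices <- function(mats, cap.q = 0.95, cap.type = c("sep", "all")) {
  cap.type <- match.arg(cap.type)
  if (cap.type == "all") {
    # One shared cap calculated from all matrices combined.
    caps <- rep(quantile(unlist(mats), cap.q, na.rm = TRUE), length(mats))
  } else {
    # A separate cap calculated for each matrix.
    caps <- sapply(mats, quantile, probs = cap.q, na.rm = TRUE)
  }
  # Replace every value above the cap with the cap itself.
  capped <- Map(function(m, cap) pmin(m, cap), mats, caps)
  list(capped = capped, caps = unname(caps))
}

res.sep <- cap.matrices(mats, cap.type = "sep")
res.all <- cap.matrices(mats, cap.type = "all")
res.sep$caps  # two different caps, one per matrix
res.all$caps  # the same cap repeated for both matrices
```

With "all", a matrix with a much lower dynamic range (cov.a here) is effectively left uncapped, which is one reason a separate cap per matrix may be preferable when the coverage types differ in scale.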
A ClusteredScoresList should be created by the user if they wish to do some custom clustering and normalisation on the coverage matrices. Otherwise, if the user is happy with k-means or PAM clustering, then the ScoresList object as output by featureScores() can be used directly. If called with a ScoresList, then the matrices for each coverage type are joined. Then the function supplied by the scale argument is used to scale the data. Next, each matrix is capped.
Then each matrix is divided by its maximum value, so that the Euclidean distance between maximum reads and no reads is the same for each matrix. Lastly, either k-means or PAM clustering is performed to get the cluster membership of each feature. If there are any NAs in the scores, then PAM is used. Otherwise, k-means is used for speed. A ClusteredScoresList object is then created and used. If expression data is provided, the clusters are guaranteed to be given IDs in descending order of summarised cluster expression. If called with a ClusteredScoresList, no scaling or capping is done, so it is the user's responsibility to normalise the coverage matrices as they see fit when creating the ClusteredScoresList object. If a ClusteredScoresList object is subsetted, the original data range is saved in a private slot, so that if the user plots a subset of the features, such as a certain cluster, the intensity range of the heatmap or the y-axis range of the lineplot will be the same as before subsetting.
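The normalisation and clustering steps described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's internal code: the toy matrices, the use of the median as the expression summary, and the variable names are all made up, and the PAM branch assumes the cluster package is installed.

```r
set.seed(42)
# Two toy (already capped) coverage matrices for the same 30 features.
cov.a <- matrix(runif(30 * 4, 0, 10), nrow = 30)
cov.b <- matrix(runif(30 * 4, 0, 200), nrow = 30)
expr <- rnorm(30)  # toy expression values, one per feature

# Divide each matrix by its maximum so both span the same range, then join.
scaled <- lapply(list(cov.a, cov.b), function(m) m / max(m, na.rm = TRUE))
joined <- do.call(cbind, scaled)

n.clusters <- 3
if (anyNA(joined)) {
  # PAM allows missing values; this branch needs the 'cluster' package.
  cluster.id <- cluster::pam(joined, n.clusters, cluster.only = TRUE)
} else {
  # k-means is faster but cannot handle NAs.
  cluster.id <- kmeans(joined, n.clusters, iter.max = 100, nstart = 5)$cluster
}

# Renumber clusters so that ID 1 has the highest summarised expression.
cl.expr <- sapply(split(expr, cluster.id), median)
old.ids.by.expr <- as.integer(names(cl.expr))[order(cl.expr, decreasing = TRUE)]
ordered.id <- match(cluster.id, old.ids.by.expr)
table(ordered.id)
```

Because each matrix is rescaled to a maximum of 1 before joining, neither coverage type dominates the Euclidean distances used for clustering, even though the raw values of cov.b are about twenty times larger.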
If expression data is given, the summarised expression level of each cluster is calculated, and the clusters are plotted in order of decreasing expression, down the page. Otherwise, they are plotted in ascending order of cluster ID. If a heatmap plot is being drawn, then a heatmap is drawn for every coverage matrix, side-by-side, and a plot of each feature's expression is put alongside the heatmaps, if provided. If an additional sort vector was given, the data within clusters are sorted on this vector, and a plot of this data is then made as the rightmost graph.
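The row ordering described above, clusters first and then the sort vector within each cluster, can be sketched in base R. The data are made up, and sorting in ascending order within a cluster is an assumption, since the direction is not stated here.

```r
set.seed(7)
n <- 12
cov.mat <- matrix(runif(n * 5), nrow = n)   # toy coverage matrix
cluster.id <- sample(rep(1:3, each = 4))    # toy cluster membership
sort.data <- runif(n)                       # toy sort vector, one per feature

# Order rows by cluster ID, breaking ties with the sort vector
# (ascending within each cluster; the direction is an assumption).
row.ord <- order(cluster.id, sort.data)

# Apply the same row order to every matrix and vector that is plotted,
# so that the heatmaps and the side plots stay aligned.
cov.sorted <- cov.mat[row.ord, ]
data.frame(cluster = cluster.id[row.ord], sort.value = sort.data[row.ord])
```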
The lineplot style is similar to the heatmap plot, but clusters are summarised. A grid, with as many rows as there are clusters and as many columns as there are coverage matrices, is made, and lineplots showing the summarised scores are drawn in the grid. Beside the grid, a boxplot of expression is drawn for each cluster, if provided.
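The per-cluster summarisation behind the lineplots can be sketched in base R. The helper summarise.clusters is hypothetical and the data are made up; only the choice between "mean" and "median" mirrors the summarize argument.

```r
set.seed(3)
# Toy coverage matrix: 20 features x 5 positions.
cov.mat <- matrix(rexp(20 * 5), nrow = 20,
                  dimnames = list(NULL, paste0("pos", 1:5)))
cluster.id <- rep(1:4, each = 5)  # toy cluster membership

# Hypothetical helper: summarise each position within each cluster.
summarise.clusters <- function(m, cluster.id, summarize = c("mean", "median")) {
  summarize <- match.arg(summarize)
  fun <- switch(summarize, mean = mean, median = median)
  rows.by.cluster <- split(seq_len(nrow(m)), cluster.id)
  # Result has one row per cluster and one column per position.
  t(sapply(rows.by.cluster,
           function(rows) apply(m[rows, , drop = FALSE], 2, fun, na.rm = TRUE)))
}

res.mean <- summarise.clusters(cov.mat, cluster.id, "mean")
res.med <- summarise.clusters(cov.mat, cluster.id, "median")
res.mean
```

Each row of the result corresponds to one line in the grid, so one such table per coverage matrix would fill one column of the lineplot layout.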
For a cluster-wise lineplot, a graph is drawn for each cluster, with the colours being the different coverage types. Because it makes sense that there will be more clusters than there are types of coverage (typically double to triple the number), the plots are not drawn side-by-side, as is the layout for the heatmaps. For this reason, sending the output to a PDF device is necessary. It is recommended to make the width of the PDF device wider than the default. Since the coverage data between different marks is not comparable, this method is inappropriate for visualising a ClusteredScoresList object if it was created by the ScoresList method of clusterPlots. If the user, however, can come up with a normalisation method that accounts for the differences that are apparent between different types (i.e. peaked vs. spread) of marks and makes the coverages meaningfully comparable, they can alter the tables, do their own clustering, and create a ClusteredScoresList object with the modified tables.
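The advice about the PDF device can be illustrated with base R graphics alone. The sketch below mimics the cluster-wise layout with made-up summaries (one page per cluster, one coloured line per coverage type). The mark names, colours and positions are hypothetical; with a real object, the loop would be replaced by a call to clusterPlots with plot.type = "by.cluster".

```r
set.seed(11)
# Toy per-cluster summaries: 6 clusters x 5 positions for two coverage types.
positions <- seq(-2000, 2000, by = 1000)
summ <- list(markA = matrix(runif(30), nrow = 6),
             markB = matrix(runif(30), nrow = 6))
cols <- c(markA = "blue", markB = "red")

out.file <- tempfile(fileext = ".pdf")
pdf(out.file, width = 14, height = 7)  # the default width is 7 inches
for (cl in seq_len(nrow(summ[[1]]))) {
  # One page per cluster; columns of y are the coverage types.
  y <- sapply(summ, function(m) m[cl, ])
  matplot(positions, y, type = "l", lty = 1, col = cols[colnames(y)],
          xlab = "Position relative to feature", ylab = "Summarised score",
          main = paste("Cluster", cl))
  legend("topright", legend = colnames(y), col = cols[colnames(y)], lty = 1)
}
invisible(dev.off())
```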
featureScores for generating coverage matrices.
library(Repitools)
data(samplesList)  # Loads 'samples.list.subset'.
data(expr)         # Loads 'expr.subset'.
data(chr21genes)
# Score coverage from 2000 bases upstream to 1000 bases downstream of each gene.
fs <- featureScores(samples.list.subset[1:2], chr21genes, up = 2000, down = 1000,
                    freq = 500, s.width = 500)
# Square-root scale the scores, cluster into 5 groups, and draw heatmaps.
clusterPlots(fs, scale = function(x) sqrt(x), n.clusters = 5,
             expr = as.numeric(expr.subset), plot.type = "heatmap",
             pch = 19, cex = 0.5)