clusterPlots: Visualisation of tables of feature coverages.

Description

Takes the output of featureScores, or a modified version of it, and plots a heatmap or lineplot representation of the clustered coverages.

Usage

  ## S4 method for signature 'ClusteredScoresList'
  clusterPlots(scores.list, plot.ord = 1:length(scores.list),
               plot.type = c("heatmap", "line", "by.cluster"), heat.bg.col = "black",
               summarize = c("mean", "median"), symm.scale = FALSE, cols = NULL,
               t.name = NULL, verbose = TRUE, ...)

  ## S4 method for signature 'ScoresList'
  clusterPlots(scores.list, scale = function(x) x, cap.q = 0.95,
               cap.type = c("sep", "all"), all.mappable = FALSE, n.clusters = NULL,
               plot.ord = 1:length(scores.list), expr = NULL, expr.name = NULL,
               sort.data = NULL, sort.name = NULL,
               plot.type = c("heatmap", "line", "by.cluster"),
               summarize = c("mean", "median"), cols = NULL, t.name = NULL,
               verbose = TRUE, ...)

Arguments

scores.list

A ScoresList or ClusteredScoresList object.

scale

A function to scale all the coverages by. Default : No scaling.

cap.q

The quantile of the coverages to cap at. Any coverage larger than this quantile is set equal to the quantile.

cap.type

If "sep", then the cap quantile is calculated and applied to each coverage matrix separately. If "all", then one cap quantile is calculated based on all of the matrices combined.

all.mappable

If TRUE, only features that have no NA measurements are used.

n.clusters

Number of clusters to find in the coverage data. Required.

plot.ord

Order of the experiment types to plot.

expr

A vector of expression values.

expr.name

A label, describing the expression data.

sort.data

A vector of values to sort the features within a cluster on.

sort.name

Label to place under the sort.data plot.

plot.type

Style of plot to draw.

heat.bg.col

If a heatmap is being drawn, the background colour to plot NA values with.

summarize

How to summarise the score columns of each cluster. Not relevant for heatmap plot.

symm.scale

Whether to centre the lineplot y-axis or the heatmap intensity around 0. By default, no plot is symmetrically ranged.

cols

The colours to use for the lines in the lineplot or intensities in the heatmap.

t.name

Title to use above all the heatmaps or lineplots. Ignored when cluster-wise lineplots are drawn.

verbose

Whether to print the progress of processing.

...

Further graphical parameters passed to plot. When a heatmap plot is drawn, they influence how the points of the expression and sort data plots look. When a lineplot is drawn, they influence the line styles.
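The difference between the two cap.type settings can be illustrated with a minimal base R sketch. This is not the Repitools implementation; the matrices, the helper cap.at and all variable names are hypothetical and exist only for illustration.

```r
# Illustrative sketch only: hypothetical matrices, base R, not Repitools internals.
set.seed(4)
mats <- list(markA = matrix(rexp(500, rate = 0.1), nrow = 100),
             markB = matrix(rexp(500, rate = 1), nrow = 100))
cap.q <- 0.95

# Hypothetical helper: values above the cap become the cap.
cap.at <- function(m, cap) pmin(m, cap)

# cap.type = "sep": each matrix gets its own quantile.
caps.sep <- sapply(mats, quantile, probs = cap.q, na.rm = TRUE)
capped.sep <- Map(cap.at, mats, caps.sep)

# cap.type = "all": one quantile is computed from all matrices pooled.
cap.all <- quantile(unlist(mats), cap.q, na.rm = TRUE)
capped.all <- lapply(mats, cap.at, cap = cap.all)

sapply(capped.sep, max)  # One maximum per matrix, each equal to its own cap.
sapply(capped.all, max)  # No value exceeds the pooled cap.
```

With "sep", a mark with generally low coverage is still capped at its own upper range, whereas with "all" the pooled cap is dominated by the marks with the largest values.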

Value

If called with a ScoresList, then a ClusteredScoresList is returned. If called with a ClusteredScoresList, then nothing is returned.

Details

A ClusteredScoresList should be created by the user if they wish to do custom clustering and normalisation on the coverage matrices. Otherwise, if the user is happy with k-means or PAM clustering, the ScoresList object output by featureScores() can be used directly.

If called with a ScoresList, the matrices for each coverage type are joined. The function supplied by the scale argument is then used to scale the data. Next, each matrix is capped, and then divided by its maximum value, so that the Euclidean distance between maximum reads and no reads is the same for each matrix. Lastly, either k-means or PAM clustering is performed to get the cluster membership of each feature: if there are any NAs in the scores, PAM is used; otherwise, k-means is used for speed. A ClusteredScoresList object is then created and used. If expression data is provided, the clusters are guaranteed to be given IDs in descending order of summarised cluster expression.

If called with a ClusteredScoresList, no scaling or capping is done, so it is the user's responsibility to normalise the coverage matrices as they see fit when creating the ClusteredScoresList object.
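The preprocessing described above for a ScoresList can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's own code: the coverage matrices are simulated, the names scale.fn, prepare and joined are hypothetical, and the matrices are joined after the per-matrix steps for brevity. PAM is taken from the cluster package, which is assumed to be installed.

```r
# Illustrative sketch only: base R stand-ins, not Repitools internals.
set.seed(1)

# Two hypothetical coverage matrices (features x positions) for two marks.
cov.a <- matrix(rexp(200 * 7, rate = 0.1), nrow = 200)
cov.b <- matrix(rexp(200 * 7, rate = 1), nrow = 200)
coverages <- list(markA = cov.a, markB = cov.b)

scale.fn <- function(x) sqrt(x)  # Plays the role of the 'scale' argument.
cap.q <- 0.95                    # Plays the role of the 'cap.q' argument.

prepare <- function(m) {
  m <- scale.fn(m)
  cap <- quantile(m, cap.q, na.rm = TRUE)  # One cap per matrix, as for cap.type = "sep".
  m[!is.na(m) & m > cap] <- cap            # Values above the cap become the cap.
  m / max(m, na.rm = TRUE)                 # Each matrix now has a maximum of 1.
}

prepared <- lapply(coverages, prepare)
joined <- do.call(cbind, prepared)         # One row per feature, all marks side by side.

n.clusters <- 5
if (anyNA(joined)) {
  # PAM tolerates missing values in the data matrix.
  cluster.id <- cluster::pam(joined, n.clusters, cluster.only = TRUE)
} else {
  # k-means is faster but needs complete data.
  cluster.id <- kmeans(joined, n.clusters, iter.max = 100)$cluster
}
table(cluster.id)
```

Dividing each matrix by its own maximum keeps a mark with large raw counts from dominating the Euclidean distances used by the clustering.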

If a ClusteredScoresList object is subsetted, the original data range is saved in a private slot, so that if the user wants to plot a subset of the features, such as a certain cluster, for example, the intensity range of the heatmap, or the y-axis range of the lineplot will be the same as before subsetting.

If expression data is given, the summarised expression level of each cluster is calculated, and the clusters are plotted in order of decreasing expression, down the page. Otherwise, they are plotted in ascending order of cluster ID. If a heatmap plot is being drawn, a heatmap is drawn for every coverage matrix, side-by-side, and a plot of each feature's expression is put alongside the heatmaps, if expression was provided. If an additional sort vector is given, the data within each cluster are sorted on this vector, and a plot of this data is made as the rightmost graph.
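The ordering of clusters by summarised expression can be sketched in base R. The data below are simulated, the variable names are hypothetical, and the use of the median as the summary is an assumption made for this illustration.

```r
# Illustrative sketch only: hypothetical data, base R.
set.seed(2)
cluster.id <- sample(1:4, 100, replace = TRUE)          # Arbitrary initial cluster IDs.
expr <- rnorm(100, mean = c(2, 8, 5, 11)[cluster.id])   # Hypothetical expression values.

# Summarise expression per cluster (the median is an assumption here).
cl.expr <- tapply(expr, cluster.id, median)

# Rank clusters from highest to lowest summarised expression, then relabel
# so that cluster 1 is the most highly expressed.
new.id.of <- rank(-cl.expr, ties.method = "first")
cluster.id.ordered <- unname(new.id.of[as.character(cluster.id)])

tapply(expr, cluster.id.ordered, median)  # Decreasing from cluster 1 to cluster 4.
```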

The lineplot style is similar to the heatmap plot, but clusters are summarised. A grid with as many rows as there are clusters and as many columns as there are coverage matrices is made, and lineplots showing the summarised scores are drawn in the grid. Beside the grid, a boxplot of expression is drawn for each cluster, if expression was provided.

For a cluster-wise lineplot, a graph is drawn for each cluster, with the colours being the different coverage types. Because there will usually be more clusters than there are types of coverage (typically double to triple the number), the plots are not drawn side-by-side, as is the layout for the heatmaps. For this reason, sending the output to a PDF device is necessary. It is recommended to make the width of the PDF device wider than the default. Since the coverage data of different marks is not comparable, this method is inappropriate for visualising a ClusteredScoresList object that was created by the ScoresList method of clusterPlots. If the user, however, can come up with a normalisation method that accounts for the differences between types of marks (e.g. peaked vs. spread) and makes the coverages meaningfully comparable, they can alter the tables, do their own clustering, and create a ClusteredScoresList object with the modified tables.
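The advice about a wide PDF device with one graph per cluster can be sketched with base graphics. This does not call clusterPlots; the summarised scores, mark names, positions and output file name are all hypothetical.

```r
# Illustrative sketch only: base graphics stand in for the cluster-wise lineplots.
set.seed(3)
positions <- seq(-2000, 1000, by = 500)   # Hypothetical sampling positions.
n.clusters <- 6
mark.cols <- c("red", "blue")

# Hypothetical summarised scores: one matrix (positions x marks) per cluster.
summaries <- lapply(seq_len(n.clusters), function(i)
  matrix(runif(length(positions) * 2), ncol = 2,
         dimnames = list(NULL, c("markA", "markB"))))

out.file <- file.path(tempdir(), "by_cluster.pdf")
pdf(out.file, width = 14, height = 7)     # Wider than the 7 inch default width.
for (i in seq_len(n.clusters)) {
  # Each call to matplot starts a new page, so every cluster gets its own graph.
  matplot(positions, summaries[[i]], type = "l", lty = 1, col = mark.cols,
          xlab = "Relative position", ylab = "Summarised score",
          main = paste("Cluster", i))
  legend("topright", legend = colnames(summaries[[i]]), lty = 1, col = mark.cols)
}
dev.off()
```

A multi-page PDF suits this layout because the number of pages grows with the number of clusters rather than the width of a single page.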

Examples

  data(samplesList)  # Loads 'samples.list.subset'.
  data(expr)  # Loads 'expr.subset'.
  data(chr21genes)

  fs <- featureScores(samples.list.subset[1:2], chr21genes, up = 2000, down = 1000,
                      freq = 500, s.width = 500)
  clusterPlots(fs, function(x) sqrt(x), n.clusters = 5, expr = as.numeric(expr.subset),
               plot.type = "heatmap", pch = 19, cex = 0.5)