compute.backbone.tree(lda.results, grouping = NULL, start.group.label = NULL, absolute.width = 0, width.scale.factor = 1.2, outlier.tolerance.factor = 0.1, rooting.method = NULL, only.mst = FALSE, grouping.colors = NULL, merge.sequential.backbone = FALSE)compute.ldalda.results object. E.g. a sampling times (numeric) or tissue categories.grouping parameter is provided, you can optionally specify the starting group. If no start.group.label is specified and the grouping vector is numeric, the lowest value will automatically be selected. Otherwise, the group with lowest mean-squared-distance between cells is selected.absolute.width is provided). Higher values will result in less branches in the backbone tree, while lower values might lead to a large number of backbone branches.TRUE, returns a simple rooted minimum-spanning tree, instead of a backbone tree.only.mst is TRUE) or a quasi-optimal backbone tree connecting all input cells. Cell topic distribution, distances and branch order are added as vertex/edge/graph attributes.
Considering a set of vertices $V$ and a distance function over all pairs of vertices: $d: V × V -> R+$, we call backbone tree a graph, $T$ with backbone $B$, such that:
In this instance, we relax the last condition to cover only `most' non-backbone vertices, allowing for a variable proportion of outliers at distance $> \delta$ from any vertices in $V_B$.
We can then define the `optimal' backbone tree to be a backbone tree such that the sum of weighted edges in the backbone subtree $E_B$ is minimal. Finding such a tree can be easily shown to be NP-Complete (by reduction to the Vertex Cover problem), but we developed a fast heuristic relying on Minimum Spanning Tree to produce a reasonable approximation.
The resulting quasi-optimal backbone tree (simply referred to as `the' backbone tree thereafter) gives a clear hierarchical representation of the cell relationship: the objective function puts pressure on finding a (small) group of prominent cells (the backbone) that are good representatives of major steps in the cell evolution (in time or space), while remaining cells are similar enough to their closest representative for their difference to be ignored. Such a tree provides a very clear visualisation of overall cell differentiation paths (including potential differentiation into sub-types).
# Load pre-computed LDA model for skeletal myoblast RNA-Seq data from HSMMSingleCell package:
data(HSMM_lda_model)
# Recover sampling time (in days) for each cell:
library(HSMMSingleCell)
data(HSMM_sample_sheet)
days.factor = HSMM_sample_sheet$Hours
days = as.numeric(levels(days.factor))[days.factor]
# Compute near-optimal backbone tree:
b.tree = compute.backbone.tree(HSMM_lda_model, days)
# Plot resulting tree with sampling time as a vertex group colour:
ct.plot.grouping(b.tree)
Run the code above in your browser using DataLab