compute.backbone.tree: Backbone Tree construction

Description

Builds a `backbone tree' from a fitted LDA model.

Usage

compute.backbone.tree(lda.results, grouping = NULL, start.group.label = NULL, absolute.width = 0, width.scale.factor = 1.2, outlier.tolerance.factor = 0.1, rooting.method = NULL, only.mst = FALSE, grouping.colors = NULL, merge.sequential.backbone = FALSE)

Arguments

lda.results

A fitted LDA model, as returned by compute.lda.

grouping

An (optional) vector of labels, one for each cell in the lda.results object, e.g. sampling times (numeric) or tissue categories.

start.group.label

If a grouping parameter is provided, you can optionally specify the starting group. If no start.group.label is specified and the grouping vector is numeric, the lowest value will automatically be selected. Otherwise, the group with lowest mean-squared-distance between cells is selected.

absolute.width

Numeric (optional). Distance threshold below which a cell vertex is considered to be attached to a backbone vertex (see paper for more details). By default, this threshold is computed dynamically, based on the distance distribution for each branch.

width.scale.factor

Numeric (optional). A scaling factor for the dynamically-computed distance threshold (ignored if absolute.width is provided). Higher values will result in fewer branches in the backbone tree, while lower values might lead to a large number of backbone branches.

outlier.tolerance.factor

Numeric (optional). Proportion of vertices, out of the total number of vertices divided by the total number of branches, that can be left at the end of the backbone tree-building algorithm.

rooting.method

String (optional). Method used to root the backbone tree. Must be either NULL or one of: `longest.path', `center.start.group' or `average.start.group'. `longest.path' picks one end of the longest shortest path between any two vertices. `center.start.group' picks the vertex in the starting group with the lowest mean-square-distance to the others. `average.start.group' creates a new artificial vertex, as the average of all cells in the starting group. If no value is provided, the best method is picked based on the type of grouping and start group information available.

only.mst

If TRUE, returns a simple rooted minimum-spanning tree, instead of a backbone tree.

grouping.colors

(Optional) vector of RGB colors to be used for each grouping.

merge.sequential.backbone

(Optional) whether to merge sequential backbone vertices that are close enough. This will produce a more compact backbone tree, but at the cost of extra computing time.
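
The default start-group selection and the two start-group rooting methods described above can be sketched in base R. This is only an illustration of the documented behaviour, not the package's internal code: the helper names (pick.start.group, center.of.group), the toy coordinates, the choice to measure `the others' within the starting group only, and the handling of single-cell groups are all assumptions.

```r
# Hypothetical helper: choose a starting group as described for start.group.label.
# `dists` is a symmetric cell-by-cell distance matrix, `grouping` has one label per cell.
pick.start.group <- function(dists, grouping) {
  if (is.numeric(grouping)) {
    return(min(grouping))              # numeric grouping: lowest value
  }
  labels <- unique(as.character(grouping))
  # Mean squared pairwise distance between the cells of each group
  msd <- sapply(labels, function(g) {
    idx <- which(grouping == g)
    if (length(idx) < 2) return(Inf)   # assumption: skip single-cell groups
    d <- dists[idx, idx]
    mean(d[upper.tri(d)]^2)
  })
  names(msd)[which.min(msd)]
}

# Hypothetical helper mimicking `center.start.group': the cell of the starting
# group with the lowest mean squared distance to the other cells of that group.
center.of.group <- function(dists, grouping, start.group) {
  idx <- which(grouping == start.group)
  d <- dists[idx, idx, drop = FALSE]
  msd <- rowSums(d^2) / max(length(idx) - 1, 1)
  idx[which.min(msd)]
}

# Toy data: a tight group "a" and a spread-out group "b"
coords <- rbind(c(0, 0), c(0, 1), c(1, 0),
                c(5, 5), c(9, 9), c(5, 9))
grouping <- c("a", "a", "a", "b", "b", "b")
dists <- as.matrix(dist(coords))

start.label <- pick.start.group(dists, grouping)                    # categorical labels
start.time  <- pick.start.group(dists, c(0, 0, 24, 24, 48, 48))     # numeric labels
center.cell <- center.of.group(dists, grouping, start.label)

# Mimicking `average.start.group': an artificial vertex at the group average
average.vertex <- colMeans(coords[grouping == start.label, , drop = FALSE])
```

In this toy case the tight group is selected for the categorical labels, and the lowest time point for the numeric labels.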

Value

An igraph object with either a rooted minimum spanning tree (if only.mst is TRUE) or a quasi-optimal backbone tree connecting all input cells. Cell topic distribution, distances and branch order are added as vertex/edge/graph attributes.

Details

In order to easily visualise the structural and temporal relationship between cells, we introduced a special type of tree structure dubbed `backbone tree', defined as follows:

Considering a set of vertices $V$ and a distance function over all pairs of vertices, $d: V \times V \to \mathbb{R}^+$, we call a graph $T$ with backbone $B$ a backbone tree if:

$T$ is a tree with set of vertices $V$ and edges $E$.
$B$ is a tree with set of vertices $V_B \subseteq V$ and edges $E_B \subseteq E$.
Every `vertebra' vertex of $T$, $v \in V \setminus V_B$, is connected by a single edge to its closest backbone vertex $v^*_B \in V_B$, i.e. $v^*_B = \arg\min_{v_B \in V_B} d(v_B, v)$.
Every vertex in $V \setminus V_B$ is at distance less than $\delta$ from some vertex of the backbone $B$: $\forall v \in V \setminus V_B, \exists v_B \in V_B$ such that $d(v, v_B) < \delta$.

In this instance, we relax the last condition to cover only `most' non-backbone vertices, allowing for a variable proportion of outliers at distance $> \delta$ from every vertex in $V_B$.
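
The attachment and distance conditions, together with the relaxed outlier rule, can be checked on a toy example. The sketch below is a hypothetical base R checker (check.backbone, its arguments and the toy coordinates are not part of the package); it treats any vertebra at distance $\geq \delta$ from its closest backbone vertex as an outlier.

```r
# Hypothetical checker for the last two conditions of the definition.
# dists:    symmetric distance matrix over all vertices
# backbone: indices of the backbone vertices V_B
# parent:   backbone vertex each non-backbone vertex is attached to (NA for backbone vertices)
check.backbone <- function(dists, backbone, parent, delta, max.outlier.prop = 0) {
  vertebrae <- setdiff(seq_len(nrow(dists)), backbone)
  # Closest backbone vertex for every non-backbone vertex (the argmin over V_B)
  closest <- sapply(vertebrae, function(v) backbone[which.min(dists[v, backbone])])
  attached.ok <- all(parent[vertebrae] == closest)
  # Distance from each vertebra to its closest backbone vertex
  d.closest <- dists[cbind(vertebrae, closest)]
  outlier.prop <- mean(d.closest >= delta)
  list(attached.to.closest = attached.ok,
       outlier.proportion = outlier.prop,
       within.tolerance = outlier.prop <= max.outlier.prop)
}

# Toy example: four backbone vertices on a line, two nearby vertebrae, one distant one
coords <- rbind(c(0, 0), c(1, 0), c(2, 0), c(3, 0),
                c(1, 0.5), c(2, 0.4), c(3, 3))
dists <- as.matrix(dist(coords))
backbone <- 1:4
parent <- c(NA, NA, NA, NA, 2, 3, 4)

strict  <- check.backbone(dists, backbone, parent, delta = 1)
relaxed <- check.backbone(dists, backbone, parent, delta = 1, max.outlier.prop = 0.5)
```

With $\delta = 1$, one of the three vertebrae lies too far from the backbone, so the strict definition fails while the relaxed one (tolerating up to half of the vertebrae as outliers) holds.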

We can then define the `optimal' backbone tree to be a backbone tree such that the sum of the edge weights in the backbone subtree $E_B$ is minimal. Finding such a tree can be easily shown to be NP-Complete (by reduction from the Vertex Cover problem), but we developed a fast heuristic relying on a minimum spanning tree to produce a reasonable approximation.
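
To make the objective concrete, the following base R sketch builds a minimum spanning tree with Prim's algorithm and evaluates the backbone cost for a deliberately naive backbone: all vertices with more than one neighbour in the spanning tree. This is not the heuristic used by compute.backbone.tree, which is not described here; prim.mst and the toy coordinates are assumptions made for illustration only.

```r
# Prim's algorithm on a full distance matrix; returns an (n - 1) x 2 edge matrix.
prim.mst <- function(dists) {
  n <- nrow(dists)
  in.tree <- c(TRUE, rep(FALSE, n - 1))   # start from vertex 1
  best.dist <- dists[1, ]                 # cheapest link of each vertex to the tree
  best.from <- rep(1L, n)
  edges <- matrix(NA_integer_, nrow = n - 1, ncol = 2)
  for (k in seq_len(n - 1)) {
    cand <- which(!in.tree)
    v <- cand[which.min(best.dist[cand])] # closest vertex not yet in the tree
    edges[k, ] <- c(best.from[v], v)
    in.tree[v] <- TRUE
    closer <- !in.tree & dists[v, ] < best.dist
    best.dist[closer] <- dists[v, closer]
    best.from[closer] <- v
  }
  edges
}

coords <- rbind(c(0, 0), c(1, 0), c(2, 0), c(3, 0), c(1, 0.5), c(2, 0.4))
dists <- as.matrix(dist(coords))
mst <- prim.mst(dists)
mst.weight <- sum(dists[mst])             # total weight of the spanning tree

# Naive backbone: every vertex with more than one neighbour in the spanning tree
degree <- tabulate(c(mst), nbins = nrow(dists))
backbone <- which(degree > 1)

# Objective from the text: total weight of the edges joining two backbone vertices
is.backbone.edge <- mst[, 1] %in% backbone & mst[, 2] %in% backbone
backbone.cost <- sum(dists[mst[is.backbone.edge, , drop = FALSE]])
```

Here the spanning tree has total weight 3.9, while the naive backbone (vertices 2 and 3) costs only 1, since the remaining edges merely attach vertebrae. A smaller $\delta$ would force more vertices into the backbone and raise that cost.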

The resulting quasi-optimal backbone tree (simply referred to as `the' backbone tree hereafter) gives a clear hierarchical representation of the relationships between cells. The objective function puts pressure on finding a (small) group of prominent cells (the backbone) that are good representatives of major steps in the cell evolution (in time or space), while the remaining cells are similar enough to their closest representative for their differences to be ignored. Such a tree provides a very clear visualisation of overall cell differentiation paths (including potential differentiation into sub-types).

Examples

# Load pre-computed LDA model for skeletal myoblast RNA-Seq data from HSMMSingleCell package:
library(cellTree)
data(HSMM_lda_model)

# Recover sampling time for each cell (stored in the Hours column of the sample sheet):
library(HSMMSingleCell)
data(HSMM_sample_sheet)
days.factor = HSMM_sample_sheet$Hours
days = as.numeric(levels(days.factor))[days.factor]

# Compute near-optimal backbone tree:
b.tree = compute.backbone.tree(HSMM_lda_model, days)
# Plot resulting tree with sampling time as a vertex group colour:
ct.plot.grouping(b.tree)