This function computes fate biases for single cells based on expression data from a single-cell sequencing experiment. It requires a clustering partition and, for each trajectory, a target cluster representing a committed state.
fateBias(
x,
y,
tar,
z = NULL,
minnr = NULL,
minnrh = NULL,
adapt = TRUE,
confidence = 0.75,
nbfactor = 5,
use.dist = FALSE,
seed = NULL,
nbtree = NULL,
verbose = FALSE,
...
)
x: expression data frame with genes as rows and cells as columns. Gene IDs should be given as row names and cell IDs should be given as column names. This can be a reduced expression table only including the features (genes) to be used in the analysis.
y: clustering partition. A vector with an integer cluster number for each cell. The order of the cells has to be the same as for the columns of x.
tar: vector of integers representing target cluster numbers. Each element of tar corresponds to a cluster of cells committed towards a particular mature state. One cluster per different cell lineage has to be given and is used as a starting point for learning the differentiation trajectory.
z: Matrix containing cell-to-cell distances to be used in the fate bias computation. Default is NULL. In this case, a correlation-based distance is computed from x by 1 - cor(x).
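As a minimal sketch of the default distance described above, the correlation-based distance can be computed in base R. The toy expression values below are made up for illustration; only the expression 1 - cor(x) comes from the text.

```r
# Toy expression table (made-up values): 4 genes (rows) x 3 cells (columns)
x <- data.frame(
  c1 = c(1, 2, 3, 4),
  c2 = c(2, 4, 6, 8.5),
  c3 = c(4, 3, 2, 1),
  row.names = paste0("g", 1:4)
)

# cor() correlates the columns of x, so the result is a
# cell-by-cell matrix with zeros on the diagonal
z <- 1 - cor(x)
```

Because cor() works column-wise, cells must be in the columns of x, as required for the x argument.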
minnr: integer number of cells per target cluster to be selected for classification (test set) in each iteration. For each target cluster, the minnr cells with the highest similarity to a cell in the training set are selected for classification. If z is not NULL, it is used as the similarity matrix for this step. Otherwise, 1 - cor(x) is used. Default value is NULL, in which case minnr is estimated as the minimum of 20 and half the median of target cluster sizes.
minnrh: integer number of cells from the training set used for classification. From each training set, the minnrh cells with the highest similarity to the training set are selected. If z is not NULL, it is used as the similarity matrix for this step. Default value is NULL, in which case minnrh is estimated as the maximum of 20 and half the median of target cluster sizes.
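The two default estimates described above can be sketched as follows. The partition and target clusters are hypothetical, and the text does not say whether or how the result is rounded, so the values are left unrounded here.

```r
# Hypothetical partition: cluster number for each of 100 cells
y <- c(rep(1, 40), rep(2, 30), rep(3, 18), rep(4, 12))
tar <- c(3, 4)  # hypothetical target clusters

# Sizes of the target clusters only
sizes <- as.integer(table(y)[as.character(tar)])

# Half the median target cluster size
half_median <- median(sizes) / 2

# minnr: minimum of 20 and half the median; minnrh: maximum of the same two
minnr_default  <- min(20, half_median)
minnrh_default <- max(20, half_median)
```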
adapt: logical. If TRUE, then the size of the test set for each target cluster is adapted based on the classification success in the previous iteration. For each target cluster, the number of successfully classified cells is determined, i.e. the number of cells with a minimum fraction of votes given by the confidence parameter for the target cluster, which gave rise to the inclusion of the cell in the test set (see minnr). Weights are then derived by dividing this number by the maximum across all clusters after adding a pseudocount of 1. The test set size minnr is rescaled for each cluster by the respective weight in the next iteration. Default is TRUE.
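One possible reading of the weighting described above is sketched here. The counts are hypothetical, the pseudocount is assumed to be added before taking the maximum, and the final rounding is an assumption as well; this is not the package's actual code.

```r
minnr <- 10

# Hypothetical numbers of successfully classified cells per target cluster
n_success <- c(t6 = 9, t9 = 4, t13 = 0)

# Add a pseudocount of 1, then divide by the maximum across clusters
w <- (n_success + 1) / max(n_success + 1)

# Rescaled test set size per cluster for the next iteration
# (rounding to whole cells is an assumption)
minnr_next <- round(minnr * w)
```

With this reading, the pseudocount keeps a cluster with no successfully classified cells from dropping to a test set size of exactly zero before rounding.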
confidence: real number between 0 and 1. See adapt parameter. Default is 0.75.
nbfactor: positive integer number. Determines the number of trees grown for each random forest. The number of trees is given by the number of columns of the training set multiplied by nbfactor. Default value is 5.
use.dist: logical value. If TRUE, then the distance matrix is used as feature matrix (i.e. z if not equal to NULL and 1 - cor(x) otherwise). If FALSE, gene expression values in x are used. Default is FALSE.
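The choice of feature matrix described above can be sketched with a small helper. The function select_features is hypothetical and not part of the package, and any transposition the package may apply to the features is left out.

```r
# Hypothetical helper (not part of the package): choose the feature matrix
select_features <- function(x, z = NULL, use.dist = FALSE) {
  if (use.dist) {
    # Distance matrix as features: z if supplied, else 1 - cor(x)
    if (!is.null(z)) z else 1 - cor(x)
  } else {
    # Gene expression values as features
    x
  }
}

# Toy expression matrix (made-up values): 3 genes x 4 cells
x <- matrix(c(1, 2, 3,  2, 1, 4,  5, 3, 1,  2, 2, 6), nrow = 3,
            dimnames = list(paste0("g", 1:3), paste0("c", 1:4)))

f_expr <- select_features(x)                   # expression values
f_dist <- select_features(x, use.dist = TRUE)  # 4 x 4 distance matrix
```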
seed: integer seed for initialization. If equal to NULL, then each run will yield slightly different results due to the randomness of the random forest algorithm. Default is NULL.
nbtree: integer value. If given, it specifies the number of trees for each random forest explicitly. Default is NULL.
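How nbfactor and nbtree together determine the number of trees can be sketched as below. The helper n_trees and the toy training matrix are hypothetical and only mirror the descriptions of the two arguments.

```r
# Hypothetical helper (not part of the package): trees per random forest
n_trees <- function(train, nbfactor = 5, nbtree = NULL) {
  # An explicit nbtree takes precedence over the nbfactor rule
  if (!is.null(nbtree)) nbtree else ncol(train) * nbfactor
}

# Toy training set with 8 columns
train <- matrix(0, nrow = 10, ncol = 8)

trees_default  <- n_trees(train)                # 8 columns times nbfactor = 5
trees_explicit <- n_trees(train, nbtree = 100)  # nbtree given explicitly
```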
verbose: logical. If TRUE, then print information to console.
...: additional arguments to be passed to the low level function randomForest.
A list with the following five components:
probs: a data frame with the fraction of random forest votes for each cell. Columns represent the target clusters. Column names are given by a concatenation of t and target cluster number.
votes: a data frame with the number of random forest votes for each cell. Columns represent the target clusters. Column names are given by a concatenation of t and target cluster number.
tr: list of vectors. Each component contains the IDs of all cells on the trajectory to a given target cluster. Component names are given by a concatenation of t and target cluster number.
rfl: list of randomForest objects for each iteration of the classification.
trall: vector of cell IDs ordered by the random forest iteration in which they have been classified into one of the target clusters.
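The column naming and the relation between vote counts and vote fractions can be illustrated with made-up numbers. This sketch assumes that the fraction is the number of votes divided by the total votes per cell; the cell IDs and counts are hypothetical.

```r
tar <- c(6, 9, 13)

# Hypothetical vote counts: three cells (rows), three target clusters (columns)
votes <- data.frame(c(40, 5, 20), c(5, 40, 20), c(5, 5, 10),
                    row.names = c("cellA", "cellB", "cellC"))

# Column names: "t" concatenated with the target cluster number
names(votes) <- paste0("t", tar)

# Fraction of votes per cell (each row sums to 1)
probs <- votes / rowSums(votes)
```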
The bias is computed as the ratio of the number of random forest votes for a trajectory and the number of votes for the trajectory with the second largest number of votes. Consequently, only the trajectory with the largest number of votes will receive a bias > 1. The significance is computed based on counting statistics on the difference in the number of votes. A significant bias requires a p-value < 0.05. Cells are assigned to a trajectory if they exhibit a significant bias > 1 for this trajectory.
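The ratio described above can be sketched for a single cell with hypothetical vote counts. The p-value is not sketched, since the text does not specify the exact counting statistic.

```r
# Hypothetical vote counts of one cell for three trajectories
votes <- c(t6 = 60, t9 = 30, t13 = 10)

# Number of votes for the trajectory with the second largest vote count
second <- sort(votes, decreasing = TRUE)[[2]]

# Bias of each trajectory relative to the runner-up
bias <- votes / second
```

Only the trajectory with the most votes ends up above 1; the runner-up has a bias of exactly 1 and all others fall below it.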
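# NOT RUN {
library(FateID)
# Expression table and cluster partition from the intestine example data
x <- intestine$x
y <- intestine$y
# Target clusters, one per lineage
tar <- c(6, 9, 13)
fb <- fateBias(x, y, tar, minnr = 5, minnrh = 20, adapt = TRUE,
               confidence = 0.75, nbfactor = 5)
# Fraction of random forest votes per cell and target cluster
head(fb$probs)
# }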