scde.error.models: Fit single-cell error/regression models

Description

Fit error models given a set of single-cell data (counts) and an optional grouping factor (groups). The cells (within each group) are first cross-compared to determine a subset of genes showing consistent expression. This subset of genes is then used to fit a mixture model for each cell (a Poisson-negative binomial (NB) mixture, with an expression-dependent concomitant).
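To make the shape of such a mixture concrete, here is a minimal base R sketch of a two-component count density: a low-rate Poisson component for failed detections and a negative binomial component for detected expression. It is an illustration only, not the package's internal code. The function names mixture.density and failure.posterior and the fixed mixing weight p.fail are assumptions made for this sketch; in the model described above the mixing weight depends on expression magnitude rather than being a constant.

```r
# Illustrative two-component mixture density for a count x.
# Hypothetical helper, not part of scde.
#   p.fail      - mixing weight of the failure component (held fixed here)
#   mu, theta   - NB mean and size (overdispersion) parameters
#   zero.lambda - rate of the Poisson failure component
mixture.density <- function(x, p.fail, mu, theta, zero.lambda = 0.1) {
  stopifnot(p.fail >= 0, p.fail <= 1, mu > 0, theta > 0)
  fail.part <- p.fail * dpois(x, lambda = zero.lambda)
  signal.part <- (1 - p.fail) * dnbinom(x, size = theta, mu = mu)
  fail.part + signal.part
}

# Posterior probability that a count came from the failure component.
failure.posterior <- function(x, p.fail, mu, theta, zero.lambda = 0.1) {
  fail.part <- p.fail * dpois(x, lambda = zero.lambda)
  fail.part / mixture.density(x, p.fail, mu, theta, zero.lambda)
}

# Low counts are attributed mostly to the failure component,
# higher counts mostly to the NB component.
x <- 0:5
round(failure.posterior(x, p.fail = 0.3, mu = 50, theta = 2), 3)
```

With a highly expressed gene (large mu), a zero count is far more likely under the failure component than under the NB component, which is the intuition behind treating such observations as possible drop-outs.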

Usage

scde.error.models(counts, groups = NULL, min.nonfailed = 3,
  threshold.segmentation = TRUE, min.count.threshold = 4,
  zero.count.threshold = min.count.threshold, zero.lambda = 0.1,
  save.crossfit.plots = FALSE, save.model.plots = TRUE, n.cores = 12,
  min.size.entries = 2000, max.pairs = 5000, min.pairs.per.cell = 10,
  verbose = 0, linear.fit = TRUE, local.theta.fit = linear.fit,
  theta.fit.range = c(0.01, 100))

Arguments

counts

read count matrix. Rows correspond to genes (and should be named); columns correspond to individual cells. The matrix should contain integer counts.

groups

an optional factor describing grouping of different cells. If provided, the cross-fits and the expected expression magnitudes will be determined separately within each group. The factor should have the same length as ncol(counts).

min.nonfailed

minimum number of non-failed observations required for a gene to be used in the final model fitting (default: 3)

threshold.segmentation

use a fast threshold-based segmentation during cross-fit (default: TRUE)

min.count.threshold

the number of reads to use to guess which genes may have "failed" to be detected in a given measurement during cross-cell comparison (default: 4)

zero.count.threshold

threshold to guess the initial value (failed/non-failed) during error model fitting procedure (defaults to the min.count.threshold value)

zero.lambda

the rate of the Poisson (failure) component (default: 0.1)

save.crossfit.plots

whether png files showing cross-fit segmentations should be written out (default: FALSE)

save.model.plots

whether pdf files showing model fits should be written out (default: TRUE)

n.cores

number of cores to use (default: 12)

min.size.entries

minimum number of genes to use when determining expected expression magnitude during model fitting (default: 2000)

max.pairs

maximum number of cross-fit comparisons that should be performed per group (default: 5000)

min.pairs.per.cell

minimum number of pairs that each cell should be cross-compared with (default: 10)

verbose

verbosity level; set to 1 for increased output (default: 0)

linear.fit

Boolean of whether to use a linear fit in the regression (default: TRUE).

local.theta.fit

Boolean of whether to fit the overdispersion parameter theta, i.e. the negative binomial size parameter, based on local regression (default: set equal to the linear.fit parameter)

theta.fit.range

Range of valid values for the overdispersion parameter theta, i.e. the negative binomial size parameter (default: c(0.01, 100))
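The interplay between counts, groups, min.count.threshold and min.nonfailed can be illustrated with a small base R sketch. This is a deliberate simplification and not the procedure the function actually runs: it only checks the input requirements stated above and applies a plain count threshold to guess which observations may have failed. The toy matrix, the strict "less than" comparison, and the variable names likely.failed, n.nonfailed and usable.genes are all assumptions made for illustration.

```r
# Toy count matrix: 4 named genes (rows) x 5 cells (columns), made-up values.
counts <- matrix(
  c(0L, 12L, 30L,  0L, 25L,
    0L,  1L,  0L,  2L,  0L,
    8L,  9L,  0L, 15L, 11L,
    0L,  0L,  0L, 40L,  3L),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:5))
)
groups <- factor(c("A", "A", "A", "B", "B"))

# Input checks taken from the argument descriptions above.
stopifnot(is.integer(counts),
          !is.null(rownames(counts)),
          length(groups) == ncol(counts))

min.count.threshold <- 4
min.nonfailed <- 3

# Guess which observations may have "failed" (assumed: count below threshold).
likely.failed <- counts < min.count.threshold

# Count non-failed observations per gene and keep genes with enough of them.
n.nonfailed <- rowSums(!likely.failed)
usable.genes <- names(n.nonfailed)[n.nonfailed >= min.nonfailed]
usable.genes
```

Raising min.count.threshold or min.nonfailed in this sketch shrinks the set of usable genes, which gives a rough sense of how stricter settings trade gene coverage for more reliable observations.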

Value

a model matrix, with rows corresponding to different cells, and columns representing different parameters of the determined models

Details

Note: the default implementation has been changed to use a linear-scale fit with an expression-dependent NB size (overdispersion) fit. This represents an iterative improvement on the originally published model. Use linear.fit = FALSE to revert to the original fitting procedure.

Examples

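library(scde)
# load the example dataset and drop poorly covered cells and genes
data(es.mef.small)
cd <- clean.counts(es.mef.small, min.lib.size = 1000, min.reads = 1, min.detected = 1)
# derive the grouping factor (ESC or MEF) from the cell names
sg <- factor(gsub("(MEF|ESC).*", "\\1", colnames(cd)), levels = c("ESC", "MEF"))
names(sg) <- colnames(cd)
# fit the error models; adjust n.cores to the number of cores available
o.ifm <- scde.error.models(counts = cd, groups = sg, n.cores = 10, threshold.segmentation = TRUE)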
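A common follow-up step is to drop cells whose fits look poor before using the models further. The sketch below keeps only cells with a positive value in the corr.a column of the returned model matrix. The helper name keep.valid.cells is hypothetical, and the presence and meaning of a corr.a column is an assumption here, since the Value section does not list the columns; check colnames(o.ifm) on your own result before relying on it.

```r
# Hypothetical helper, not part of scde: keep only rows (cells) whose
# corr.a value is positive. Assumes the model matrix has a "corr.a" column.
keep.valid.cells <- function(models) {
  stopifnot("corr.a" %in% colnames(models))
  corr.a <- models[, "corr.a"]
  valid <- !is.na(corr.a) & corr.a > 0
  models[valid, , drop = FALSE]
}

# Apply to the fitted models from the example, if they exist in this session.
if (exists("o.ifm")) {
  o.ifm <- keep.valid.cells(o.ifm)
  print(dim(o.ifm))
}
```

Writing the filter as a small function makes the assumption about the column name explicit and fails early if the column is missing, instead of silently returning an empty result.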
