scde (version 2.0.1)

scde.error.models: Fit single-cell error/regression models

Description

Fit error models given a set of single-cell data (counts) and an optional grouping factor (groups). The cells (within each group) are first cross-compared to determine a subset of genes showing consistent expression. The set of genes is then used to fit a mixture model (Poisson-NB mixture, with expression-dependent concomitant).

Usage

scde.error.models(counts, groups = NULL, min.nonfailed = 3,
  threshold.segmentation = TRUE, min.count.threshold = 4,
  zero.count.threshold = min.count.threshold, zero.lambda = 0.1,
  save.crossfit.plots = FALSE, save.model.plots = TRUE, n.cores = 12,
  min.size.entries = 2000, max.pairs = 5000, min.pairs.per.cell = 10,
  verbose = 0, linear.fit = TRUE, local.theta.fit = linear.fit,
  theta.fit.range = c(0.01, 100))

Arguments

counts
read count matrix. Rows correspond to genes (and should be named); columns correspond to individual cells. The matrix should contain integer counts.
groups
an optional factor describing grouping of different cells. If provided, the cross-fits and the expected expression magnitudes will be determined separately within each group. The factor should have the same length as ncol(counts).
min.nonfailed
minimal number of non-failed observations required for a gene to be used in the final model fitting
threshold.segmentation
use a fast threshold-based segmentation during cross-fit (default: TRUE)
min.count.threshold
read-count threshold used to guess which genes may have "failed" to be detected in a given measurement during cross-cell comparison (default: 4)
zero.count.threshold
threshold to guess the initial value (failed/non-failed) during error model fitting procedure (defaults to the min.count.threshold value)
zero.lambda
the rate of the Poisson (failure) component (default: 0.1)
save.crossfit.plots
whether png files showing cross-fit segmentations should be written out (default: FALSE)
save.model.plots
whether pdf files showing model fits should be written out (default: TRUE)
n.cores
number of cores to use (default: 12)
min.size.entries
minimum number of genes to use when determining expected expression magnitude during model fitting
max.pairs
maximum number of cross-fit comparisons that should be performed per group (default: 5000)
min.pairs.per.cell
minimum number of pairs that each cell should be cross-compared with
verbose
verbosity level; set to 1 for increased output (default: 0)
linear.fit
Boolean of whether to use a linear fit in the regression (default: TRUE).
local.theta.fit
Boolean of whether to fit the overdispersion parameter theta, i.e. the negative binomial size parameter, based on local regression (default: set to be equal to the linear.fit parameter)
theta.fit.range
Range of valid values for the overdispersion parameter theta, i.e. the negative binomial size parameter (default: c(1e-2, 1e2))
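
The requirements on counts and groups described above can be checked before fitting. The following is a minimal base-R sketch, not part of scde: check.inputs is a hypothetical helper, and the toy matrix values are made up for illustration.

```r
# Toy data: 4 named genes x 6 cells (values are simulated, not real counts)
set.seed(1)
counts <- matrix(rpois(24, lambda = 5), nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:6)))

# One group label per cell, in the same order as the columns of counts
groups <- factor(rep(c("A", "B"), each = 3), levels = c("A", "B"))
names(groups) <- colnames(counts)

# Hypothetical helper: verifies the documented input expectations
check.inputs <- function(counts, groups = NULL) {
  # counts must be a matrix with named rows (genes)
  stopifnot(is.matrix(counts), !is.null(rownames(counts)))
  # counts must hold non-negative whole numbers
  stopifnot(all(counts == round(counts)), all(counts >= 0))
  # groups, if given, must be a factor with one entry per cell
  if (!is.null(groups)) {
    stopifnot(is.factor(groups), length(groups) == ncol(counts))
  }
  invisible(TRUE)
}

check.inputs(counts, groups)
```

A check like this fails early with a clear message, for example when the factor is shorter than ncol(counts), rather than partway through model fitting.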

Value

  • a model matrix, with rows corresponding to different cells, and columns representing different parameters of the determined models

Details

Note: the default implementation has been changed to use a linear-scale fit with an expression-dependent NB size (overdispersion) fit. This represents an iterative improvement on the originally published model. Use linear.fit = FALSE to revert to the original fitting procedure.
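
Because local.theta.fit defaults to the value of linear.fit (see Usage), setting linear.fit = FALSE also turns off the local theta fit unless local.theta.fit is given explicitly. The sketch below illustrates this R default-argument behavior with a hypothetical stand-in function, show.defaults, which only mirrors the two defaults from the signature and performs no fitting.

```r
# Hypothetical stand-in that copies the default pattern from the Usage section;
# it is not scde code and only reports the resolved argument values.
show.defaults <- function(linear.fit = TRUE, local.theta.fit = linear.fit) {
  c(linear.fit = linear.fit, local.theta.fit = local.theta.fit)
}

show.defaults()                                   # both TRUE (current default)
show.defaults(linear.fit = FALSE)                 # both FALSE (original procedure)
show.defaults(linear.fit = FALSE,
              local.theta.fit = TRUE)             # explicit override of the default
```

The same pattern applies to zero.count.threshold, which follows min.count.threshold unless set separately.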

Examples

data(es.mef.small)
cd <- clean.counts(es.mef.small, min.lib.size=1000, min.reads = 1, min.detected = 1)
sg <- factor(gsub("(MEF|ESC).*", "\\1", colnames(cd)), levels = c("ESC", "MEF"))
names(sg) <- colnames(cd)
o.ifm <- scde.error.models(counts = cd, groups = sg, n.cores = 10, threshold.segmentation = TRUE)
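
The grouping factor in the example is derived from the column names with a regular expression. The step can be sketched in isolation on made-up cell names; the names below are hypothetical and only assume an "ESC" or "MEF" prefix, so the actual column names in es.mef.small may differ.

```r
# Hypothetical cell names with a cell-type prefix
cell.names <- c("ESC_10", "ESC_11", "MEF_49", "MEF_50")

# Keep only the captured prefix: everything from "MEF" or "ESC" onward
# is replaced by the captured group itself
prefix <- gsub("(MEF|ESC).*", "\\1", cell.names)

# Build the factor with an explicit level order and name it by cell
sg.demo <- factor(prefix, levels = c("ESC", "MEF"))
names(sg.demo) <- cell.names

table(sg.demo)
```

A name that matches neither prefix is left unchanged by gsub and becomes NA in the factor, so checking anyNA(sg.demo) before fitting is a reasonable precaution.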
