scde (version 2.0.1)

knn.error.models: Build error models for heterogeneous cell populations, based on K-nearest neighbor cells.

Description

Builds cell-specific error models, assuming that multiple subpopulations are present among the measured cells. The model for each cell is based on average expression estimates obtained from the K closest cells within a given group (if groups = NULL, within the entire set of measured cells). The method can fit either the original log-fit models (linear.fit = FALSE) or the newer linear-fit models (linear.fit = TRUE, the default), with a locally fit overdispersion coefficient (local.theta.fit = TRUE, the default).
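The neighbour selection can be made concrete with a minimal base-R sketch that picks, for every cell, the K most correlated other cells within its group. The helper knn_neighbors is hypothetical and not part of scde. Correlating log10(count + 1) values is an assumption, and the package's own neighbour search may differ in transformation or tie handling.

```r
# Hypothetical helper (not part of scde): for each cell, return the column
# indices of the k most correlated other cells within its group.
knn_neighbors <- function(counts, k, groups = NULL, cor.method = "pearson") {
  if (is.null(groups)) groups <- rep(1L, ncol(counts))  # one group = all cells
  # Assumption: correlations are computed on log10(count + 1) values
  lc <- log10(counts + 1)
  nbrs <- vector("list", ncol(counts))
  names(nbrs) <- colnames(counts)
  for (g in unique(groups)) {
    idx <- which(groups == g)
    cm <- cor(lc[, idx, drop = FALSE], method = cor.method)  # cell x cell
    # A sufficiently large k falls back to every other cell in the group
    kk <- min(round(k), length(idx) - 1L)
    for (j in seq_along(idx)) {
      r <- cm[j, ]
      r[j] <- -Inf  # never select the cell itself
      nbrs[[idx[j]]] <- idx[order(r, decreasing = TRUE)[seq_len(kk)]]
    }
  }
  nbrs
}
```

For example, knn_neighbors(counts, k = 10, groups = grp) returns up to 10 same-group neighbour indices per cell; groups with fewer than k + 1 cells contribute all of their other cells.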

Usage

knn.error.models(counts, groups = NULL, k = round(ncol(counts)/2),
  min.nonfailed = 5, min.count.threshold = 1, save.model.plots = TRUE,
  max.model.plots = 50, n.cores = parallel::detectCores(),
  min.size.entries = 2000, min.fpm = 0, cor.method = "pearson",
  verbose = 0, fpm.estimate.trim = 0.25, linear.fit = TRUE,
  local.theta.fit = linear.fit, theta.fit.range = c(0.01, 100),
  alpha.weight.power = 1/2)

Arguments

counts
count matrix (integer matrix; rows are genes, columns are cells)
groups
optional grouping of cells into known subpopulations (if NULL, all cells are treated as a single group)
k
number of nearest neighbor cells to use during fitting. If k is set sufficiently high, all of the cells within a given group will be used.
min.nonfailed
minimum number of non-failed measurements (within the k nearest neighbor cells) required for a gene to be taken into account during the error fitting procedure
min.count.threshold
minimum number of reads required for a measurement to be considered non-failed
save.model.plots
whether model plots should be saved (file names are (group).models.pdf, or cell.models.pdf if no group was supplied)
max.model.plots
maximum number of models to save plots for (saves time when there are too many cells)
n.cores
number of cores to use throughout the calculations
min.size.entries
minimum number of genes to use for model fitting
min.fpm
optional parameter to restrict model fitting to genes with group-average expression magnitude above a given value
cor.method
correlation measure to be used in determining the k nearest cells
verbose
level of verbosity
fpm.estimate.trim
trim fraction to be used in estimating group-average gene expression magnitude for model fitting (0.5 would be median, 0 would turn off trimming)
linear.fit
whether the newer linear model fit with zero intercept should be used (TRUE), or the originally published log-fit model (FALSE)
local.theta.fit
whether local theta fitting should be used (only available for the linear fit models)
theta.fit.range
allowed range of the theta values
alpha.weight.power
power to which the 1/theta weights are raised when fitting the dependency of theta on expression magnitude
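The filtering and averaging arguments can be illustrated for a single cell's neighbourhood. The sketch below is hypothetical (neighborhood_fpm is not an scde function). It treats a measurement as non-failed when it has at least min.count.threshold reads, keeps genes with at least min.nonfailed non-failed measurements among the neighbours, and estimates each gene's expression magnitude as a trimmed mean using fpm.estimate.trim. Scaling counts to fragments per million (FPM) by column totals, and trimming over all neighbours rather than only the non-failed ones, are assumptions.

```r
# Hypothetical helper (not part of scde) illustrating min.count.threshold,
# min.nonfailed and fpm.estimate.trim for one neighbourhood of cells.
neighborhood_fpm <- function(counts, nbr, min.count.threshold = 1,
                             min.nonfailed = 5, fpm.estimate.trim = 0.25) {
  sub <- counts[, nbr, drop = FALSE]
  # Non-failed measurement: at least min.count.threshold reads
  nonfailed <- sub >= min.count.threshold
  keep <- rowSums(nonfailed) >= min.nonfailed
  # Assumption: FPM = count / column total * 1e6 (needs non-zero totals)
  fpm <- sweep(sub, 2, colSums(sub), "/") * 1e6
  # Per-gene trimmed mean over the neighbours (trim = 0.5 gives the median)
  apply(fpm[keep, , drop = FALSE], 1, mean, trim = fpm.estimate.trim)
}
```

Because base R's mean(x, trim = 0.5) returns the median, fpm.estimate.trim = 0.5 in this sketch matches the median behaviour described above, and 0 gives an untrimmed mean.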

Value

  • a data frame with the parameters of the fitted error models (rows are cells, columns are fitted parameters)

Examples

data(pollen)
cd <- clean.counts(pollen)
knn <- knn.error.models(cd, k=ncol(cd)/4, n.cores=10, min.count.threshold=2, min.nonfailed=5, max.model.plots=10)