knn.error.models: Build error models for heterogeneous cell populations, based on K-nearest neighbor cells.

Description

Builds cell-specific error models under the assumption that multiple subpopulations are present among the measured cells. The model for each cell is based on average expression estimates obtained from the k closest cells within a given group (if groups = NULL, within the entire set of measured cells). The method can fit either the original log-fit models (linear.fit = FALSE) or the newer linear-fit models (linear.fit = TRUE, the default), the latter with a locally fit overdispersion coefficient (local.theta.fit = TRUE, which is the default whenever linear.fit = TRUE).

Usage

knn.error.models(counts, groups = NULL, k = round(ncol(counts)/2),
  min.nonfailed = 5, min.count.threshold = 1, save.model.plots = TRUE,
  max.model.plots = 50, n.cores = parallel::detectCores(),
  min.size.entries = 2000, min.fpm = 0, cor.method = "pearson",
  verbose = 0, fpm.estimate.trim = 0.25, linear.fit = TRUE,
  local.theta.fit = linear.fit, theta.fit.range = c(0.01, 100),
  alpha.weight.power = 1/2)

Arguments

counts

count matrix (integer matrix; rows are genes, columns are cells)

groups

optional groups partitioning known subpopulations

k

number of nearest neighbor cells to use during fitting. If k is set sufficiently high, all of the cells within a given group will be used.

min.nonfailed

minimum number of non-failed measurements (within the k nearest neighbor cells) required for a gene to be taken into account during error fitting procedure

min.count.threshold

minimum number of reads required for a measurement to be considered non-failed
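
As a rough illustration of how min.count.threshold and min.nonfailed interact, the base R sketch below counts non-failed measurements per gene on a toy matrix. The matrix, the variable names, and the use of >= for both comparisons are assumptions made for illustration; inside knn.error.models the count is taken over the k nearest neighbor cells rather than over all columns.

```r
# toy count matrix: 3 genes (rows) x 6 cells (columns); values are made up
counts <- matrix(c(0, 3, 5, 0, 2, 7,
                   0, 0, 1, 0, 0, 0,
                   4, 6, 2, 9, 3, 5),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:6)))

min.count.threshold <- 2
min.nonfailed <- 4

# a measurement is treated as non-failed if it has at least
# min.count.threshold reads (the >= comparison is an assumption)
n.nonfailed <- rowSums(counts >= min.count.threshold)

# genes with enough non-failed measurements would be kept for fitting
usable <- n.nonfailed >= min.nonfailed
```

In this toy case gene2 has no measurement reaching the threshold, so it would be left out of the fit, while gene1 and gene3 would be used.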

save.model.plots

whether model plots should be saved (file names are (group).models.pdf, or cell.models.pdf if no group was supplied)

max.model.plots

maximum number of models to save plots for (saves time when there are too many cells)

n.cores

number of cores to use for the calculations

min.size.entries

minimum number of genes to use for model fitting

min.fpm

optional parameter to restrict model fitting to genes with group-average expression magnitude above a given value

cor.method

correlation measure to be used in determining k nearest cells
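
The idea of picking the k nearest cells by correlation can be sketched in base R as follows. This is only an illustration on simulated data: the log transform, the variable names, and the direct use of cor() are assumptions, and the package's internal similarity calculation may differ.

```r
set.seed(1)
# simulated counts: 200 genes x 8 cells
counts <- matrix(rpois(200 * 8, lambda = 5), nrow = 200,
                 dimnames = list(NULL, paste0("cell", 1:8)))

k <- 3
cor.method <- "pearson"

# cell-to-cell correlation on log-scaled counts (the transform is an assumption)
cc <- cor(log10(counts + 1), method = cor.method)

# for each cell, names of the k most correlated other cells
nearest <- lapply(colnames(cc), function(cell) {
  sims <- cc[cell, colnames(cc) != cell]
  names(sort(sims, decreasing = TRUE))[seq_len(k)]
})
names(nearest) <- colnames(cc)
```

Any value accepted by the method argument of cor() ("pearson", "kendall", "spearman") can be substituted for cor.method in this sketch.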

verbose

level of verbosity

fpm.estimate.trim

trim fraction to be used in estimating group-average gene expression magnitude for model fitting (0.5 would be median, 0 would turn off trimming)
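
The effect of the trim fraction can be seen with the trim argument of base R's mean(). Whether the package computes its estimate through this exact call is an assumption; the values below are made up to show how trimming dampens an outlier.

```r
# expression magnitudes of one gene across a group of cells (made-up values)
x <- c(1, 2, 3, 4, 100)

m.plain <- mean(x, trim = 0)     # 22: plain mean, no trimming
m.trim  <- mean(x, trim = 0.25)  # 3: one value dropped from each end before averaging
m.med   <- mean(x, trim = 0.5)   # 3: same as median(x)
```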

linear.fit

whether the newer linear model fit with zero intercept should be used (TRUE), or the originally published log-fit model (FALSE)

local.theta.fit

whether local theta fitting should be used (only available for the linear fit models)

theta.fit.range

allowed range of the theta values

alpha.weight.power

1/theta weight power used in fitting theta dependency on the expression magnitude

Value

a data frame with the parameters of the fitted error models (rows are cells, columns are fitted parameters)

Examples


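library(scde)
data(pollen)
# filter the raw counts before model fitting
cd <- clean.counts(pollen)
# use a quarter of the cells as neighbors; round() keeps k an integer, as in the default
knn <- knn.error.models(cd, k = round(ncol(cd)/4), n.cores = 10, min.count.threshold = 2, min.nonfailed = 5, max.model.plots = 10)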