estimateProportionsCP: estimateProportionsCP

Description

Estimates cell type proportions using the constrained projection method of Houseman et al. [1].

Usage

estimateProportionsCP(rnb.set, cell.type.column, n.most.variable = NA, n.markers = 500L, constrained = TRUE, full.output = FALSE)

Arguments

rnb.set

RnBSet object

cell.type.column

integer index or character identifier of the column in the sample annotation table of the RnBSet object that assigns reference samples to cell types

n.most.variable

singleton integer specifying how many of the most variable CpGs should be used for marker selection. If NA, all sites are considered (expect longer computation times).

n.markers

singleton integer specifying how many CpGs should be used as markers for fitting the projection model

constrained

if TRUE, the returned cell type proportion estimates are non-negative (see Details for the full constraint)

full.output

if TRUE, the intermediate analysis results are returned in addition to the estimated proportions

Value

a matrix of estimated cell type contributions (samples × cell types) or a list with the results of the intermediate steps (see Details).

Details

The column specified by cell.type.column should assign each reference sample to a cell type and contain missing values for all target samples. First, the marker selection model is fit to estimate the association of each CpG with the given reference cell types (first expression in eq. (1) of [1]). The strength of association is expressed as an F-statistic. Since fitting the marker selection model to all CpGs can take a long time, the marker search can be limited to variable CpG positions by setting n.most.variable to a non-NA positive integer. The CpGs are then ranked by decreasing across-sample variance in the reference data set, and the top n.most.variable are used to fit the marker selection model. The coefficients of the fit, together with the F-statistic for each CpG, are returned if full.output is TRUE. Thereafter, n.markers CpGs are selected as quantitative markers and the projection model (eq. (2) of [1]) is fit to estimate the contribution of each cell type. Depending on the value of constrained, the returned coefficients are either raw or constrained to lie between 0 and 1, with a within-sample sum less than or equal to 1.
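
The steps described above can be illustrated with a minimal base R sketch on simulated data. It is not the package implementation: the helper f.statistic, all object names, and the simulated reference profiles are hypothetical, selecting the n.markers CpGs with the highest F-statistic is an assumption, and stats::constrOptim is swapped in as a generic solver for the constrained fit.

```r
# Illustrative sketch on simulated data; all names below are hypothetical.
set.seed(1)
cell.types <- c("A", "B", "C")
n.cpg <- 300
informative <- 1:100  # CpGs whose methylation differs between cell types

# True cell type profiles (CpGs x cell types), beta values in [0, 1];
# non-informative CpGs share one level across all cell types
profiles <- matrix(runif(n.cpg), n.cpg, 3, dimnames = list(NULL, cell.types))
profiles[informative, ] <- runif(length(informative) * 3)

# Reference data: four noisy samples per cell type
ref.labels <- rep(cell.types, each = 4)
ref <- profiles[, ref.labels] + rnorm(n.cpg * length(ref.labels), sd = 0.01)

# One-way ANOVA F-statistic per CpG (rows of x) across the reference groups
f.statistic <- function(x, groups) {
  groups <- factor(groups)
  k <- nlevels(groups)
  n <- ncol(x)
  sizes <- as.vector(table(groups))
  means <- sapply(levels(groups),
                  function(g) rowMeans(x[, groups == g, drop = FALSE]))
  ss.between <- as.vector((means - rowMeans(x))^2 %*% sizes)
  ss.within <- rowSums((x - means[, as.character(groups)])^2)
  (ss.between / (k - 1)) / (ss.within / (n - k))
}

n.most.variable <- 150
n.markers <- 50

# Step 1: keep the CpGs with the highest across-sample variance
top.var <- order(apply(ref, 1, var), decreasing = TRUE)[seq_len(n.most.variable)]

# Step 2: rank them by F-statistic (taking the top n.markers is an assumption)
f.stat <- f.statistic(ref[top.var, ], ref.labels)
markers <- top.var[order(f.stat, decreasing = TRUE)[seq_len(n.markers)]]

# Mean reference methylation per cell type at the markers (markers x cell types)
M <- sapply(cell.types, function(ct) rowMeans(ref[markers, ref.labels == ct]))

# Target sample: a known mixture of the three profiles plus noise
true.props <- c(A = 0.5, B = 0.3, C = 0.2)
y <- as.vector(profiles[markers, ] %*% true.props) + rnorm(n.markers, sd = 0.01)

# Step 3a: raw (unconstrained) least-squares projection
raw <- qr.solve(M, y)

# Step 3b: constrained projection with w >= 0 and sum(w) <= 1
k <- ncol(M)
fit <- constrOptim(
  theta = rep(1 / (k + 1), k),  # strictly feasible starting point
  f = function(w) sum((y - M %*% w)^2),
  grad = function(w) as.vector(-2 * crossprod(M, y - M %*% w)),
  ui = rbind(diag(k), rep(-1, k)),
  ci = c(rep(0, k), -1)
)
constrained <- setNames(fit$par, cell.types)
```

Here raw corresponds to the behaviour described for constrained = FALSE and constrained to the behaviour described for constrained = TRUE. On real data a dedicated quadratic programming solver may be a better choice than the general-purpose barrier method used in this sketch.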

References

1. Houseman E, Accomando W, Koestler D, Christensen B, Marsit C, Nelson H, Wiencke J, Kelsey K. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics 2012, 13:86.