estimateProportionsCP: estimateProportionsCP
Description
Estimates cell type proportions using the constrained projection method from Houseman et al. [1].
Usage
estimateProportionsCP(rnb.set, cell.type.column, n.most.variable = NA, n.markers = 500L, constrained = TRUE, full.output = FALSE)
Arguments
cell.type.column
integer index or character identifier of a column in the sample annotation table of the RnBSet object that maps each reference sample to its cell type
n.most.variable
singleton integer specifying how many of the most variable CpGs should be used for marker selection. If NA, all sites are considered (expect longer computation times).
n.markers
singleton integer specifying how many CpGs should be used as markers for fitting the projection model
constrained
if TRUE, the returned cell type proportion estimates are non-negative
full.output
if TRUE, the intermediate analysis results are returned in addition to the estimated proportions
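As an illustration of the layout that cell.type.column is expected to describe, the sketch below builds a small sample annotation table in base R. The column names Sample_ID and CellType, the sample names, and the cell type labels are hypothetical; the commented call at the end only follows the signature given under Usage and assumes that RnBeads is loaded and that rnb.set is an RnBSet object carrying this annotation.

```r
# Hypothetical sample annotation table: three reference samples with known
# cell types and two target samples whose composition is to be estimated.
annot <- data.frame(
  Sample_ID = c("ref1", "ref2", "ref3", "target1", "target2"),
  CellType  = c("CD4T", "CD8T", "Bcell", NA, NA),
  stringsAsFactors = FALSE
)

# Reference samples carry a cell type label; target samples have NA.
is.reference <- !is.na(annot$CellType)
ref.types <- unique(annot$CellType[is.reference])

# Assumed call, following the Usage section (requires RnBeads and an RnBSet
# object `rnb.set` whose annotation contains the column above):
# props <- estimateProportionsCP(rnb.set, cell.type.column = "CellType",
#                                n.most.variable = 50000L, n.markers = 500L)
```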
Value
a matrix of estimated cell type contributions (samples × cell types) or a list with the results of the intermediate steps (see Details).
Details
The column specified by cell.type.column should assign each reference sample to a cell type and contain missing values for all target samples. First, the marker selection model is fit to estimate the association of each CpG with the given reference cell types (first expression in eq. (1) of [1]). The strength of association is expressed as an F-statistic.
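The idea of scoring each CpG by an F-statistic can be illustrated with a minimal base R sketch on simulated data. It fits a one-way linear model of methylation on cell type for every CpG, which is an assumption about the general form of the marker selection model and not the package's internal code; all object names and the simulated values are hypothetical.

```r
set.seed(1)

# Simulated reference data: 20 CpGs (rows) x 12 reference samples (columns),
# four samples for each of three hypothetical cell types.
cell.type <- factor(rep(c("A", "B", "C"), each = 4))
n.cpg <- 20
meth <- matrix(runif(n.cpg * length(cell.type), 0.45, 0.55), nrow = n.cpg)

# Make the first five CpGs cell type specific: higher methylation in type A.
meth[1:5, cell.type == "A"] <- meth[1:5, cell.type == "A"] + 0.3

# Per-CpG F-statistic for the effect of cell type on methylation.
f.stat <- apply(meth, 1, function(y) {
  fit <- lm(y ~ cell.type)
  summary(fit)$fstatistic[["value"]]
})

# CpGs ranked by strength of association with cell type.
top.cpgs <- order(f.stat, decreasing = TRUE)[1:5]
```

In this toy setting the simulated cell type specific CpGs receive the largest F-statistics, which is the property marker selection relies on.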
Since fitting the marker selection model to all CpGs can take a long time, the marker search can be limited to variable CpG positions by setting n.most.variable to a positive integer instead of NA. The CpGs are then ranked by decreasing across-sample variance in the reference data set, and the top n.most.variable CpGs are used to fit the marker selection model.
The coefficients of the fit, together with the F-statistic value for each CpG, are returned if full.output is TRUE.
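The variance-based pre-filtering described above can be sketched in base R as follows. The helper select.variable.sites and the simulated matrix are hypothetical and only mirror the documented behavior of n.most.variable: NA keeps every site, while a positive integer keeps that many sites with the highest across-sample variance.

```r
# Hypothetical helper: returns the row indices of the sites to keep.
select.variable.sites <- function(meth, n.most.variable = NA) {
  if (is.na(n.most.variable)) {
    return(seq_len(nrow(meth)))          # NA: consider all sites
  }
  site.var <- apply(meth, 1, var, na.rm = TRUE)
  n <- min(n.most.variable, nrow(meth))  # never request more sites than exist
  order(site.var, decreasing = TRUE)[seq_len(n)]
}

set.seed(2)
# Simulated reference methylation matrix: 100 CpGs x 6 reference samples.
ref.meth <- matrix(runif(100 * 6), nrow = 100)

keep.all <- select.variable.sites(ref.meth)
keep.top <- select.variable.sites(ref.meth, n.most.variable = 10L)
ref.subset <- ref.meth[keep.top, , drop = FALSE]
```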
Thereafter, n.markers CpGs are selected as true quantitative markers, and the projection model (eq. (2) of [1]) is fit to estimate the contribution of each cell type.
Depending on the value of constrained, the returned coefficients are either raw or constrained to lie between 0 and 1, with a within-sample sum less than or equal to 1.
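The difference between raw and constrained coefficients can be illustrated with a small base R sketch for a single target sample. It assumes the projection step amounts to least squares regression of the target methylation values on cell type specific marker profiles; the solver used here, constrOptim from the stats package, is a stand-in chosen to keep the example dependency-free and is not necessarily what the package or [1] uses. All data and object names are simulated or hypothetical.

```r
set.seed(3)
n.markers <- 30
n.types <- 3

# Hypothetical marker profiles: mean methylation of each marker CpG (rows)
# in each reference cell type (columns).
profiles <- matrix(runif(n.markers * n.types), nrow = n.markers)

# Simulated target sample: a mixture with known proportions plus noise.
true.w <- c(0.5, 0.3, 0.1)
y <- as.vector(profiles %*% true.w) + rnorm(n.markers, sd = 0.01)

# Raw coefficients: ordinary least squares without any constraint.
w.raw <- qr.solve(profiles, y)

# Constrained coefficients: minimise the residual sum of squares subject to
# w >= 0 and sum(w) <= 1, written as ui %*% w - ci >= 0 for constrOptim.
rss <- function(w) sum((y - profiles %*% w)^2)
rss.grad <- function(w) as.vector(-2 * crossprod(profiles, y - profiles %*% w))
ui <- rbind(diag(n.types), rep(-1, n.types))
ci <- c(rep(0, n.types), -1)
start <- rep(1 / (n.types + 1), n.types)  # strictly inside the feasible region
fit <- constrOptim(start, rss, rss.grad, ui = ui, ci = ci)
w.con <- fit$par
```

Because the simulated proportions already satisfy the constraints, w.raw and w.con should be close here; the two can diverge when the unconstrained fit yields negative coefficients or a sum above 1.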
References
1. Eugene Houseman, William Accomando, Devin Koestler, Brock Christensen, Carmen Marsit, Heather Nelson, John Wiencke, Karl Kelsey. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics 2012, 13:86.