estimateGLMRobustDisp: Empirical Robust Bayes Tagwise Dispersions for Negative Binomial GLMs using Observation Weights

Description

Compute a robust estimate of the negative binomial dispersion parameter for each tag or transcript, with expression levels specified by a log-linear model, using observation weights. These observation weights will be stored and used later for estimating regression parameters.

Usage

estimateGLMRobustDisp(y, design = NULL, prior.df = 10, update.trend = TRUE, trend.method = "bin.loess", maxit = 6, k = 1.345, residual.type = "pearson", verbose = FALSE, record = FALSE)

Arguments

y

a DGEList object.

design

numeric design matrix, as for glmFit.

prior.df

prior degrees of freedom.

update.trend

logical. Should the trended dispersion be re-estimated at each iteration?

trend.method

method (low-level function) used to estimate the trended dispersions. See estimateGLMTrendedDisp.

maxit

maximum number of iterations of the weighted tagwise dispersion estimation (estimateGLMTagwiseDisp).

k

the tuning constant for the Huber estimator. If the absolute value of the residual r is less than k, its observation weight is 1; otherwise, the weight is k/|r|.

residual.type

type of residual r used to compute the observation weights.

verbose

logical. Should verbose comments be printed?

record

logical. Should information for each iteration be recorded (and returned as a list)?

Value

estimateGLMRobustDisp produces a DGEList object containing the (robust) tagwise dispersion estimate for each tag, namely the value that maximizes the weighted Cox-Reid adjusted profile likelihood under the negative binomial model, together with the observation weights. The observation weights are calculated from the residuals using the Huber function. Note that when record=TRUE, a simple list of DGEList objects is returned instead, one for each iteration (for debugging or tracking purposes).
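As an illustration of the weighting step only, the sketch below computes negative binomial Pearson residuals, (y - mu) / sqrt(mu + phi * mu^2), and converts them into Huber weights using the default k = 1.345. The helper names nb_pearson_residuals and huber_weights are hypothetical and are not part of edgeR; the fitted means mu and the dispersion phi are simply assumed to be given here, whereas estimateGLMRobustDisp obtains them from its own GLM fit.

```r
# Hypothetical helper (not part of edgeR): Pearson residuals for a
# negative binomial model with mean mu and dispersion phi,
# using Var(y) = mu + phi * mu^2.
nb_pearson_residuals <- function(y, mu, phi) {
  (y - mu) / sqrt(mu + phi * mu^2)
}

# Hypothetical helper (not part of edgeR): Huber weights.
# Weight is 1 when |r| < k, otherwise k / |r|.
# pmin() handles both cases, and r = 0 gives k / 0 = Inf, so weight 1.
huber_weights <- function(r, k = 1.345) {
  pmin(1, k / abs(r))
}

# Toy data: one tag, six samples, all with fitted mean 10;
# the last count is an obvious outlier.
counts <- c(8, 12, 10, 9, 11, 40)
mu <- rep(10, 6)
phi <- 0.1

r <- nb_pearson_residuals(counts, mu, phi)
w <- huber_weights(r)
round(r, 3)
round(w, 3)
```

Only the outlying count receives a weight below 1, so it contributes less when the model is refitted.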

Details

At times, because dispersion estimates are moderated towards a trended value, features (typically genes) can be sensitive to outliers, which can produce false positives. That is, since the dispersion estimates are moderated downwards toward the trend, and because the regression parameter estimates may be affected by the outliers, such genes may be deemed significantly differentially expressed. The function uses an iterative procedure in which weights are calculated from the residuals and the estimates are recomputed after reweighting.
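The reweighting idea can be seen on a much simpler problem. The sketch below is a toy analogue, not edgeR's algorithm: it estimates a single location parameter by iteratively reweighted least squares, recomputing Huber weights from the residuals at every step, so that an outlying value is progressively downweighted. The function name huber_location and the fixed residual scale of 1 are assumptions made for illustration; estimateGLMRobustDisp instead refits the negative binomial GLM and re-estimates the dispersions with the updated weights at each of its maxit iterations.

```r
# Toy analogue (hypothetical, not edgeR code): Huber M-estimate of a
# location parameter by iteratively reweighted least squares.
huber_location <- function(x, k = 1.345, scale = 1, maxit = 50, tol = 1e-8) {
  est <- median(x)                  # robust starting value
  w <- rep(1, length(x))
  for (i in seq_len(maxit)) {
    r <- (x - est) / scale          # residuals on the assumed scale
    w <- pmin(1, k / abs(r))        # Huber weights: 1 if |r| < k, else k/|r|
    new_est <- sum(w * x) / sum(w)  # re-estimate after reweighting
    converged <- abs(new_est - est) < tol
    est <- new_est
    if (converged) break
  }
  list(estimate = est, weights = w)
}

x <- c(9.5, 10.2, 10.0, 9.8, 10.4, 25)
fit <- huber_location(x)
fit$estimate  # pulled much less toward 25 than mean(x)
fit$weights   # the outlier gets a small weight
```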

Note: it is not necessary to first calculate the common, trended and tagwise dispersion estimates. If these are not available, the function will first calculate them in an unweighted fashion.

References

Zhou X, Lindsay H, Robinson MD (2014). Robustly detecting differential expression in RNA sequencing data using observation weights. Nucleic Acids Research, 42(11), e91.

Examples


library(edgeR)
set.seed(1)  # make the simulated counts reproducible
y <- matrix(rnbinom(100*6,mu=10,size=1/0.1),ncol=6)
d <- DGEList(counts=y,group=c(1,1,1,2,2,2),lib.size=c(1000:1005))
d <- calcNormFactors(d)
design <- model.matrix(~group, data=d$samples) # Define the design matrix for the full model
d <- estimateGLMRobustDisp(d, design)
summary(d$tagwise.dispersion)
