Learn R Programming

edgeR (version 3.8.6)

glmQLFit: Quasi-likelihood methods with empirical Bayes shrinkage

Description

Fit a quasi-likelihood negative binomial generalized log-linear model to count data. Conduct genewise statistical tests for a given coefficient or coefficient contrast.

Usage

glmQLFit(y, design=NULL, dispersion=NULL, abundance.trend=TRUE, robust=FALSE, winsor.tail.p=c(0.05, 0.1), ...) glmQLFTest(glmfit, coef=ncol(glmfit$design), contrast=NULL)

Arguments

y
a DGEList object containing count and sample data.
design
numeric matrix giving the design matrix for the tagwise linear models.
dispersion
numeric scalar or vector of negative binomial dispersions. Defaults to the trended dispersion, or the common dispersion (if no trend is available), or a value of 0.05 (if no common value is available).
abundance.trend
logical, whether to allow an abundance-dependent trend when estimating the prior values for the quasi-likelihood multiplicative dispersion parameter.
robust
logical, whether to estimate the prior degrees of freedom robustly.
winsor.tail.p
numeric vector of length 2 giving proportion to trim (Winsorize) from lower and upper tail of the distribution of genewise deviances when estimating the hyperparameters. Positive values produce robust empirical Bayes ignoring outlier small or large deviances. Only used when robust=TRUE.
...
other arguments are passed to glmFit.
glmfit
a DGEGLM object, usually output from qlmQLFit.
coef
integer or character vector indicating which coefficients of the linear model are to be tested equal to zero.
contrast
numeric vector or matrix specifying one or more contrasts of the linear model coefficients to be tested equal to zero.

Value

glmQLFit produces an object of class DGEGLM with the same components as that produced by glmFit, plus:
df.residual
a numeric vector containing the number of residual degrees of freedom for the GLM fit of each gene.
s2.fit
a list containing df.prior, the prior degrees of fredom; and var.prior, the location of the prior distribution. Both are numeric vectors if abundance.trend=TRUE and scalars otherwise. var.post is a numeric vector containing the shrunk quasi-likelihood dispersion for each gene.
df.prior
a numeric vector or scalar containing the prior degrees of freedom, same as that in s2.fit.
glmQFTest produces objects of class DGELRT with the same components as for glmfit plus the following:
table
data frame with the same rows as y containing the log2-fold changes, F-statistics and p-values, ready to be displayed by topTags..
comparison
character string describing the coefficient or the contrast being tested.
The data frame table contains the following columns:
logFC
log2-fold change of expression between conditions being tested.
logCPM
average log2-counts per million, the average taken over all libraries in y.
F
F-statistics.
PValue
p-values.

Details

glmQLFTest implements the quasi-likelihood method of Lund et al (2012). It behaves the same as glmLRT except that it replaces likelihood ratio tests with quasi-likelihood F-tests for coefficients in the linear model. This function calls the limma function squeezeVar to conduct empirical Bayes smoothing of the genewise multiplicative dispersions. Note that the QuasiSeq package provides a alternative implementation of Lund et al (2012), with slightly different glm, trend and FDR methods.

There are a number of subtleties involved in the use of QL models. The first is that the negative binomial dispersions must be trended or common values. This is because the function assumes that the supplied values are the true values. For the trended/common values, the assumption is reasonable as information from many genes improves precision. This is not the case for the tagwise dispersions due to the limited information for each gene.

Another subtlety involves the handling of zero counts. Observations with fitted values of zero provide no residual degrees of freedom. This must be considered when computing the value of the quasi-likelihood dispersion for genes with many zeros. Finally, a lower bound is defined for the p-value of each gene, based on the likelihood ratio test. This avoids spurious results involving weak shrinkage with very low quasi-likelihood dispersions.

References

Lund, SP, Nettleton, D, McCarthy, DJ, and Smyth, GK (2012). Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates. Statistical Applications in Genetics and Molecular Biology Volume 11, Issue 5, Article 8. http://www.statsci.org/smyth/pubs/QuasiSeqPreprint.pdf

See Also

topTags displays results from glmQLFTest.

plotQLDisp can be used to visualize the distribution of QL dispersions after EB shrinkage from glmQLFit.

The QuasiSeq package gives an alternative implementation of glmQLFTest based on the same statistical ideas.

Examples

Run this code
nlibs <- 4
ntags <- 1000
dispersion.true <- 1/rchisq(ntags, df=10)
design <- model.matrix(~factor(c(1,1,2,2)))

# Generate count data
y <- rnbinom(ntags*nlibs,mu=20,size=1/dispersion.true)
y <- matrix(y,ntags,nlibs)
d <- DGEList(y)
d <- calcNormFactors(d)

# Fit the NB GLMs with QL methods
d <- estimateDisp(d, design)
fit <- glmQLFit(d, design)
results <- glmQLFTest(fit)
topTags(results)
fit <- glmQLFit(d, design, robust=TRUE)
results <- glmQLFTest(fit)
topTags(results)
fit <- glmQLFit(d, design, abundance.trend=FALSE)
results <- glmQLFTest(fit)
topTags(results)

Run the code above in your browser using DataLab