limma_tidiers: Tidiers for the output of limma (linear models for microarray analysis)

Description

Tidy, augment, and glance methods for MArrayLM objects, which contain the results of gene-wise linear models fit to microarray datasets. Objects of this class are produced by the lmFit and eBayes functions.

Tidying method for an MAList object

Tidying method for an EList expression object

Usage

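## S3 method for class 'MArrayLM'
tidy(x, intercept = FALSE, ...)

## S3 method for class 'MArrayLM'
augment(x, data, ...)

## S3 method for class 'MArrayLM'
glance(x, ...)

## S3 method for class 'MAList'
tidy(x, ...)

## S3 method for class 'EList'
tidy(x, addTargets = FALSE, ...)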

Arguments

x

MArrayLM, MAList, or EList object

intercept

whether the (Intercept) term should be included (default FALSE)

...

extra arguments, not used

data

original expression matrix; if missing, augment returns only the computed per-gene statistics

addTargets

whether to add sample-level information to the output (default FALSE)

Value

tidy, applied to an MArrayLM object, returns one row per gene per coefficient, with the columns:
gene: the name of the gene (extracted from the rownames of the input matrix)
term: the coefficient being estimated
estimate: the estimate of each per-gene coefficient
statistic: empirical Bayes t-statistic
p.value: p-value computed from the t-statistic
lod: log-of-odds score

augment returns one row per gene, with the per-gene statistics in columns prefixed with a dot:
.gene: gene ID, obtained from the rownames of the input
.sigma: per-gene residual standard deviation
.df.residual: per-gene residual degrees of freedom
.AMean: average log-intensity of each gene (probe) across arrays
.statistic: moderated F-statistic
.p.value: p-value generated from the moderated F-statistic
.df.total: total degrees of freedom per gene
.s2.post: posterior estimate of residual variance

glance returns one row, with the columns:
rank: rank of the design matrix
df.prior: empirical Bayesian prior degrees of freedom
s2.prior: empirical Bayesian prior residual variance

tidy, applied to an MAList object, returns one row per gene-sample combination, with the columns:
gene: gene name
sample: sample name (from column names)
value: expressions on log2 scale

tidy, applied to an EList object, returns one row per gene-sample combination, with the columns:
gene: gene name
sample: sample name (from column names)
value: expressions on log2 scale
weight: present only if the object contains weights
other columns: sample-level information, included if present and if addTargets is TRUE

Details

Tidying this fit returns one row per coefficient per gene, while augmenting returns one row per gene, with per-gene statistics included. (This is thus a rare case where the augment output has fewer rows than the tidy output. It is a side effect of the fact that the input to limma is not tidy but rather a one-row-per-gene matrix.)
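
As a rough illustration of the two output shapes, the following base R sketch mimics them with a hypothetical coefficient matrix. The objects coefs, tidy_like, and augment_like are invented for this illustration, and the reshaping shown is not the package's implementation; the .sigma values are placeholders rather than real residual standard deviations.

```r
# Hypothetical per-gene coefficient matrix: 5 genes x 2 non-intercept terms
set.seed(1)
coefs <- matrix(rnorm(10), nrow = 5,
                dimnames = list(paste0("g", 1:5),
                                c("treatmentb", "confounding")))

# tidy-like shape: one row per gene per coefficient
# (as.vector() unrolls the matrix column by column, matching the rep() calls)
tidy_like <- data.frame(
  gene     = rep(rownames(coefs), times = ncol(coefs)),
  term     = rep(colnames(coefs), each = nrow(coefs)),
  estimate = as.vector(coefs),
  stringsAsFactors = FALSE
)

# augment-like shape: one row per gene, holding per-gene statistics
augment_like <- data.frame(
  .gene  = rownames(coefs),
  .sigma = runif(nrow(coefs)),  # placeholder values for illustration only
  stringsAsFactors = FALSE
)

nrow(tidy_like)     # number of genes times number of coefficients
nrow(augment_like)  # number of genes
```

With two coefficients per gene, the tidy-like table has twice as many rows as the augment-like table.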

Examples

# assumption: the tidy methods are provided by the biobroom package
if (require("limma") && require("biobroom")) {
    # create random data and design
    set.seed(2014)
    dat <- matrix(rnorm(1000), ncol=4)
    dat[, 1:2] <- dat[, 1:2] + .5  # add an effect
    rownames(dat) <- paste0("g", 1:nrow(dat))
    des <- data.frame(treatment = c("a", "a", "b", "b"),
                      confounding = rnorm(4))

    lfit <- lmFit(dat, model.matrix(~ treatment + confounding, des))
    eb <- eBayes(lfit)
    # print() is needed: values are not auto-printed inside an if block
    print(head(tidy(lfit)))
    print(head(tidy(eb)))

    if (require("ggplot2")) {
        # the tidied output is in a convenient form for plotting;
        # print() ensures both plots are drawn inside the if block
        print(ggplot(tidy(lfit), aes(estimate)) + geom_histogram(binwidth=1) +
            facet_wrap(~ term))
        print(ggplot(tidy(eb), aes(p.value)) + geom_histogram(binwidth=.2) +
            facet_wrap(~ term))
    }
}