biobroom (version 1.4.2)

limma_tidiers: Tidiers for the output of limma (linear models for microarray analysis)

Description

Tidy, augment, and glance methods for MArrayLM objects, which contain the results of gene-wise linear models fit to microarray datasets. Objects of this class are produced by the lmFit and eBayes functions.

Tidying method for an MAList object

Tidying method for an EList expression object

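Usage

## S3 method for class 'MArrayLM'
tidy(x, intercept = FALSE, ...)
## S3 method for class 'MArrayLM'
augment(x, data, ...)
## S3 method for class 'MArrayLM'
glance(x, ...)
## S3 method for class 'MAList'
tidy(x, ...)
## S3 method for class 'EList'
tidy(x, addTargets = FALSE, ...)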

Arguments

x
an MArrayLM, MAList, or EList object
intercept
whether the (Intercept) term should be included (default FALSE)
...
extra arguments, not used
data
original expression matrix; if missing, augment returns only the computed per-gene statistics
addTargets
whether to add sample-level information (default FALSE)
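
As a rough illustration of the intercept argument, the sketch below compares the terms returned with and without it. It assumes limma and biobroom are installed; the simulated matrix dat and the two-group design are invented for the illustration and are not part of the package.

```r
# Sketch only: simulated data, guarded so it is skipped when the
# packages are not available.
if (requireNamespace("limma", quietly = TRUE) &&
    requireNamespace("biobroom", quietly = TRUE)) {
    library(biobroom)

    set.seed(2014)
    dat <- matrix(rnorm(400), ncol = 4)               # 100 genes x 4 samples
    rownames(dat) <- paste0("g", seq_len(nrow(dat)))
    group <- factor(c("a", "a", "b", "b"))            # hypothetical grouping
    design <- model.matrix(~ group)

    fit <- limma::eBayes(limma::lmFit(dat, design))

    # Terms present by default versus with the intercept requested
    default_terms <- unique(tidy(fit)$term)
    all_terms <- unique(tidy(fit, intercept = TRUE)$term)

    print(default_terms)
    print(all_terms)
}
```

With a design that contains an intercept column, the second call should report one more term than the first.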

Value

The output of tidying functions is always a data frame without rownames.
tidy returns one row per gene per coefficient. It always contains the columns
gene
The name of the gene (extracted from the rownames of the input matrix)
term
The coefficient being estimated
estimate
The estimate of each per-gene coefficient
Depending on whether the object comes from eBayes, it may also contain
statistic
Empirical Bayes t-statistic
p.value
p-value computed from t-statistic
lod
log-of-odds score
augment returns one row per gene, containing the original gene expression matrix if provided. It then adds columns containing the per-gene statistics included in the MArrayLM object, each prefixed with a period (.):
.gene
gene ID, obtained from the rownames of the input
.sigma
per-gene residual standard deviation
.df.residual
per-gene residual degrees of freedom
The following columns may also be included, depending on which have been added by lmFit and eBayes:
.AMean
average log-intensity of each probe across arrays
.statistic
moderated F-statistic
.p.value
p-value generated from moderated F-statistic
.df.total
total degrees of freedom per gene
.df.residual
residual degrees of freedom per gene
.s2.post
posterior estimate of residual variance
glance returns one row, containing
rank
rank of design matrix
df.prior
empirical Bayesian prior degrees of freedom
s2.prior
empirical Bayesian prior residual variance
For an MAList object, tidy returns a data frame with one row per gene-sample combination, with columns
gene
gene name
sample
sample name (from column names)
value
expressions on log2 scale
For an EList object, tidy returns a data frame with one row per gene-sample combination, with columns
gene
gene name
sample
sample name (from column names)
value
expressions on log2 scale
weight
present if weights is set
other columns
if present and if addTargets is set
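
The augment and glance outputs described above can be inspected with a short sketch like the following. It assumes limma and biobroom are installed; the simulated data and the names dat, design, and eb are placeholders chosen for the illustration.

```r
# Sketch only: simulated data, guarded so it is skipped when the
# packages are not available.
if (requireNamespace("limma", quietly = TRUE) &&
    requireNamespace("biobroom", quietly = TRUE)) {
    library(biobroom)

    set.seed(2014)
    dat <- matrix(rnorm(400), ncol = 4)               # 100 genes x 4 samples
    rownames(dat) <- paste0("g", seq_len(nrow(dat)))
    colnames(dat) <- paste0("s", 1:4)
    group <- factor(c("a", "a", "b", "b"))            # hypothetical grouping
    design <- model.matrix(~ group)

    eb <- limma::eBayes(limma::lmFit(dat, design))

    per_gene <- augment(eb)        # per-gene statistics only
    with_data <- augment(eb, dat)  # expression matrix plus statistics
    model_summary <- glance(eb)    # one-row summary of the fit

    print(head(per_gene))
    print(colnames(with_data))
    print(model_summary)
}
```

Which of the optional columns appear depends on whether the fit has been passed through eBayes, as noted above.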

Details

Tidying this fit returns one row per coefficient per gene, while augmenting returns one row per gene, with per-gene statistics included. (This is thus a rare case where the augment output has fewer rows than the tidy output. It is a side effect of the fact that the input to limma is not tidy but rather a one-row-per-gene matrix.)
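
To make the two shapes concrete, the base R sketch below builds them by hand: one row per gene per coefficient versus one row per gene. It does not call biobroom and is not the package's implementation; the matrix coefs and its dimension names are invented, shaped like the per-gene coefficient matrix of a limma fit.

```r
# Hypothetical coefficient matrix: 5 genes x 3 coefficients
set.seed(1)
coefs <- matrix(rnorm(15), nrow = 5,
                dimnames = list(paste0("g", 1:5),
                                c("(Intercept)", "treatmentb", "confounding")))

# tidy-like shape: one row per gene per coefficient.
# as.vector() unrolls the matrix column by column, so genes cycle
# fastest and each term is repeated once per gene.
tidy_like <- data.frame(
    gene = rep(rownames(coefs), times = ncol(coefs)),
    term = rep(colnames(coefs), each = nrow(coefs)),
    estimate = as.vector(coefs),
    stringsAsFactors = FALSE
)

# augment-like shape: one row per gene
augment_like <- data.frame(.gene = rownames(coefs),
                           stringsAsFactors = FALSE)

nrow(tidy_like)     # number of genes times number of coefficients
nrow(augment_like)  # number of genes
```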

Examples

if (require("limma") && require("biobroom")) {
    # create random data and design
    set.seed(2014)
    dat <- matrix(rnorm(1000), ncol = 4)
    dat[, 1:2] <- dat[, 1:2] + .5  # add an effect to the first two samples
    rownames(dat) <- paste0("g", 1:nrow(dat))
    des <- data.frame(treatment = c("a", "a", "b", "b"),
                      confounding = rnorm(4))

    # fit gene-wise linear models, then apply empirical Bayes moderation
    lfit <- lmFit(dat, model.matrix(~ treatment + confounding, des))
    eb <- eBayes(lfit)

    # print() explicitly, since intermediate results inside a block
    # are not auto-printed
    print(head(tidy(lfit)))
    print(head(tidy(eb)))

    if (require("ggplot2")) {
        # the tidied form puts it in an ideal form for plotting
        print(ggplot(tidy(lfit), aes(estimate)) +
                  geom_histogram(binwidth = 1) +
                  facet_wrap(~ term))
        print(ggplot(tidy(eb), aes(p.value)) +
                  geom_histogram(binwidth = .2) +
                  facet_wrap(~ term))
    }
}
