estimateDE: Estimate degrees of differential expression (DE) for individual genes

Description

This function calculates $p$-values (or the related statistics) for identifying differentially expressed genes (DEGs) from a TCC-class object. estimateDE internally calls a specified method implemented in other R packages.

Usage

estimateDE(tcc, test.method, FDR, paired,
           full, reduced,                   # for DESeq, DESeq2
           design, contrast,                # for edgeR, DESeq2, voom
           coef,                            # for edgeR, voom
           group, cl,                       # for baySeq
           samplesize,                      # for baySeq, SAMseq
           logged, floor,                   # for WAD
           ...
)

Arguments

tcc

TCC-class object.

test.method

character string specifying a method for identifying DEGs: one of "edger", "deseq", "deseq2", "bayseq", "samseq", "voom", and "wad". See the "Details" section for details. The default is "edger" when analyzing count data with replicates (i.e., min(table(tcc$group[, 1])) > 1). When analyzing count data without replicates (i.e., min(table(tcc$group[, 1])) == 1), the default is "deseq" for two groups and "deseq2" for more than two groups.
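
The default selection rule described above can be sketched in base R. The helper name default_test_method is hypothetical; it only restates the documented rule and is not TCC's internal code.

```r
# Hypothetical helper restating the documented default for test.method.
# `group` is a vector of group labels, like the first column of tcc$group.
default_test_method <- function(group) {
  min_reps <- min(table(group))       # smallest number of samples in any group
  n_groups <- length(unique(group))   # number of distinct groups
  if (min_reps > 1) {
    "edger"                           # every group has replicates
  } else if (n_groups == 2) {
    "deseq"                           # two groups, no replicates
  } else {
    "deseq2"                          # more than two groups, no replicates
  }
}

default_test_method(c(1, 1, 1, 2, 2, 2))
default_test_method(c(1, 2))
default_test_method(c(1, 2, 3))
```

Passing test.method explicitly, as in the examples on this page, avoids depending on this default.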

FDR

numeric value (between 0 and 1) specifying the threshold for determining DEGs.

paired

logical. If TRUE, the input data are regarded as (two-group) paired samples. If FALSE, the input data are regarded as unpaired samples. The default is FALSE.

full

a formula for creating the full model described in DESeq and DESeq2. The right-hand side can involve any column of tcc$group, which is used as the model frame. See the fitNbinomGLMs function in DESeq or the nbinomLRT function in DESeq2 for details.

reduced

a formula for creating the reduced model described in DESeq and DESeq2. The right-hand side can involve any column of tcc$group, which is used as the model frame. See the fitNbinomGLMs function in DESeq or the nbinomLRT function in DESeq2 for details.

design

used by edgeR, voom (limma), and DESeq2. For edgeR and voom, it should be a numeric matrix giving the design matrix for the generalized linear model. See the glmFit function in edgeR or the lmFit function in limma for details. For DESeq2, it should be a formula specifying the design of the experiment. See the DESeqDataSet function in DESeq2 for details.

contrast

used by edgeR and DESeq2. For edgeR, a numeric vector specifying a contrast of the linear model coefficients to be tested equal to zero. See the glmLRT function in edgeR for details. For DESeq2, the argument is the same as the contrast argument used in the DESeq2 package to retrieve the results of the Wald test. See the results function in DESeq2 for details.

coef

integer or character vector indicating which coefficients of the linear model are to be tested equal to zero. See the glmLRT function in edgeR for details.
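
As an illustration of how design, coef, and contrast relate for edgeR, the following base R sketch builds two equivalent designs for a two-group layout. The variable names are chosen here for illustration, and the commented estimateDE call assumes a normalized TCC object named tcc already exists.

```r
# Two groups (G1, G2) with three replicates each.
group <- factor(c(1, 1, 1, 2, 2, 2))

# Design with an intercept: column 2 is the G2-vs-G1 effect, so coef = 2.
design <- model.matrix(~ group)
coef <- 2

# Design without an intercept: one column per group, compared via a contrast.
design0 <- model.matrix(~ 0 + group)
contrast <- c(-1, 1)   # G2 minus G1

colnames(design)
colnames(design0)

# Assuming a normalized TCC object `tcc`, these could be passed as, e.g.:
# tcc <- estimateDE(tcc, test.method = "edger", FDR = 0.1,
#                   design = design, coef = coef)
```

Only one of coef or contrast is needed alongside design; which one applies depends on how the design matrix is parameterized.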

group

numeric or character string identifying the columns in the tcc$group for analysis. See the group argument of topCounts function in baySeq for details.

cl

snow object used for multiprocessing if test.method = "bayseq" is specified. See the getPriors.NB function in baySeq for details.

samplesize

integer specifying (i) the sample size for estimating the prior parameters if test.method = "bayseq" (defaults to 10000), and (ii) the number of permutations in samr if test.method = "samseq" (defaults to 100).

logged

logical. If TRUE, the input data are regarded as log2-transformed. If FALSE, the log2-transformation is performed after the floor setting. The default is logged = FALSE. Ignored if test.method is not "wad".

floor

numeric scalar (> 0) specifying the floor value for taking logarithm. The default is floor = 1, indicating that values less than 1 are replaced by 1. Ignored if logged = TRUE. Ignored if test.method is not "wad".
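
The interaction of logged and floor can be illustrated in base R. This is a minimal sketch of the transformation described above (floor first, then log2), not the code TCC runs internally; the variable names are illustrative.

```r
# Raw (non-logged) expression values, including values below the floor.
x <- c(0, 0.5, 1, 8, 1024)
floor_value <- 1

# Values less than the floor are replaced by the floor before taking log2,
# which avoids log2(0) = -Inf and negative logs for values below 1.
x_floored <- pmax(x, floor_value)
x_log2 <- log2(x_floored)
x_log2
```

With logged = TRUE the input would be taken as already transformed and this step skipped.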

...

further parameters.

Value

stat$p.value: numeric vector of $p$-values.
stat$q.value: numeric vector of $q$-values calculated based on the $p$-values using the p.adjust function with default parameter settings.
stat$testStat: numeric vector of test statistics if "wad" is specified.
stat$rank: gene rank in order of the $p$-values or test statistics.
estimatedDEG: numeric vector consisting of 0 or 1 depending on whether each gene is classified as non-DEG (0) or DEG (1). The threshold for this classification is given by the FDR argument.
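
The relationship between these fields can be sketched with base R on a toy vector of $p$-values. Two points are assumptions rather than facts about TCC: that a gene is called a DEG when its $q$-value is strictly below FDR, and which adjustment method is in effect. In base R, p.adjust with default settings uses method = "holm", whereas "BH" is the usual choice for FDR control, so both are shown.

```r
# Toy p-values for five genes (illustrative numbers only).
p <- c(0.001, 0.02, 0.03, 0.2, 0.5)
FDR <- 0.1

# p.adjust with default settings (base R default method is "holm").
q_default <- p.adjust(p)

# Benjamini-Hochberg adjustment, the usual choice for FDR control.
q_bh <- p.adjust(p, method = "BH")

# Assumed classification rule: DEG (1) if q < FDR, otherwise non-DEG (0).
estimatedDEG <- as.numeric(q_bh < FDR)

# Rank of each gene in order of its p-value.
gene_rank <- rank(p)

data.frame(p, q_default, q_bh, estimatedDEG, gene_rank)
```

For real analyses, the getResult function gathers the corresponding fields from the TCC object.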

Details

The estimateDE function is generally used after the calcNormFactors function, which calculates normalization factors. estimateDE constructs a statistical model for differential expression (DE) analysis with the calculated normalization factors and returns the $p$-values (or related statistics). The individual functions in other packages are called internally according to the specified test.method parameter.

test.method = "edger" There are two approaches (i.e., exact test and GLM) to identify DEGs in edgeR. Both approaches are implemented in TCC. By default, the exact test approach is used for two-group data, and the GLM approach is used for multi-group or multi-factor data. However, if design and one of coef or contrast are given, the GLM approach will be used for two-group data. If the exact test approach is used, estimateCommonDisp, estimateTagwiseDisp, and exactTest are internally called. If the GLM approach is used, estimateGLMCommonDisp, estimateGLMTrendedDisp, estimateGLMTagwiseDisp, glmFit, and glmLRT are internally called.
test.method = "deseq" DESeq supports two approaches (i.e., an exact test and a GLM approach) for identifying DEGs. By default, the exact test is used for two-group data, and the GLM approach is used for multi-group or multi-factor data. However, if full and reduced are given, the GLM approach will be used for two-group data. If the exact test is used, estimateDispersions and nbinomTest are internally called. If the GLM approach is used, estimateDispersions, fitNbinomGLMs, and nbinomGLMTest are internally called.
test.method = "deseq2" estimateDispersions and nbinomWaldTest are internally called for identifying DEGs. However, if full and reduced are given, nbinomLRT is used instead of nbinomWaldTest.
test.method = "bayseq" getPriors.NB and getLikelihoods in baySeq are internally called for identifying DEGs. If paired = TRUE, getPriors and getLikelihoods in baySeq are used.
test.method = "samseq" SAMseq in the samr package is called with the resp.type = "Two class unpaired" argument to identify DEGs for two-group data, with resp.type = "Two class paired" for paired two-group data, and with resp.type = "Multiclass" for multi-group data.
test.method = "voom" voom, lmFit, and eBayes in limma are internally called for identifying DEGs.
test.method = "wad" The WAD method implemented in TCC is used for identifying DEGs. Since WAD outputs test statistics instead of $p$-values, tcc$stat$p.value and tcc$stat$q.value are NA. Instead, the test statistics are stored in the tcc$stat$testStat field.
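
The rule for choosing between the exact test and the GLM approach under test.method = "edger" can be restated as a small base R sketch. The function choose_edger_approach is hypothetical and written only to illustrate the documented behavior; it does not cover multi-factor designs and is not TCC's internal dispatch code.

```r
# Hypothetical restatement of the documented edgeR rule:
# - more than two groups                         -> GLM approach
# - design plus one of coef/contrast supplied    -> GLM approach
# - otherwise (plain two-group comparison)       -> exact test
choose_edger_approach <- function(n_groups, design = NULL,
                                  coef = NULL, contrast = NULL) {
  glm_requested <- !is.null(design) &&
    (!is.null(coef) || !is.null(contrast))
  if (n_groups > 2 || glm_requested) "glm" else "exact"
}

# Two groups, no design: exact test.
choose_edger_approach(2)

# Two groups with a design matrix and a coefficient: GLM approach.
design <- model.matrix(~ factor(c(1, 1, 1, 2, 2, 2)))
choose_edger_approach(2, design = design, coef = 2)

# Three groups: GLM approach.
choose_edger_approach(3)
```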

Examples

# Analyzing simulation data for comparing two groups
# (G1 vs. G2) with biological replicates.
# The DE analysis is performed by an exact test in edgeR coupled
# with the DEGES/edgeR normalization factors.
# For retrieving the summaries of DE results, we recommend using
# the getResult function.
data(hypoData)
group <- c(1, 1, 1, 2, 2, 2)
tcc <- new("TCC", hypoData, group)
tcc <- calcNormFactors(tcc, norm.method = "tmm", test.method = "edger",
                       iteration = 1, FDR = 0.1, floorPDEG = 0.05)
tcc <- estimateDE(tcc, test.method = "edger", FDR = 0.1)
head(tcc$stat$p.value)
head(tcc$stat$q.value)
head(tcc$estimatedDEG)
result <- getResult(tcc)


# Analyzing simulation data for comparing two groups
# (G1 vs. G2) without replicates.
# The DE analysis is performed by a negative binomial (NB) test
# in DESeq coupled with the DEGES/DESeq normalization factors.
data(hypoData)
group <- c(1, 2)
tcc <- new("TCC", hypoData[, c(1, 4)], group)
tcc <- calcNormFactors(tcc, norm.method = "deseq", test.method = "deseq",
                       iteration = 1, FDR = 0.1, floorPDEG = 0.05)
tcc <- estimateDE(tcc, test.method = "deseq", FDR = 0.1)
