These functions perform over-representation analyses for Gene Ontology terms or KEGG pathways in a list of Entrez Gene IDs.
The default method accepts the gene list as a vector of gene IDs, while the MArrayLM method extracts the gene lists automatically from a linear model fit object.
goana uses annotation from the appropriate Bioconductor organism package.
kegga uses the KEGGREST package to access KEGG pathway annotation.
The default method accepts a vector prior.prob giving the prior probability that each gene in the universe appears in a gene set.
This vector can be used to correct for unwanted trends in the differential expression analysis associated with gene length, gene abundance or any other covariate.
The MArrayLM method computes the prior.prob vector automatically when trend is non-NULL.
If prior.prob=NULL, the function computes one-sided hypergeometric tests equivalent to Fisher's exact test.
If prior probabilities are specified, then a test based on Wallenius' noncentral hypergeometric distribution is used to adjust for the relative probability that each gene will appear in a gene set, following the approach of Young et al. (2010).
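The equivalence between the one-sided hypergeometric test and Fisher's exact test can be checked with a small base R sketch. The counts below are made-up toy numbers, and the code is only an illustration of the test itself, not a copy of what goana or kegga do internally.

```r
# Toy counts (all hypothetical): a universe of 1000 genes, 50 of them DE,
# and a gene set of 40 genes, 8 of which are DE.
n_universe <- 1000
n_de       <- 50
n_set      <- 40
n_overlap  <- 8

# One-sided hypergeometric test: probability of seeing n_overlap or more
# gene-set members among the DE genes if DE genes were drawn at random.
p_hyper <- phyper(n_overlap - 1,
                  m = n_set,               # genes in the set
                  n = n_universe - n_set,  # genes not in the set
                  k = n_de,                # number of DE genes drawn
                  lower.tail = FALSE)

# The same question as a 2x2 table (rows: DE / not DE, columns: in set / not in set).
tab <- matrix(c(n_overlap,        n_set - n_overlap,
                n_de - n_overlap, n_universe - n_set - n_de + n_overlap),
              nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

# The two p-values agree up to floating point error.
all.equal(p_hyper, p_fisher)
```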
The MArrayLM methods perform over-representation analyses for the up and down differentially expressed genes from a linear model analysis.
In this case, the universe is all the genes found in the fit object.
trend=FALSE is equivalent to prior.prob=NULL.
If trend=TRUE or a covariate is supplied, then a trend is fitted to the differential expression results and this is used to set prior.prob.
The statistical approach provided here is the same as that provided by the goseq package, with one methodological difference and a few restrictions.
Unlike the goseq package, the gene identifiers here must be Entrez Gene IDs and the user is assumed to be able to supply gene lengths if necessary.
The goseq package has additional functionality to convert gene identifiers and to provide gene lengths.
The only methodological difference is that goana and kegga compute gene length or abundance bias using tricubeMovingAverage instead of monotonic regression.
While tricubeMovingAverage does not enforce monotonicity, it has the advantage of numerical stability when de contains only a small number of genes.
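The idea of estimating a bias trend with a moving average can be sketched in base R as follows. This is a simplified stand-in written for illustration: the helper tricube_smooth, the window width and the simulated data are all assumptions, and the edge handling may differ from that of tricubeMovingAverage.

```r
set.seed(1)

# Simulated data (hypothetical): a covariate such as gene length, and a 0/1
# indicator of differential expression that is more likely for large values.
n_genes   <- 200
covariate <- rexp(n_genes)
is_de     <- rbinom(n_genes, 1, plogis(-2 + covariate))

# Tricube-weighted moving average over a centred window of odd width.
# Windows are truncated at the ends and the weights renormalised.
tricube_smooth <- function(x, width = 51) {
  half <- (width - 1) / 2
  u <- (-half:half) / (half + 1)
  w <- (1 - abs(u)^3)^3
  n <- length(x)
  out <- numeric(n)
  for (i in seq_len(n)) {
    idx  <- (i - half):(i + half)
    keep <- idx >= 1 & idx <= n
    out[i] <- sum(w[keep] * x[idx[keep]]) / sum(w[keep])
  }
  out
}

# Smooth the DE indicator along the sorted covariate, then map the smoothed
# values back to the original gene order to obtain one prior probability per gene.
o <- order(covariate)
prior_prob <- numeric(n_genes)
prior_prob[o] <- tricube_smooth(is_de[o])
```

Because each value is a weighted average of 0/1 indicators, the result always lies between 0 and 1, but nothing forces it to increase with the covariate, which is the trade-off against monotonic regression described above.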