Usage
GO_analyse( eSet, f, subset=NULL, biomart_dataset="", microarray="", method="randomForest", rank.by="rank", do.trace=100, ntree=1000, mtry=ceiling(2*sqrt(nrow(eSet))), GO_genes=NULL, all_GO=NULL, all_genes=NULL, FUN.GO=mean, ...)
Arguments
eSet
An ExpressionSet of the Biobase package including a gene-by-sample
expression matrix in the assayData slot, and a phenotypic information
data-frame in the phenoData slot. In the expression matrix, row names are
identifiers of expressed features, and column names are identifiers of the
individual samples. In the phenotypic data-frame, row names are sample
identifiers, and column names are grouping factors and phenotypic traits
usable for statistical tests and visualisation methods.
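The layout described above can be sketched with a toy dataset. This is a minimal illustration, not taken from the package: the feature identifiers, sample identifiers, and phenotypic columns are all made up, and the ExpressionSet is only built if the Biobase package happens to be installed.

```r
# Toy expression matrix: rows are features, columns are samples.
# All identifiers below are hypothetical.
expr <- matrix(
  c(5.1, 5.3, 7.8, 8.0,
    2.2, 2.1, 2.4, 2.3,
    9.0, 8.7, 4.1, 4.4),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("gene_A", "gene_B", "gene_C"),  # feature identifiers
    c("S1", "S2", "S3", "S4")         # sample identifiers
  )
)

# Phenotypic data-frame: rows are samples, columns are grouping factors.
pheno <- data.frame(
  Treatment = factor(c("control", "control", "infected", "infected")),
  Time = factor(c("2H", "24H", "2H", "24H")),
  row.names = colnames(expr)
)

# Sample identifiers must agree between the two tables.
stopifnot(identical(colnames(expr), rownames(pheno)))

# Assemble the ExpressionSet only when Biobase is available.
if (requireNamespace("Biobase", quietly = TRUE)) {
  library(Biobase)
  eSet <- ExpressionSet(
    assayData = expr,
    phenoData = AnnotatedDataFrame(pheno)
  )
  print(dim(exprs(eSet)))
}
```

With such an object, either "Treatment" or "Time" could be given as the grouping factor f.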
f
A column name in phenoData used as the grouping factor for the analysis.
subset
A named list to subset eSet for the analysis. Names must be
column names existing in colnames(pData(eSet)). Values must be vectors of
values existing in the corresponding column of pData(eSet). The original
ExpressionSet will be left unchanged.
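As an illustration of the expected shape of subset, the sketch below builds a named list and checks it against a stand-in for pData(eSet). The data-frame, its column names, and its values are hypothetical. The filtering step assumes that samples must match every entry of the list, which is an interpretation of the description above rather than a statement about the function's internals.

```r
# Stand-in for pData(eSet); columns and values are made up.
pheno <- data.frame(
  Treatment = c("control", "infected", "control", "infected"),
  Time = c("2H", "2H", "24H", "48H"),
  row.names = c("S1", "S2", "S3", "S4"),
  stringsAsFactors = FALSE
)

# Named list: names are phenotypic columns, values are the levels to keep.
subset_list <- list(Time = c("2H", "24H"), Treatment = "infected")

# Names must be existing columns, values must exist in those columns.
stopifnot(all(names(subset_list) %in% colnames(pheno)))
for (column in names(subset_list)) {
  stopifnot(all(subset_list[[column]] %in% pheno[[column]]))
}

# Assumed semantics: keep samples satisfying all entries of the list.
keep <- rep(TRUE, nrow(pheno))
for (column in names(subset_list)) {
  keep <- keep & pheno[[column]] %in% subset_list[[column]]
}
kept_samples <- rownames(pheno)[keep]
print(kept_samples)
```

The same list would then be passed as subset=subset_list in the call to GO_analyse().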
biomart_dataset
The Ensembl BioMart dataset identifier corresponding to the species
studied. If not specified and no custom annotations were provided, the
method will attempt to automatically identify the adequate dataset from
the first feature identifier in the dataset.
Use data(prefix2dataset) to access a table listing valid choices.
microarray
The identifier in the Ensembl BioMart corresponding to the microarray
platform used. If not specified and no custom annotations were provided,
the method will attempt to automatically identify the platform used from
the first feature identifier in the dataset.
Use data(microarray2dataset) to access a table listing valid choices.
method
The statistical framework used to score genes and gene ontologies.
Either "randomForest" or "rf" to use the random forest algorithm, or
either "anova" or "a" to use the one-way ANOVA model.
Default is "randomForest".
rank.by
Either "rank" or "score" to choose the metric used to order the gene and
GO term result tables. Default is "rank".
do.trace
Only used if method="randomForest". If set to TRUE, gives a more verbose
output as randomForest is run. If set to some integer, then running output
is printed for every do.trace trees. Default is 100.
ntree
Only used if method="randomForest". Number of trees to grow. This should
be set to a number large enough to ensure that every input row gets
predicted at least a few times. Default is 1000.
mtry
Only used if method="randomForest". Number of features randomly sampled as
candidates at each split. Default value is ceiling(2*sqrt(gene_count)),
which is 220 genes for a dataset of 12,000 genes.
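The default mtry follows directly from the expression in the usage line, ceiling(2*sqrt(nrow(eSet))). The helper name default_mtry below is hypothetical and only wraps that arithmetic for illustration.

```r
# Hypothetical helper reproducing the default from the usage line.
default_mtry <- function(gene_count) {
  ceiling(2 * sqrt(gene_count))
}

# 2 * sqrt(12000) is about 219.09, rounded up to 220.
print(default_mtry(12000))
```

A different value can be supplied through the mtry argument when the default samples too many or too few features per split.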
GO_genes
Custom annotations associating features present in the expression dataset
to gene ontology identifiers. This must be provided as a data-frame of
two columns, named gene_id and go_id. If provided, no call to the Ensembl
BioMart server will be made, and arguments all_GO and all_genes should be
provided as well, to enable all downstream features of GOexpress.
An example is provided in data(AlvMac_GOgenes).
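A minimal sketch of a custom GO_genes table is shown below. The feature identifiers and their associations to GO terms are made up for illustration. Only the two column names, gene_id and go_id, come from the description above.

```r
# One row per feature-to-GO association; a feature may appear several times.
# Feature identifiers and associations are hypothetical.
GO_genes <- data.frame(
  gene_id = c("gene_A", "gene_A", "gene_B", "gene_C"),
  go_id = c("GO:0006915", "GO:0005515", "GO:0006915", "GO:0005634"),
  stringsAsFactors = FALSE
)

# The table must have exactly these two columns.
stopifnot(identical(colnames(GO_genes), c("gene_id", "go_id")))
print(GO_genes)
```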
all_GO
Custom annotations used to annotate each GO identifier present in
GO_genes with the ontology name (e.g. "apoptotic process") and namespace
(i.e. "biological_process", "molecular_function", or
"cellular_component").
This must be provided as a data-frame containing at least one column named
go_id, and preferably two more columns named name_1006 and namespace_1003
for consistency with the Ensembl BioMart. Supported alternative column
headers are name and namespace.
Respectively, name should be used to provide the description of the GO
term, and namespace should contain one of "biological_process",
"molecular_function" and "cellular_component". name is used to generate
the title of ontology-based figures, and namespace is important to enable
subsequent filtering of results by their corresponding value.
An example is provided in data(AlvMac_allGO).
all_genes
Custom annotations used to annotate each feature identifier in the
expression dataset with the gene name or symbol (e.g. "TNF"), and an
optional description. This must be provided as a data-frame containing at
least a column named gene_id and preferably two more columns named
external_gene_name and description for consistency with the Ensembl
BioMart. A supported alternative header is name.
While external_gene_name is important to enable subsequent visualisation
of results by gene symbol, description is only displayed for readability
of result tables.
An example is provided in data(AlvMac_allgenes).
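The custom all_GO and all_genes tables can be sketched as plain data-frames using the column names listed in their descriptions. The feature identifiers, and their pairing with gene symbols, are hypothetical and serve only to show the expected structure.

```r
# GO term annotations, using the BioMart-style column headers.
all_GO <- data.frame(
  go_id = c("GO:0006915", "GO:0005515", "GO:0005634"),
  name_1006 = c("apoptotic process", "protein binding", "nucleus"),
  namespace_1003 = c("biological_process", "molecular_function",
                     "cellular_component"),
  stringsAsFactors = FALSE
)

# Feature annotations; the gene_id values are made-up identifiers.
all_genes <- data.frame(
  gene_id = c("gene_A", "gene_B", "gene_C"),
  external_gene_name = c("TNF", "IL6", "ACTB"),
  description = c("tumor necrosis factor", "interleukin 6", "actin beta"),
  stringsAsFactors = FALSE
)

# Namespaces must be one of the three documented values.
valid_namespaces <- c("biological_process", "molecular_function",
                      "cellular_component")
stopifnot(all(all_GO$namespace_1003 %in% valid_namespaces))
stopifnot("go_id" %in% colnames(all_GO), "gene_id" %in% colnames(all_genes))
```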
FUN.GO
Function to summarise the score and rank of all features associated with
each gene ontology. Default is the mean function. If using
"lambda-like" (anonymous) functions, these must take a list of numeric
values as an input, and return a single numeric value as an output.
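To illustrate the contract on FUN.GO, the sketch below applies the default mean and an anonymous function to a made-up set of gene scores. The scores are hypothetical, and the median is used only as an example of an alternative summary. Each function takes numeric values and returns a single number.

```r
# Hypothetical scores of the genes annotated to one GO term.
scores <- c(12.5, 3.0, 8.5, 40.0)

# Default summary.
summary_mean <- mean(scores)

# Anonymous function that is less sensitive to a single extreme gene.
robust_summary <- function(x) median(x, na.rm = TRUE)
summary_median <- robust_summary(scores)

# Both summaries return exactly one numeric value.
stopifnot(length(summary_mean) == 1, length(summary_median) == 1)
print(c(mean = summary_mean, median = summary_median))
```

Such a function would be supplied as FUN.GO=function(x) median(x, na.rm=TRUE) in the call to GO_analyse().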
...
Additional arguments passed on to the randomForest() method, if
applicable.
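Putting the arguments together, a call with custom annotations might look like the sketch below. It assumes that the GOexpress package is installed, that the AlvMac example ExpressionSet shipped with the package is loaded with data(AlvMac), and that its phenotypic data contains a column named "Treatment". The ANOVA method is chosen here only to keep the run short. The block does nothing if the package is unavailable.

```r
# Run only when GOexpress is installed.
has_GOexpress <- requireNamespace("GOexpress", quietly = TRUE)

if (has_GOexpress) {
  library(GOexpress)

  # Example data and custom annotations shipped with the package.
  data(AlvMac)
  data(AlvMac_GOgenes)
  data(AlvMac_allgenes)
  data(AlvMac_allGO)

  # Custom annotations are supplied, so no BioMart query is needed.
  results <- GO_analyse(
    eSet = AlvMac,
    f = "Treatment",       # assumed grouping factor in pData(AlvMac)
    method = "anova",
    rank.by = "rank",
    GO_genes = AlvMac_GOgenes,
    all_genes = AlvMac_allgenes,
    all_GO = AlvMac_allGO
  )

  # Inspect the top-level components of the returned object.
  print(names(results))
}
```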