
dcGOR (version 1.0.0)

dcEnrichment: Function to conduct enrichment analysis given the input data and the ontology in query

Description

dcEnrichment conducts enrichment analysis given the input data and the ontology in query. It returns an object of class "eTerm". Enrichment analysis is based on Fisher's exact test, the hypergeometric test, or the binomial test (see the 'test' argument). The test can respect the hierarchy of the ontology.
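
To make the choice of test concrete, the following minimal sketch computes an enrichment p-value for a single term in base R. It does not call dcGOR; the counts N, K, n and k are hypothetical, and it assumes a one-sided (over-representation) alternative, which may differ from what dcEnrichment does internally.

```r
# Hypothetical counts, chosen only for illustration
N <- 2000  # background: all domains that can be annotated
K <- 50    # domains annotated by the term
n <- 20    # domains in the input data
k <- 5     # domains shared by the term and the input data

# Hypergeometric test: probability of observing k or more shared domains
p_hyper <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# One-sided Fisher's exact test on the corresponding 2 x 2 table
tab <- matrix(c(k, K - k, n - k, N - K - n + k), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

# Binomial test: treats the n input domains as independent draws
# with success probability K / N
p_binom <- pbinom(k - 1, n, K / N, lower.tail = FALSE)

c(hypergeometric = p_hyper, fisher = p_fisher, binomial = p_binom)
```

Under these assumptions the hypergeometric and one-sided Fisher p-values coincide, while the binomial p-value is an approximation that ignores sampling without replacement.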

Usage

dcEnrichment(data, domain = c("SCOP.sf"), ontology = c("GOBP", "GOMF",
"GOCC"), sizeRange = c(10, 1000), min.overlap = 3,
which_distance = NULL, test = c("HypergeoTest", "FisherTest",
"BinomialTest"), p.adjust.method = c("BH", "BY", "bonferroni", "holm",
"hochberg", "hommel"), ontology.algorithm = c("none", "pc", "elim",
"lea"),
elim.pvalue = 0.01, lea.depth = 2, verbose = T,
RData.location = "http://supfam.org/dcGOR/data")

Arguments

data
an input vector containing the ids of a list of domains, for example, sunids for SCOP domains
domain
the domain identity. It can be 'SCOP.sf' for SCOP superfamilies
ontology
the ontology supported currently. It can be "GOBP" for Gene Ontology Biological Process, "GOMF" for Gene Ontology Molecular Function, "GOCC" for Gene Ontology Cellular Component. For details on the eligibility for pairs of input domain and ontology, pleas
sizeRange
the minimum and maximum number of members a term may have to be considered. By default, terms with at least 10 and no more than 1000 members are considered
min.overlap
the minimum number of overlaps. Only terms that share at least min.overlap domains with the input data (3 by default) will be processed
which_distance
which distance of terms in the ontology is used to restrict terms in consideration. By default, it is set to 'NULL' to consider all distances
test
the statistical test used. It can be "FisherTest" for Fisher's exact test, "HypergeoTest" for the hypergeometric test, or "BinomialTest" for the binomial test. Fisher's exact test is to test the independence between domain group (domains belonging
p.adjust.method
the method used to adjust p-values. It can be one of "BH", "BY", "bonferroni", "holm", "hochberg" and "hommel". The first two methods "BH" (widely used) and "BY" control the false discovery rate (FDR: the expected proportion of false discoveries amongst t
ontology.algorithm
the algorithm used to account for the hierarchy of the ontology. It can be one of "none", "pc", "elim" and "lea". For details, please see 'Note'
elim.pvalue
the parameter only used when "ontology.algorithm" is "elim". It is used to control how to declare a significantly enriched term (and subsequently all domains in this term are eliminated from all its ancestors)
lea.depth
the parameter only used when "ontology.algorithm" is "lea". It controls the maximum depth used to consider the children of a term (and subsequently all domains in these children terms are eliminated from the use for the recalculation of t
verbose
logical to indicate whether messages will be displayed on the screen. By default, it is set to TRUE for display
RData.location
a character string giving the location of the built-in RData files. By default, it remotely locates at "http://supfam.org/dcGOR/data" or "https://github.com/hfang-bristol/dcGOR/data". For the user equipped with fast internet connection, this option can be just le
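
The effect of 'sizeRange' and 'min.overlap' can be illustrated with a small base R sketch. It is not the package's implementation: the term names, the annotation list 'anno' and the input ids below are hypothetical, and the filtering order is an assumption.

```r
# Hypothetical annotations: each term maps to the ids of its member domains
anno <- list(
  T1 = as.character(1:15),
  T2 = as.character(1:5),      # fewer than 10 members
  T3 = as.character(10:40),
  T4 = as.character(100:130)   # shares nothing with the input data
)
# Hypothetical input data (domain ids of interest)
data <- as.character(c(1, 2, 3, 12, 13, 14, 15, 50))

sizeRange <- c(10, 1000)
min.overlap <- 3

# Keep terms whose number of members lies within sizeRange
sizes <- lengths(anno)
keep_size <- sizes >= sizeRange[1] & sizes <= sizeRange[2]

# Among those, keep terms sharing at least min.overlap domains with the input
overlap <- lapply(anno[keep_size], intersect, data)
overlap <- overlap[lengths(overlap) >= min.overlap]

names(overlap)
```

Here T2 is dropped for being too small and T4 for having no overlap, so only T1 and T3 would go on to be tested.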

Value

  • an object of class "eTerm", a list with the following components:
    • term_info: a matrix of nTerm X 5 containing term information, where nTerm is the number of terms in consideration, and the 5 columns are "term_id" (i.e. "Term ID"), "term_name" (i.e. "Term Name"), "namespace" (i.e. "Term Namespace"), "distance" (i.e. "Term Distance") and "IC" (i.e. "Information Content for the term based on annotation frequency by it")
    • anno: a list of terms, each storing annotated domain members. Terms are always identified by "term_id" and domain members by their ids (e.g. sunids for SCOP domains)
    • data: a vector containing input data in consideration. It is not always the same as the input data, as only mappable ids are retained
    • overlap: a list of terms, each storing the domains shared between the domains annotated by a term and the domains in the input data (i.e. the domains of interest). Terms are always identified by "term_id" and domain members by their ids (e.g. sunids for SCOP domains)
    • zscore: a vector containing z-scores
    • pvalue: a vector containing p-values
    • adjp: a vector containing adjusted p-values, that is, the p-values after adjustment for multiple comparisons
    • call: the call that produced this result
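
As a rough illustration of how these components fit together, the sketch below builds a plain list that mimics a few of them and ranks the terms by adjusted p-value. It is not a real "eTerm" object: the term ids, names and numbers are made up, and only the component names are taken from the list above.

```r
# A plain list standing in for a result; all values are hypothetical
res <- list(
  term_info = matrix(
    c("TERM:1", "TERM:2", "TERM:3", "term one", "term two", "term three"),
    ncol = 2,
    dimnames = list(c("TERM:1", "TERM:2", "TERM:3"), c("term_id", "term_name"))
  ),
  zscore = c("TERM:1" = 0.8, "TERM:2" = 4.1, "TERM:3" = 2.3),
  pvalue = c("TERM:1" = 0.20, "TERM:2" = 0.001, "TERM:3" = 0.03)
)

# Adjust p-values for multiple comparisons (Benjamini-Hochberg)
res$adjp <- p.adjust(res$pvalue, method = "BH")

# Rank terms by adjusted p-value and collect a summary table
ord <- order(res$adjp)
ids <- names(res$adjp)[ord]
top <- data.frame(
  term_id   = ids,
  term_name = unname(res$term_info[ids, "term_name"]),
  zscore    = unname(res$zscore[ids]),
  adjp      = unname(res$adjp[ids]),
  stringsAsFactors = FALSE
)
top
```

Because the vectors are named by term id, the same ids can index 'term_info', 'zscore' and 'adjp' consistently.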

See Also

dcRDataLoader, dcDAGannotate

Examples

# load the package (provides dcRDataLoader, dcEnrichment and rowNames)
library(dcGOR)

# 1) load SCOP.sf (as 'InfoDataFrame' object)
SCOP.sf <- dcRDataLoader('SCOP.sf')
# randomly select 20 domains
data <- sample(rowNames(SCOP.sf), 20)

# 2) perform enrichment analysis
eTerm <- dcEnrichment(data, domain="SCOP.sf", ontology="GOMF")

# 3) visualise the top significant terms in the ontology hierarchy
# load obo.GOMF (as 'igraph' object)
g <- dcRDataLoader('obo.GOMF')
# focus on the top 5 enriched terms
nodes_query <- names(sort(eTerm$adjp)[1:5])
nodes.highlight <- rep("red", length(nodes_query))
names(nodes.highlight) <- nodes_query
subg <- dnet::dDAGinduce(g, nodes_query)
# color-code terms according to the adjusted p-values (as negative base-10 logarithms)
dnet::visDAG(g=subg, data=-1*log10(eTerm$adjp[V(subg)$name]),
    node.info="both", zlim=c(0,2), node.attrs=list(color=nodes.highlight))
# color-code terms according to the z-scores
dnet::visDAG(g=subg, data=eTerm$zscore[V(subg)$name], node.info="both",
    colormap="darkblue-white-darkorange",
    node.attrs=list(color=nodes.highlight))
