dcDAGdomainSim: Function to calculate pair-wise semantic similarity between domains based on a directed acyclic graph (DAG) with annotated data

Description

dcDAGdomainSim calculates pair-wise semantic similarity between domains based on a directed acyclic graph (DAG) with annotated data. It first calculates semantic similarity between terms and then derives semantic similarity between domains from the term-term semantic similarity. Parallel computing is also supported on Linux and Mac operating systems.

Usage

dcDAGdomainSim(g, domains = NULL,
  method.domain = c("BM.average", "BM.max", "BM.complete", "average", "max"),
  method.term = c("Resnik", "Lin", "Schlicker", "Jiang", "Pesquita"),
  force = TRUE, fast = TRUE, parallel = TRUE, multicores = NULL,
  verbose = TRUE)

Arguments

g

an object of class "igraph" or "Onto". It must contain a node attribute called 'annotations' for storing annotation data (see the Examples for how to prepare it)

domains

the domains between which pair-wise semantic similarity is calculated. If NULL, all domains annotatable in the input DAG will be used for calculation, which is prohibitively expensive!

method.domain

the method used to derive semantic similarity between domains from semantic similarity between terms. It can be "average" for the average similarity between any two terms (one from domain 1, the other from domain 2), "max" for the maximum similarity b
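
As an illustration only (not the package's internal code), the sketch below derives a domain-level score from a small term-term similarity matrix. The matrix values and term names are invented. The "average" and "max" lines follow the description above; the best-match average is the commonly used definition and is only an assumed reading of "BM.average".

```r
# Hypothetical term-term similarity matrix (values invented for illustration):
# rows are terms annotating domain 1, columns are terms annotating domain 2.
sim <- matrix(c(0.9, 0.2, 0.1,
                0.3, 0.8, 0.4),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("t1", "t2"), c("u1", "u2", "u3")))

# "average": mean similarity over all term pairs
sim_average <- mean(sim)

# "max": highest similarity over all term pairs
sim_max <- max(sim)

# Best-match average (assumed reading of "BM.average"):
# for every term of either domain, take its best match in the other domain,
# then average all of those best-match scores.
row_best <- apply(sim, 1, max)  # best match for each term of domain 1
col_best <- apply(sim, 2, max)  # best match for each term of domain 2
sim_bm_average <- mean(c(row_best, col_best))
```

The best-match variant is less sensitive to unrelated term pairs than the plain average, because each term contributes only its closest counterpart.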

method.term

the method used to measure semantic similarity between terms. It can be "Resnik" for the information content (IC) of the most informative common ancestor (MICA) (see http://arxiv.org/pdf/cmp-lg/9511007.pdf), "Lin" for 2*IC at MICA divided by the s
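
A minimal sketch of the Resnik and Lin measures on a toy ontology may help here. The term names, annotation frequencies, and ancestor sets are all hypothetical, and the base-2 logarithm is an assumption; the package may use a different base or compute IC differently.

```r
# Hypothetical annotation frequencies: the fraction of all domains annotated
# to each term after propagating annotations up to ancestors.
freq <- c(root = 1.0, A = 0.5, B = 0.25, C = 0.125)

# Information content; log base 2 is an assumption made for this sketch.
ic <- -log2(freq)

# Hypothetical ancestor sets; each term counts as its own ancestor.
ancestors <- list(
  root = "root",
  A    = c("A", "root"),
  B    = c("B", "A", "root"),
  C    = c("C", "A", "root")
)

# IC of the most informative common ancestor (MICA) of two terms
mica_ic <- function(t1, t2) {
  common <- intersect(ancestors[[t1]], ancestors[[t2]])
  max(ic[common])
}

# Resnik: IC at the MICA
resnik <- function(t1, t2) mica_ic(t1, t2)

# Lin: 2 * IC at the MICA, divided by the sum of the two terms' IC
lin <- function(t1, t2) 2 * mica_ic(t1, t2) / (ic[[t1]] + ic[[t2]])

resnik("B", "C")
lin("B", "C")
```

In this toy case B and C share A as their most informative common ancestor, so Resnik returns the IC of A, while Lin normalises that value by the IC of B and C themselves.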

force

logical to indicate whether only the most specific terms (for each domain) will be used. By default, it is set to TRUE. Using this is advisable, since it is computationally fast without compromising accuracy (considering the fact that true-pat

fast

logical to indicate whether a vectorised fast computation is used. By default, it is set to TRUE. Using the vectorised computation is advisable; the conventional computation is retained only to make the scripts easier to understand

parallel

logical to indicate whether parallel computation with multicores is used. By default, it is set to TRUE, but parallel computation is not necessarily used, partly because the available parallel backends are system-specific (currently only Linux or Mac OS). Also, it will depend on whet

multicores

an integer to specify how many cores will be registered as the multicore parallel backend to the 'foreach' package. If NULL, half of the cores available on the user's computer will be used. This option only takes effect when parallel computation is enabled
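
To see what the NULL default would roughly resolve to on a given machine, the following sketch counts the available cores with the base 'parallel' package and halves that number. Rounding up and falling back to one core are assumptions of this sketch, not confirmed behaviour of the package.

```r
library(parallel)

# Number of cores reported by the system; detectCores() can return NA
available <- detectCores()

# Half of the available cores, rounded up, with at least one core
# (the rounding and the fallback are assumptions of this sketch)
n_cores <- if (is.na(available)) 1 else max(1, ceiling(available / 2))
n_cores
```

The resulting value could then be passed explicitly as multicores=n_cores to make the choice reproducible across machines.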

verbose

logical to indicate whether messages will be displayed on the screen. By default, it is set to TRUE for display

Value

an object of S4 class Dnetwork. It is a weighted and undirected graph, with the following slots:
- nodeInfo: an S4 object describing information on nodes/domains
- adjMatrix: an object of S4 class AdjData, containing the symmetric adjacency data matrix for pair-wise semantic similarity between domains

Examples

# 1) load onto.GOMF (as 'Onto' object)
g <- dcRDataLoader('onto.GOMF')

# 2) load SCOP superfamilies annotated by GOMF (as 'Anno' object)
Anno <- dcRDataLoader('SCOP.sf2GOMF')

# 3) prepare for ontology appended with annotation information
dag <- dcDAGannotate(g, annotations=Anno, path.mode="shortest_paths",
    verbose=FALSE)

# 4) calculate pair-wise semantic similarity between 6 randomly chosen domains
alldomains <- unique(unlist(nInfo(dag)$annotations))
domains <- sample(alldomains,6)
dnetwork <- dcDAGdomainSim(g=dag, domains=domains,
    method.domain="BM.average", method.term="Resnik", parallel=FALSE,
    verbose=TRUE)
dnetwork

# 5) convert it to an object of class 'igraph'
ig <- dcConverter(dnetwork, from='Dnetwork', to='igraph')
ig

# 6) visualise the domain network
## extract edge weight (with 2-digit precision)
x <- signif(E(ig)$weight, digits=2)
## rescale into an interval [1,4] as edge width
edge.width <- 1 + (x-min(x))/(max(x)-min(x))*3
## do visualisation
dnet::visNet(g=ig, vertex.shape="sphere", edge.width=edge.width,
    edge.label=x, edge.label.cex=0.7)
