RxnSim (version 1.0.4)

rs.compute: Computes Similarity of Reactions

Description

Computes similarity between two (or more) input reactions.
rs.compute computes the similarity of two reactions.
rs.compute.list computes pairwise similarities between the reactions of two lists.
rs.compute.sim.matrix computes pairwise similarities among all reactions in a single list.
rs.compute.DB computes the similarity of a reaction against each reaction in a database (parsed from a text file by rs.makeDB).

Usage

rs.compute (rxnA, rxnB, format = 'rsmi', standardize = TRUE, explicitH = FALSE, 
            reversible = TRUE, algo = 'msim', sim.method = 'tanimoto', 
            fp.type = 'extended', fp.mode = 'bit', fp.depth = 6, fp.size = 1024,
            verbose = FALSE, fpCached = FALSE)
rs.compute.list (rxnA, rxnB, format = 'rsmi', standardize = TRUE, explicitH = FALSE,
              reversible = TRUE, algo = 'msim', sim.method = 'tanimoto',
              fp.type = 'extended', fp.mode = 'bit', fp.depth = 6, fp.size = 1024,
              clearCache = TRUE)
rs.compute.sim.matrix (rxnA, format = 'rsmi', standardize = TRUE, explicitH = FALSE,
                    reversible = TRUE, algo = 'msim', sim.method = 'tanimoto', 
                    fp.type = 'extended', fp.mode = 'bit', fp.depth = 6, fp.size = 1024,
                    clearCache = TRUE)
rs.compute.DB (rxnA, DB, format = 'rsmi', ecrange = '*', reversible = TRUE,
              algo = 'msim', sim.method = 'tanimoto', sort = TRUE, fpCached = FALSE)

Value

rs.compute

returns a similarity value.

rs.compute.list

returns an \(m \times n\) matrix of similarity values, where \(m\) and \(n\) are the lengths of the two input lists, respectively.

rs.compute.sim.matrix

returns an \(m \times m\) symmetric matrix of similarity values, where \(m\) is the length of the input list.

rs.compute.DB

returns a data frame.
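The shapes of the matrix return values can be illustrated with a small base-R sketch. It assumes a hypothetical pairwise function `sim` standing in for a reaction similarity (for example, a wrapper around rs.compute); `cross_sim`, `self_sim` and `toy` are illustrative names, not RxnSim functions, and RxnSim may fill its matrices differently.

```r
# Hypothetical helpers; 'sim' stands in for any pairwise reaction similarity
# function and is not part of RxnSim.

# m x n matrix: every element of list a against every element of list b
cross_sim <- function(a, b, sim) {
  out <- matrix(NA_real_, nrow = length(a), ncol = length(b))
  for (i in seq_along(a)) {
    for (j in seq_along(b)) out[i, j] <- sim(a[[i]], b[[j]])
  }
  out
}

# m x m symmetric matrix: evaluate each unordered pair once and mirror it
self_sim <- function(a, sim) {
  m <- length(a)
  out <- matrix(NA_real_, nrow = m, ncol = m)
  for (i in seq_len(m)) {
    for (j in i:m) {
      out[i, j] <- sim(a[[i]], a[[j]])
      out[j, i] <- out[i, j]
    }
  }
  out
}

# Toy similarity on numbers, used only to show the shapes
toy <- function(x, y) 1 / (1 + abs(x - y))
dim(cross_sim(list(1, 2), list(1, 2, 4), toy))  # 2 3
self_sim(list(1, 2, 4), toy)
```

Because the list-against-itself result is symmetric, the sketch evaluates each unordered pair only once and copies it across the diagonal.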

Arguments

rxnA

input reaction as an RSMI string, or the name (with path) of an MDL RXN file. rs.compute.list and rs.compute.sim.matrix accept a list of reactions as input.

rxnB

input reaction as an RSMI string, or the name (with path) of an MDL RXN file. rs.compute.list accepts a list of reactions as input.

DB

parsed database object as returned by rs.makeDB.

format

specifies the format of the input reaction(s), one of the following: 'RSMI' (default) or 'RXN'.

ecrange

EC number search pattern used when comparing against a reaction database; * is used as a wildcard. E.g., 1.2.1.* restricts the search to all reactions whose EC numbers start with 1.2.1 (i.e., 1.2.1.-).

standardize

suppresses all explicit hydrogens if set to TRUE (default).

explicitH

converts all implicit hydrogens to explicit ones if set to TRUE. Defaults to FALSE.

reversible

boolean that indicates whether the input reaction(s) are reversible. If TRUE (default), the reactions are compared both in the forward direction and with one of them reversed, and the maximum of the two similarity values is used.

algo

reaction similarity algorithm to use: 'msim' (default), 'msim_max', 'rsim' or 'rsim2'. See Details for a description of each algorithm.

sim.method

similarity metric to be used to evaluate reaction similarity. Allowed types include:
'simple', 'jaccard', 'tanimoto' (default), 'russelrao', 'dice', 'rodgerstanimoto', 'achiai', 'cosine', 'kulczynski2', 'mt', 'baroniurbanibuser', 'tversky', 'robust', 'hamann', 'pearson', 'yule', 'mcconnaughey', 'simpson', 'jaccard-count' and 'tanimoto-count'.

fp.type

fingerprint type to use. Allowed types include:
'standard', 'extended' (default), 'graph', 'estate', 'hybridization', 'maccs', 'pubchem', 'kr', 'shortestpath', 'signature' and 'circular'.

fp.mode

fingerprint mode to be used. It can either be set to 'bit' (default) or 'count'.

fp.depth

search depth for fingerprint construction. This argument is ignored for 'pubchem', 'maccs', 'kr' and 'estate' fingerprints.

fp.size

length of the fingerprint bit string. This argument is ignored for the 'pubchem', 'maccs', 'kr', 'estate', 'circular' (count mode) and 'signature' fingerprints.

verbose

boolean that enables display of the detailed molecule pairing and reaction alignment, along with the corresponding similarity values. Ignored by the 'rsim2' algorithm.

sort

boolean that makes rs.compute.DB return the data frame sorted by decreasing similarity. Defaults to TRUE.

fpCached

boolean that enables fingerprint caching. It is set to FALSE by default.

clearCache

boolean that resets the cache before (and after) processing the reaction lists. Defaults to TRUE. The cache can also be cleared explicitly with rs.clearCache.
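The * wildcard accepted by ecrange behaves like a shell-style glob. As an illustration only, base R's utils::glob2rx() converts such a pattern into an anchored regular expression; `ec_matches` is a hypothetical helper, and RxnSim's own EC filtering may be implemented differently.

```r
# Hypothetical illustration of '*'-wildcard EC matching; not RxnSim code.
ec_matches <- function(ec, pattern = "*") {
  # glob2rx("1.2.1.*") yields "^1\\.2\\.1\\." (dots escaped, trailing .* trimmed)
  grepl(utils::glob2rx(pattern), ec)
}

ecs <- c("1.2.1.3", "1.2.1.-", "1.2.10.1", "2.7.1.1")
ec_matches(ecs, "1.2.1.*")  # TRUE TRUE FALSE FALSE
ec_matches(ecs)             # default '*' matches every EC number
```

Note that 1.2.10.1 is not matched: because the dots are escaped, the pattern requires a literal "1.2.1." prefix rather than any string beginning with "1.2.1".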

Author

Varun Giri varungiri@gmail.com

Details

RxnSim implements four algorithms to compute reaction similarity, namely msim, msim_max, rsim and rsim2.

msim

is based on the individual similarities of the molecules in two reactions. First, each reactant (product) of one reaction is paired with an equivalent (similar) reactant (product) of the other reaction, based on pairwise similarity values, using hierarchical grouping. Each unpaired molecule is assigned a similarity value of 0. Reaction similarity is then computed by averaging the similarity values of all pairs of equivalent molecules and of all unpaired molecules. The computed molecule equivalences can be reviewed using verbose mode in rs.compute.

msim_max

computes reaction similarity in the same way as msim, except that unpaired molecules are excluded from the average.

rsim

is based on the cumulative features of the reactants and products of two reactions. Each reaction is represented by two fingerprints, one for its reactants and one for its products. Reaction similarity is the average of the similarity between the two reactant fingerprints and the similarity between the two product fingerprints.

rsim2

is based on the cumulative features of all molecules in a reaction, which together form a single reaction fingerprint. Reaction similarity is computed by comparing the reaction fingerprints of the two reactions.
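The four algorithms can be contrasted in a compact base-R sketch on toy bit fingerprints (logical vectors). Several parts are assumptions rather than RxnSim's code: Tanimoto is used throughout; "cumulative features" is read as a bitwise OR of molecule fingerprints; and molecules are paired greedily (most similar pair first), a simple stand-in for the hierarchical grouping that msim actually uses. All function names are hypothetical.

```r
# Tanimoto similarity of two logical (bit) fingerprints
tanimoto_bits <- function(x, y) {
  n_union <- sum(x | y)
  if (n_union == 0) return(0)  # assumption: two empty fingerprints score 0
  sum(x & y) / n_union
}

# ASSUMED meaning of "cumulative features": bitwise OR over molecules (rows)
or_fp <- function(fps) apply(fps, 2, any)

# Molecule-by-molecule Tanimoto matrix between two sets of fingerprints
sim_matrix <- function(A, B) {
  S <- matrix(0, nrow = nrow(A), ncol = nrow(B))
  for (i in seq_len(nrow(A))) {
    for (j in seq_len(nrow(B))) S[i, j] <- tanimoto_bits(A[i, ], B[j, ])
  }
  S
}

# Greedy pairing (most similar pair first): a STAND-IN for hierarchical grouping
pair_molecules <- function(S) {
  paired <- numeric(0)
  while (nrow(S) > 0 && ncol(S) > 0) {
    k <- which(S == max(S), arr.ind = TRUE)[1, ]
    paired <- c(paired, S[k[1], k[2]])
    S <- S[-k[1], -k[2], drop = FALSE]  # both molecules are now used
  }
  list(paired = paired, unpaired = nrow(S) + ncol(S))  # leftovers are unpaired
}

# A and B: lists with logical matrices $react and $prod (one row per molecule)
msim_sketch <- function(A, B, drop_unpaired = FALSE) {
  r <- pair_molecules(sim_matrix(A$react, B$react))
  p <- pair_molecules(sim_matrix(A$prod, B$prod))
  vals <- c(r$paired, p$paired)
  # msim: every unpaired molecule contributes a similarity of 0
  if (!drop_unpaired) vals <- c(vals, rep(0, r$unpaired + p$unpaired))
  mean(vals)  # drop_unpaired = TRUE gives the msim_max variant
}

# rsim: average of reactant-side and product-side similarities
rsim_sketch <- function(A, B) {
  mean(c(tanimoto_bits(or_fp(A$react), or_fp(B$react)),
         tanimoto_bits(or_fp(A$prod), or_fp(B$prod))))
}

# rsim2: one fingerprint per reaction built from all of its molecules
rsim2_sketch <- function(A, B) {
  tanimoto_bits(or_fp(rbind(A$react, A$prod)), or_fp(rbind(B$react, B$prod)))
}

# Toy 8-bit fingerprints
bits <- function(...) rbind(...) == 1
A <- list(react = bits(c(1, 1, 0, 0, 0, 0, 0, 0), c(0, 0, 1, 0, 0, 0, 0, 0)),
          prod  = bits(c(0, 0, 0, 1, 1, 0, 0, 0)))
B <- list(react = bits(c(1, 1, 0, 0, 0, 0, 0, 0)),
          prod  = bits(c(0, 0, 0, 1, 1, 1, 1, 0)))

msim_sketch(A, B)                        # 0.5 (mean of 1, 0.5 and 0 for the unpaired reactant)
msim_sketch(A, B, drop_unpaired = TRUE)  # 0.75
rsim_sketch(A, B)                        # 0.5833333 = mean(2/3, 2/4)
rsim2_sketch(A, B)                       # 0.5714286 = 4/7
```

With this toy data, msim is pulled down by the unpaired second reactant of A, while msim_max ignores it. Values from rs.compute on real molecules will generally differ from this sketch.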

For reversible reactions (reversible = TRUE), the reactions are compared both in the forward direction and with one of them reversed. The greater of the two similarity values is reported.
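The forward-versus-reversed comparison for reversible = TRUE can be sketched in base R for reactions written as "reactants>>products" strings. `reverse_rsmi`, `toy_sim` and `reversible_sim` are hypothetical helpers; `toy_sim` is a deliberately crude stand-in (Jaccard overlap of the molecule strings on each side), not one of RxnSim's algorithms.

```r
# Reverse a reaction SMILES "reactants>>products" to "products>>reactants"
reverse_rsmi <- function(rsmi) {
  sides <- strsplit(rsmi, ">>", fixed = TRUE)[[1]]
  paste(sides[2], sides[1], sep = ">>")
}

# HYPOTHETICAL stand-in similarity: mean Jaccard overlap of the molecule
# strings on the reactant side and on the product side (not an RxnSim metric)
toy_sim <- function(a, b) {
  sa <- strsplit(a, ">>", fixed = TRUE)[[1]]
  sb <- strsplit(b, ">>", fixed = TRUE)[[1]]
  jac <- function(x, y) {
    x <- strsplit(x, ".", fixed = TRUE)[[1]]  # '.' separates molecules
    y <- strsplit(y, ".", fixed = TRUE)[[1]]
    length(intersect(x, y)) / length(union(x, y))
  }
  mean(c(jac(sa[1], sb[1]), jac(sa[2], sb[2])))
}

# Compare in the forward direction and with b reversed; keep the larger value
reversible_sim <- function(a, b, sim = toy_sim) {
  max(sim(a, b), sim(a, reverse_rsmi(b)))
}

fwd <- "CCO.O=O>>CC=O.OO"  # ethanol + O2 >> acetaldehyde + H2O2
bwd <- "CC=O.OO>>CCO.O=O"  # the same reaction written backwards
toy_sim(fwd, bwd)          # 0: no overlap in the forward direction
reversible_sim(fwd, bwd)   # 1: identical once one reaction is reversed
```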

Fingerprint Caching

The rs.compute and rs.compute.DB functions can use fingerprint caching. If fpCached = TRUE, the cache is queried before a fingerprint is generated, and every newly generated fingerprint is stored in it. With fpCached = FALSE the cache is left unchanged. The cache can be cleared by calling rs.clearCache.

The rs.compute.list and rs.compute.sim.matrix functions use caching internally. To keep fingerprints consistent, they call rs.clearCache internally. Set clearCache = FALSE to override this behaviour; the current state of the cache is then used, and new fingerprints are added to it.

The same cache is shared by all functions.
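The caching behaviour described above can be sketched as a memoising lookup in an R environment. This is a hypothetical illustration, not RxnSim's internals: `fp_cache`, `cached_fp`, `clear_cache` and the toy generator `toy_fp` are made-up names. The sketch keys entries on the SMILES string alone, so fingerprints generated with different settings would share one entry, which shows how a stale cache could hold fingerprints that do not match the current settings.

```r
# HYPOTHETICAL sketch of a shared fingerprint cache keyed by SMILES string
fp_cache <- new.env(hash = TRUE)

cached_fp <- function(smiles, make_fp, fp_cached = TRUE, cache = fp_cache) {
  if (!fp_cached) return(make_fp(smiles))  # caching off: cache left unchanged
  if (exists(smiles, envir = cache, inherits = FALSE)) {
    return(get(smiles, envir = cache, inherits = FALSE))  # cache hit
  }
  fp <- make_fp(smiles)
  assign(smiles, fp, envir = cache)  # store the newly generated fingerprint
  fp
}

clear_cache <- function(cache = fp_cache) {
  rm(list = ls(cache, all.names = TRUE), envir = cache)
}

n_generated <- 0
toy_fp <- function(s) {  # stand-in fingerprint generator that counts calls
  n_generated <<- n_generated + 1
  utf8ToInt(s)
}
cached_fp("CCO", toy_fp)
cached_fp("CCO", toy_fp)  # served from the cache
n_generated               # 1
```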

Similarity metrics included in RxnSim. These metrics (except jaccard-count and tanimoto-count) are derived from the fingerprint package.

ID                 Name                 Mode
simple             Sokal & Michener     bit
jaccard            Jaccard              bit
tanimoto           Tanimoto (bit)       bit and count
jaccard-count      Jaccard (count)      count
tanimoto-count     Tanimoto (count)     count ^
dice               Dice (bit)           bit and count
russelrao          Russell and Rao      bit
rodgerstanimoto    Rogers and Tanimoto  bit
achiai             Ochiai               bit
cosine             Cosine               bit
kulczynski2        Kulczynski 2         bit
mt                 Modified Tanimoto    bit
baroniurbanibuser  Baroni-Urbani/Buser  bit
robust             Robust (bit)         bit and count
tversky            Tversky*             bit
hamann             Hamann               bit
pearson            Pearson              bit
yule               Yule                 bit
mcconnaughey       McConnaughey         bit
simpson            Simpson              bit

* The Tversky coefficients can be specified by combining them with the method name into a vector, e.g., c('tversky', a, b).

In count mode, tanimoto (bit), dice (bit) and robust (bit) compute the similarity of feature (count) vectors by translating them into equivalent bit fingerprints. The default similarity metric is tanimoto.
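For reference, the textbook bit-vector forms of three metrics from the table (Tanimoto, Dice and Tversky) can be written in a few lines of base R. These are illustrative definitions, not code from the fingerprint package. In this sketch `a` weights bits set only in `x` and `b` weights bits set only in `y`; whether the package uses the same ordering is not stated here.

```r
# Textbook definitions on logical bit vectors (illustrative only)
bit_counts <- function(x, y) {
  c(both = sum(x & y), only_x = sum(x & !y), only_y = sum(!x & y))
}
tanimoto <- function(x, y) {
  n <- bit_counts(x, y)
  n[["both"]] / (n[["both"]] + n[["only_x"]] + n[["only_y"]])
}
dice <- function(x, y) {
  n <- bit_counts(x, y)
  2 * n[["both"]] / (2 * n[["both"]] + n[["only_x"]] + n[["only_y"]])
}
# Tversky: a weights bits set only in x, b weights bits set only in y
tversky <- function(x, y, a, b) {
  n <- bit_counts(x, y)
  n[["both"]] / (n[["both"]] + a * n[["only_x"]] + b * n[["only_y"]])
}

x <- c(TRUE, TRUE, TRUE, FALSE, FALSE)
y <- c(TRUE, TRUE, FALSE, TRUE, FALSE)
tanimoto(x, y)           # 0.5
dice(x, y)               # 0.6666667
tversky(x, y, 1, 1)      # 0.5 (a = b = 1 reduces to Tanimoto)
tversky(x, y, 0.5, 0.5)  # 0.6666667 (a = b = 0.5 reduces to Dice)
```

The last two lines show why Tversky is often described as a generalisation: fixing both coefficients recovers Tanimoto or Dice.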

Fingerprints included in RxnSim. These are derived from the rcdk package.

ID             Name           Mode
standard       Standard       bit
extended       Extended       bit
estate         EState         bit
graph          Graph only     bit
hybridization  Hybridization  bit
maccs          MACCS          bit
pubchem        PubChem        bit
kr             Klekota-Roth   bit
shortestpath   Shortest path  bit
signature      Signature      count
circular       Circular       bit and count

References

^ Carbonell, P., Planson, A-G., Fichera, D., & Faulon, J-L. (2011). A retrosynthetic biology approach to metabolic pathway design for therapeutic production. BMC Systems Biology, 5:122.

See Also

rs.makeDB, rs.clearCache, ms.compute

Examples

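library(RxnSim)

# Example reactions in RSMI format
# benzaldehyde + O2 + H2O >> H+ + benzoate + H2O2
rct1 <- 'C(=O)C1(=CC=CC=C1).O=O.[OH2]>>[H+].C(C1(C=CC=CC=1))([O-])=O.OO'
# cinnamaldehyde + O2 + H2O >> H+ + cinnamate + H2O2
rct2 <- 'C(C=CC1(=CC=CC=C1))=O.O=O.[OH2]>>[H+].C(=O)([O-])C=CC1(=CC=CC=C1).OO'

# Reaction similarity using the msim algorithm, printing molecule pairings
rs.compute(rct1, rct2, verbose = TRUE)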
