pubchem.bio (version 1.0.5)

build.taxon.metabolome: build.taxon.metabolome

Description

Uses local PubChem data, downloaded and formatted by the 'get.pubchem.ftp' function, to filter a dataset created by the 'build.pubchem.bio' function.

Usage

build.taxon.metabolome(
  pc.directory = NULL,
  taxid = c(),
  get.properties = FALSE,
  full.scored = TRUE,
  keep.scored.only = FALSE,
  aggregation.function = max,
  threads = 8,
  db.name = "custom.metabolome",
  rcdk.desc = c("org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor",
    "org.openscience.cdk.qsar.descriptors.molecular.AcidicGroupCountDescriptor",
    "org.openscience.cdk.qsar.descriptors.molecular.BasicGroupCountDescriptor",
    "org.openscience.cdk.qsar.descriptors.molecular.TPSADescriptor"),
  pubchem.bio.object = NULL,
  cid.lca.object = NULL,
  taxid.hierarchy.object = NULL,
  output.directory = NULL
)

Value

A data frame containing the PubChem CID ('cid') and the lowest common ancestor ('lca') NCBI taxonomy ID (integer). The result is also saved to pc.directory as an .Rdata file.

Arguments

pc.directory

directory from which to load pubchem .Rdata files

taxid

integer vector of NCBI taxonomy IDs, e.g. c(9606, 1425170) for Homo sapiens and Homo heidelbergensis.

get.properties

logical. If TRUE, rcdk-calculated properties are returned: XLogP, TPSA, HBondDonorCount and HBondAcceptorCount.

full.scored

logical. default = TRUE. When FALSE, only metabolites that map to the taxid(s) are returned. When TRUE, all metabolites are returned, with scores assigned based on the distance of non-mapped metabolites to the root node. That is, specialized metabolites from distantly related species will score at or near zero, specialized metabolites of more similar species will score higher, and more conserved metabolites will score higher than more specialized ones.

keep.scored.only

logical. If TRUE, biological metabolites with NA for the taxonomy score are removed before returning.

aggregation.function

function. default = max. Can be mean, median, min, etc., or a custom function. Defines how the aggregate score is calculated when multiple taxids are used.
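
As an illustration of a custom aggregation function, the sketch below defines a mean that tolerates missing scores. It assumes the function is called on a numeric vector of per-taxid scores for one metabolite; this page does not confirm that calling convention, and both the helper name 'na.tolerant.mean' and the score values are hypothetical.

```r
# Hypothetical per-taxid scores for a single metabolite (illustrative values).
scores <- c(0.9, 0.4, NA)

# Custom aggregator: mean of the available scores, NA if none are available.
# ASSUMPTION: the aggregation function receives a numeric vector of scores.
na.tolerant.mean <- function(x) {
  if (all(is.na(x))) return(NA_real_)
  mean(x, na.rm = TRUE)
}

max(scores)               # NA: max() propagates missing values by default
na.tolerant.mean(scores)  # mean of the two non-missing scores
```

Such a function could then be supplied as 'aggregation.function = na.tolerant.mean' in place of the default.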

threads

integer. Number of threads to use when calculating rcdk properties. Parallel processing is done via the doParallel and foreach packages.

db.name

character. File name for the saved version of this database. default = 'custom.metabolome', but could be 'taxid.4071' or 'Streptomyces', etc. Saved as an .Rdata file in the 'pc.directory' location.

rcdk.desc

vector. Character vector of valid rcdk descriptors. default = c("org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor", "org.openscience.cdk.qsar.descriptors.molecular.AcidicGroupCountDescriptor", "org.openscience.cdk.qsar.descriptors.molecular.BasicGroupCountDescriptor", "org.openscience.cdk.qsar.descriptors.molecular.TPSADescriptor"). To see descriptor categories: 'dc <- rcdk::get.desc.categories(); dc'. To see the descriptors within one category: 'dn <- rcdk::get.desc.names(dc[4]); dn'. Note that the four default descriptors are relatively fast to calculate; some descriptors take a very long time. You can calculate as many as you wish, but processing time increases with each descriptor added.
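
A shorter descriptor vector can be assembled from the class-name prefix shared by the defaults, as sketched below. The variable names 'cdk.prefix' and 'my.rcdk.desc' are illustrative, and the optional check against rcdk runs only if rcdk (and its Java dependency) is installed.

```r
# Common prefix of the default descriptor class names listed above.
cdk.prefix <- "org.openscience.cdk.qsar.descriptors.molecular."

# Request only two of the four default descriptors.
my.rcdk.desc <- paste0(cdk.prefix, c("XLogPDescriptor", "TPSADescriptor"))
my.rcdk.desc

# Optional sanity check: are the names known to rcdk?
if (requireNamespace("rcdk", quietly = TRUE)) {
  known <- rcdk::get.desc.names(type = "all")
  print(my.rcdk.desc %in% known)
}
```

The resulting vector would be passed as 'rcdk.desc = my.rcdk.desc' together with 'get.properties = TRUE'.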

pubchem.bio.object

R data.table, generally produced by build.pubchem.bio; preferably, define pc.directory

cid.lca.object

R data.table, generally produced by build.cid.lca; preferably, define pc.directory

taxid.hierarchy.object

R data.table, generally produced by get.pubchem.ftp; preferably, define pc.directory

output.directory

directory to which the pubchem.bio database is saved. If NULL, will try to save in pc.directory (if provided), else not saved.
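
A saved database can be read back with base R's load(). The sketch below is self-contained and uses a stand-in data frame rather than real output. It assumes the file is named '<db.name>.Rdata', which this page implies but does not state exactly, and it makes no assumption about the name of the object stored in the real file: loading into a separate environment reveals that name.

```r
# Stand-in for a saved database, using the documented 'cid' and 'lca' columns.
# The values are illustrative, not real PubChem data.
out.dir <- tempdir()
db.name <- "custom.metabolome"
db.file <- file.path(out.dir, paste0(db.name, ".Rdata"))  # ASSUMED file name
demo.db <- data.frame(cid = c(1L, 2L), lca = c(9606L, 1L))
save(demo.db, file = db.file)

# Load into a separate environment so nothing in the workspace is overwritten.
# load() returns the names of the objects it restored.
db.env <- new.env()
loaded <- load(db.file, envir = db.env)
loaded

# Retrieve the first restored object by name.
restored <- get(loaded[1], envir = db.env)
head(restored)
```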

Author

Corey Broeckling

Details

utilizes downloaded and properly formatted local pubchem data created by 'get.pubchem.ftp' function

Examples

# load the example datasets shipped with the package
data('cid.lca', package = "pubchem.bio")
data('pubchem.bio', package = "pubchem.bio")
data('taxid.hierarchy', package = "pubchem.bio")

# build a taxon metabolome from the in-memory objects
my.taxon.db <- build.taxon.metabolome(
  pubchem.bio.object = pubchem.bio,
  cid.lca.object = cid.lca,
  taxid.hierarchy.object = taxid.hierarchy,
  get.properties = FALSE,
  threads = 1,
  taxid = c(1)
)
head(my.taxon.db)
