- pc.directory
Directory from which to load PubChem .Rdata files. Alternatively, provide R data.tables for ALL of the cid.property.object options defined below.
- use.bio.sources
logical. If TRUE (default), use the bio.sources vector of sources, incorporating all CIDs from those bio databases.
- bio.sources
Vector of source names from which to extract PubChem CIDs. All available sources are listed at https://pubchem.ncbi.nlm.nih.gov/sources/; "PubChemLite" can additionally be used as a data source. Defaults to c("Metabolomics Workbench", "Human Metabolome Database (HMDB)", "ChEBI", "LIPID MAPS", "MassBank of North America (MoNA)").
- use.pathways
logical. should all CIDs from any biological pathway data be incorporated into database?
- pathway.sources
character. Vector of pathway sources to be used when adding metabolites to the pubchem.bio database. Default = NULL, which uses all pathway sources.
- use.taxid
logical. should all CIDs associated with a taxonomic identifier (taxid) be used?
- taxonomy.sources
character. Vector of sources to be used when adding taxonomically related metabolites to the database. Default = NULL, which uses all sources.
- use.parent.cid
logical. should CIDs be replaced with parent CIDs? default = TRUE.
- use.parent.when.charged
logical. Default = FALSE. If TRUE and use.parent.cid = TRUE, the parent is always chosen. If FALSE and use.parent.cid = TRUE, the neutral molecule is used, even if that is the child molecule. See CID 1 and CID 2 for an example.
- remove.salts
logical. Should salts be removed from the dataset? Default = TRUE. Salts are recognized by a '.' in the SMILES string. Performed after the 'use.parent.cid' step.
- remove.inorganics
logical. should inorganic molecules (those with no carbon) be removed? default = FALSE.
- mw.range
vector. Numeric vector of length 2 defining the molecular weight range. Default = c(50, 2000).
- get.properties
logical. If TRUE, rcdk-calculated properties are returned: XLogP, TPSA, HBondDonorCount and HBondAcceptorCount.
- threads
integer. Number of threads to use when calculating rcdk properties. Parallel processing is performed via the doParallel and foreach packages.
- rcdk.desc
vector. Character vector of valid rcdk descriptors. Default = c("org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor", "org.openscience.cdk.qsar.descriptors.molecular.AcidicGroupCountDescriptor", "org.openscience.cdk.qsar.descriptors.molecular.BasicGroupCountDescriptor", "org.openscience.cdk.qsar.descriptors.molecular.TPSADescriptor"). To see descriptor categories: 'dc <- rcdk::get.desc.categories(); dc'. To see the descriptors within one category: 'dn <- rcdk::get.desc.names(dc[4]); dn'. Note that the four default descriptors are relatively fast to calculate, while some descriptors take a very long time. You can calculate as many as you wish, but processing time increases with each descriptor added.
- cid.lca.object
R data.table, generally produced by build.cid.lca; preferably, define pc.directory
- cid.sid.object
R data.table, generally produced by get.pubchem.ftp; preferably, define pc.directory
- cid.pwid.object
R data.table, generally produced by get.pubchem.ftp; preferably, define pc.directory
- cid.parent.object
R data.table, generally produced by get.pubchem.ftp; preferably, define pc.directory
- cid.taxid.object
R data.table, generally produced by get.pubchem.ftp; preferably, define pc.directory
- cid.formula.object
R data.table, generally produced by get.pubchem.ftp; preferably, define pc.directory
- cid.smiles.object
R data.table, generally produced by get.pubchem.ftp; preferably, define pc.directory
- cid.inchikey.object
R data.table, generally produced by get.pubchem.ftp; preferably, define pc.directory
- cid.inchi.object
R data.table, generally produced by get.pubchem.ftp; preferably, define pc.directory
- cid.monoisotopic.mass.object
R data.table, generally produced by get.pubchem.ftp; preferably, define pc.directory
- cid.title.object
R data.table, generally produced by get.pubchem.ftp; preferably, define pc.directory
- cid.cas.object
R data.table, generally produced by get.pubchem.ftp; preferably, define pc.directory
- cid.pmid.ct.object
R data.table, generally produced by get.pubchem.ftp; preferably, define pc.directory
- output.directory
Directory to which the pubchem.bio database is saved. If NULL, the database is saved in pc.directory (if provided); otherwise it is not saved.
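
The filtering options above (remove.salts, remove.inorganics and mw.range) can be illustrated with a minimal base R sketch on a toy table. This is not the package's implementation: the function `filter_sketch`, the toy data and its column names are hypothetical, the carbon test on the formula is one possible reading of "no carbon", and treating the mw.range bounds as inclusive is an assumption.

```r
# Toy compound table. Column names are hypothetical, not the package's schema.
cmpds <- data.frame(
  name    = c("caffeine", "sodium acetate", "phosphoric acid", "ethanol"),
  smiles  = c("CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CC(=O)[O-].[Na+]",
              "OP(=O)(O)O", "CCO"),
  formula = c("C8H10N4O2", "C2H3NaO2", "H3O4P", "C2H6O"),
  mass    = c(194.0804, 82.0031, 97.9769, 46.0419),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the documented defaults.
filter_sketch <- function(x, remove.salts = TRUE, remove.inorganics = FALSE,
                          mw.range = c(50, 2000)) {
  # Keep rows inside the mass range (inclusive bounds are an assumption).
  keep <- x$mass >= mw.range[1] & x$mass <= mw.range[2]
  # Salts: a '.' in the SMILES string separates disconnected components.
  if (remove.salts) {
    keep <- keep & !grepl(".", x$smiles, fixed = TRUE)
  }
  # Inorganics: no carbon in the formula. "C" followed by a lowercase
  # letter (Cl, Ca, Cu, ...) is a different element, so exclude those.
  if (remove.inorganics) {
    keep <- keep & grepl("C(?![a-z])", x$formula, perl = TRUE)
  }
  x[keep, , drop = FALSE]
}

# Documented defaults: salts removed, inorganics kept, 50 to 2000 mass range.
filter_sketch(cmpds)$name
# Additionally drop carbon-free molecules.
filter_sketch(cmpds, remove.inorganics = TRUE)$name
```

With the defaults, the salt (sodium acetate) and the compound below the lower mass bound (ethanol) are dropped; setting remove.inorganics = TRUE also drops phosphoric acid.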