This function interacts with the CompTox Chemistry Dashboard to download and
extract a wide range of chemical data based on user-defined search criteria.
It allows for flexible input types and supports downloading various chemical
properties, identifiers, and predictive data. It was inspired by the
ECOTOXr::websearch_comptox function.
extr_comptox(
ids,
download_items = c("CASRN", "INCHIKEY", "IUPAC_NAME", "SMILES", "INCHI_STRING",
"MS_READY_SMILES", "QSAR_READY_SMILES", "MOLECULAR_FORMULA", "AVERAGE_MASS",
"MONOISOTOPIC_MASS", "QC_LEVEL", "SAFETY_DATA", "EXPOCAST", "DATA_SOURCES",
"TOXVAL_DATA", "NUMBER_OF_PUBMED_ARTICLES", "PUBCHEM_DATA_SOURCES", "CPDAT_COUNT",
"IRIS_LINK", "PPRTV_LINK", "WIKIPEDIA_ARTICLE", "QC_NOTES", "ABSTRACT_SHIFTER",
"TOXPRINT_FINGERPRINT", "ACTOR_REPORT", "SYNONYM_IDENTIFIER", "RELATED_RELATIONSHIP",
"ASSOCIATED_TOXCAST_ASSAYS", "TOXVAL_DETAILS",
"CHEMICAL_PROPERTIES_DETAILS",
"BIOCONCENTRATION_FACTOR_TEST_PRED", "BOILING_POINT_DEGC_TEST_PRED",
"48HR_DAPHNIA_LC50_MOL/L_TEST_PRED", "DENSITY_G/CM^3_TEST_PRED", "DEVTOX_TEST_PRED",
"96HR_FATHEAD_MINNOW_MOL/L_TEST_PRED", "FLASH_POINT_DEGC_TEST_PRED",
"MELTING_POINT_DEGC_TEST_PRED", "AMES_MUTAGENICITY_TEST_PRED",
"ORAL_RAT_LD50_MOL/KG_TEST_PRED", "SURFACE_TENSION_DYN/CM_TEST_PRED",
"THERMAL_CONDUCTIVITY_MW/(M*K)_TEST_PRED",
"TETRAHYMENA_PYRIFORMIS_IGC50_MOL/L_TEST_PRED", "VISCOSITY_CP_CP_TEST_PRED",
"VAPOR_PRESSURE_MMHG_TEST_PRED", "WATER_SOLUBILITY_MOL/L_TEST_PRED",
"ATMOSPHERIC_HYDROXYLATION_RATE_(AOH)_CM3/MOLECULE*SEC_OPERA_PRED",
"BIOCONCENTRATION_FACTOR_OPERA_PRED",
"BIODEGRADATION_HALF_LIFE_DAYS_DAYS_OPERA_PRED", "BOILING_POINT_DEGC_OPERA_PRED",
"HENRYS_LAW_ATM-M3/MOLE_OPERA_PRED", "OPERA_KM_DAYS_OPERA_PRED",
"OCTANOL_AIR_PARTITION_COEFF_LOGKOA_OPERA_PRED",
"SOIL_ADSORPTION_COEFFICIENT_KOC_L/KG_OPERA_PRED",
"OCTANOL_WATER_PARTITION_LOGP_OPERA_PRED", "MELTING_POINT_DEGC_OPERA_PRED",
"OPERA_PKAA_OPERA_PRED", "OPERA_PKAB_OPERA_PRED", "VAPOR_PRESSURE_MMHG_OPERA_PRED",
"WATER_SOLUBILITY_MOL/L_OPERA_PRED",
"EXPOCAST_MEDIAN_EXPOSURE_PREDICTION_MG/KG-BW/DAY", "NHANES",
"TOXCAST_NUMBER_OF_ASSAYS/TOTAL", "TOXCAST_PERCENT_ACTIVE"),
mass_error = 0,
verify_ssl = FALSE,
verbose = TRUE,
delay = 7,
...
)

Returns a cleaned data frame containing the requested data from CompTox.
ids: A character vector containing the items to be searched within the CompTox Chemistry Dashboard. These can be chemical names, CAS Registry Numbers (CASRN), InChIKeys, or DSSTox substance identifiers (DTXSID).
download_items: A character vector of items to be downloaded. This includes a comprehensive set of chemical properties, identifiers, predictive data, and other relevant information. By default, all items are downloaded.
CASRN: The Chemical Abstracts Service Registry Number, a unique numerical identifier for chemical substances.
INCHIKEY: The hashed version of the full International Chemical Identifier (InChI) string.
IUPAC_NAME: The International Union of Pure and Applied Chemistry (IUPAC) name of the chemical.
SMILES: The Simplified Molecular Input Line Entry System (SMILES) representation of the chemical structure.
INCHI_STRING: The full International Chemical Identifier (InChI) string.
MS_READY_SMILES: The SMILES representation of the chemical structure, prepared for mass spectrometry analysis.
QSAR_READY_SMILES: The SMILES representation of the chemical structure, prepared for quantitative structure-activity relationship (QSAR) modeling.
MOLECULAR_FORMULA: The chemical formula representing the number and type of atoms in a molecule.
AVERAGE_MASS: The average mass of the molecule, calculated based on the isotopic distribution of the elements.
MONOISOTOPIC_MASS: The mass of the molecule calculated using the most abundant isotope of each element.
QC_LEVEL: The quality control level of the data.
SAFETY_DATA: Safety information related to the chemical.
EXPOCAST: Exposure predictions from the EPA's ExpoCast program.
DATA_SOURCES: Sources of the data provided.
TOXVAL_DATA: Toxicological values related to the chemical.
NUMBER_OF_PUBMED_ARTICLES: The number of articles related to the chemical in PubMed.
PUBCHEM_DATA_SOURCES: Sources of data from PubChem.
CPDAT_COUNT: The number of entries in the Chemical and Product Categories Database (CPDat).
IRIS_LINK: Link to the EPA's Integrated Risk Information System (IRIS) entry for the chemical.
PPRTV_LINK: Link to the EPA's Provisional Peer-Reviewed Toxicity Values (PPRTV) entry for the chemical.
WIKIPEDIA_ARTICLE: Link to the Wikipedia article for the chemical.
QC_NOTES: Notes related to the quality control of the data.
ABSTRACT_SHIFTER: Information related to the abstract shifter.
TOXPRINT_FINGERPRINT: The ToxPrint chemoinformatics fingerprint of the chemical.
ACTOR_REPORT: The Aggregated Computational Toxicology Resource (ACTOR) report for the chemical.
SYNONYM_IDENTIFIER: Identifiers for synonyms of the chemical.
RELATED_RELATIONSHIP: Information on related chemicals.
ASSOCIATED_TOXCAST_ASSAYS: Assays associated with the chemical in the ToxCast database.
TOXVAL_DETAILS: Details of toxicological values.
CHEMICAL_PROPERTIES_DETAILS: Details of the chemical properties.
BIOCONCENTRATION_FACTOR_TEST_PRED: Predicted bioconcentration factor from TEST (EPA's Toxicity Estimation Software Tool).
BOILING_POINT_DEGC_TEST_PRED: Predicted boiling point in degrees Celsius from TEST.
48HR_DAPHNIA_LC50_MOL/L_TEST_PRED: Predicted 48-hour LC50 for Daphnia in mol/L from TEST.
DENSITY_G/CM^3_TEST_PRED: Predicted density in g/cm³ from TEST.
DEVTOX_TEST_PRED: Predicted developmental toxicity from TEST.
96HR_FATHEAD_MINNOW_MOL/L_TEST_PRED: Predicted 96-hour LC50 for fathead minnow in mol/L from TEST.
FLASH_POINT_DEGC_TEST_PRED: Predicted flash point in degrees Celsius from TEST.
MELTING_POINT_DEGC_TEST_PRED: Predicted melting point in degrees Celsius from TEST.
AMES_MUTAGENICITY_TEST_PRED: Predicted Ames mutagenicity from TEST.
ORAL_RAT_LD50_MOL/KG_TEST_PRED: Predicted oral LD50 for rats in mol/kg from TEST.
SURFACE_TENSION_DYN/CM_TEST_PRED: Predicted surface tension in dyn/cm from TEST.
THERMAL_CONDUCTIVITY_MW/(M*K)_TEST_PRED: Predicted thermal conductivity in mW/(m·K) from TEST.
TETRAHYMENA_PYRIFORMIS_IGC50_MOL/L_TEST_PRED: Predicted IGC50 for Tetrahymena pyriformis in mol/L from TEST.
VISCOSITY_CP_CP_TEST_PRED: Predicted viscosity in cP from TEST.
VAPOR_PRESSURE_MMHG_TEST_PRED: Predicted vapor pressure in mmHg from TEST.
WATER_SOLUBILITY_MOL/L_TEST_PRED: Predicted water solubility in mol/L from TEST.
ATMOSPHERIC_HYDROXYLATION_RATE_(AOH)_CM3/MOLECULE*SEC_OPERA_PRED: Predicted atmospheric hydroxylation rate in cm³/molecule*sec from OPERA.
BIOCONCENTRATION_FACTOR_OPERA_PRED: Predicted bioconcentration factor from OPERA.
BIODEGRADATION_HALF_LIFE_DAYS_DAYS_OPERA_PRED: Predicted biodegradation half-life in days from OPERA.
BOILING_POINT_DEGC_OPERA_PRED: Predicted boiling point in degrees Celsius from OPERA.
HENRYS_LAW_ATM-M3/MOLE_OPERA_PRED: Predicted Henry's law constant in atm-m³/mole from OPERA.
OPERA_KM_DAYS_OPERA_PRED: Predicted Km in days from OPERA.
OCTANOL_AIR_PARTITION_COEFF_LOGKOA_OPERA_PRED: Predicted octanol-air partition coefficient (log Koa) from OPERA.
SOIL_ADSORPTION_COEFFICIENT_KOC_L/KG_OPERA_PRED: Predicted soil adsorption coefficient (Koc) in L/kg from OPERA.
OCTANOL_WATER_PARTITION_LOGP_OPERA_PRED: Predicted octanol-water partition coefficient (log P) from OPERA.
MELTING_POINT_DEGC_OPERA_PRED: Predicted melting point in degrees Celsius from OPERA.
OPERA_PKAA_OPERA_PRED: Predicted pKa (acidic) from OPERA.
OPERA_PKAB_OPERA_PRED: Predicted pKa (basic) from OPERA.
VAPOR_PRESSURE_MMHG_OPERA_PRED: Predicted vapor pressure in mmHg from OPERA.
WATER_SOLUBILITY_MOL/L_OPERA_PRED: Predicted water solubility in mol/L from OPERA.
EXPOCAST_MEDIAN_EXPOSURE_PREDICTION_MG/KG-BW/DAY: Predicted median exposure from ExpoCast in mg/kg-bw/day.
NHANES: National Health and Nutrition Examination Survey data.
TOXCAST_NUMBER_OF_ASSAYS/TOTAL: Number of assays in ToxCast.
TOXCAST_PERCENT_ACTIVE: Percentage of active assays in ToxCast.
mass_error: Numeric value indicating the mass error tolerance for searches involving mass data. Default is 0. Not used if libcurl depends on OpenSSL.
verify_ssl: Logical value indicating whether SSL certificates should be verified. Default is FALSE. Not used if libcurl depends on OpenSSL.
verbose: Logical value indicating whether to print detailed messages. Default is TRUE.
delay: Number of seconds to wait between the initial request and the subsequent request to download the Excel file. Default is 7.
...: Additional arguments passed to httr2::req_options(). Not used if libcurl depends on OpenSSL.
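
Because the ids argument accepts a mix of chemical names, CASRNs, InChIKeys, and DTXSIDs, it can help to check what kind of identifier each entry looks like before sending a query. The sketch below is only an illustration: classify_id() is a hypothetical helper, not part of the package, and it guesses the type from the shape of the string alone (it does not validate CAS check digits or confirm that an identifier exists).

```r
# Hypothetical helper (not part of the package): guess the type of each
# identifier from its shape alone.
classify_id <- function(ids) {
  ids <- trimws(ids)
  out <- rep("name", length(ids)) # fallback: treat as a chemical name
  out[grepl("^[0-9]{2,7}-[0-9]{2}-[0-9]$", ids)] <- "casrn"
  out[grepl("^[A-Z]{14}-[A-Z]{10}-[A-Z]$", ids)] <- "inchikey"
  out[grepl("^DTXSID[0-9]+$", ids)] <- "dtxsid"
  out
}

# Illustrative inputs, one per identifier shape.
ids <- c("Aspirin", "50-00-0", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "DTXSID5020108")
id_types <- classify_id(ids)
print(data.frame(id = ids, type = id_types))
```

Anything that matches none of the three patterns falls back to "name", so a mistyped CASRN would be treated as a chemical name rather than rejected.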
This function is designed to handle potential connection issues with
EPA servers on Linux systems. These servers may not support secure TLS
renegotiation and instead rely on unsafe legacy renegotiation, which causes
errors with newer versions of libcurl when linked with OpenSSL.
To ensure reliability, the function automatically detects if your system's
libcurl is likely to be affected. If so, it uses the {condathis}
package to download and run the request with a known-compatible version of
curl (7.78.0).
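
To see which SSL backend your own libcurl reports, a rough check along the following lines may be useful. This is a sketch under stated assumptions, not the detection logic the function actually uses: uses_openssl() is a hypothetical helper, and the check relies on the ssl_version field returned by curl::curl_version() from the curl package.

```r
# Hypothetical helper: TRUE when the SSL backend string names OpenSSL.
uses_openssl <- function(ssl_version) {
  grepl("OpenSSL", ssl_version, ignore.case = TRUE)
}

# Inspect the current system only if the 'curl' package is installed.
if (requireNamespace("curl", quietly = TRUE)) {
  info <- curl::curl_version()
  message("libcurl ", info$version, " / SSL backend: ", info$ssl_version)
  message("Linked with OpenSSL: ", uses_openssl(info$ssl_version))
}
```

The match is deliberately coarse: a build that lists OpenSSL alongside another backend would also return TRUE, so treat the result as a hint rather than a diagnosis.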
# \donttest{
# Example usage of the function:
extr_comptox(ids = c("Aspirin", "50-00-0"))
# }
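
Since download_items is described as a character vector of items to download, a narrower request can presumably be built by passing a subset of the default names. The sketch below selects two identifiers plus every OPERA prediction from a short list of names copied from the defaults; the variable names are illustrative, and the actual query is guarded so that it runs only in an interactive session where extr_comptox() is available (it needs network access).

```r
# A few item names copied from the default `download_items` vector.
items <- c(
  "CASRN", "SMILES", "BOILING_POINT_DEGC_TEST_PRED",
  "BOILING_POINT_DEGC_OPERA_PRED", "MELTING_POINT_DEGC_OPERA_PRED",
  "TOXCAST_PERCENT_ACTIVE"
)

# Keep the two identifiers plus every OPERA prediction.
opera_items <- grep("_OPERA_PRED$", items, value = TRUE)
selected <- c("CASRN", "SMILES", opera_items)
print(selected)

# Query only when run interactively with extr_comptox() available.
if (interactive() && exists("extr_comptox")) {
  res <- extr_comptox(ids = "50-00-0", download_items = selected)
  str(res)
}
```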