extr_chem_info: Query Chemical Information from IUPAC Names
Description
This function takes a vector of IUPAC names and queries the PubChem database
(using the webchem package) to obtain the corresponding CASRN and CID for
each compound. It reshapes the resulting data, ensuring that each compound
has a unique row with the CID, CASRN, and additional chemical properties.
Value
A data frame with physico-chemical information on the queried
compounds, including but not limited to:
iupac_name
The IUPAC name of the compound.
cid
The PubChem Compound Identifier (CID).
isomeric_smiles
The isomeric SMILES string (Simplified Molecular Input Line
Entry System), which includes stereochemical and isotopic information.
Arguments
iupac_names
A character vector of IUPAC names. These are standardized
names of chemical compounds that will be used to search in the PubChem
database.
verbose
A logical value indicating whether to print detailed messages.
Default is TRUE.
domain
A character string specifying the PubChem domain to query.
One of "compound" or "substance". Default is "compound".
delay
A numeric value indicating the delay (in seconds) between API
requests. This controls the time between successive PubChem queries.
Default is 0. See Details for more info.
Details
The function performs two queries to PubChem:
1. The first query retrieves the PubChem Compound Identifier (CID) for each
   IUPAC name.
2. The second query retrieves additional information using the
   obtained CIDs.
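The two steps can be sketched directly with webchem. This is only an
illustration under assumptions, not the function's actual implementation:
two_step_lookup is a hypothetical helper, and the choice of
webchem::get_cid() and webchem::pc_prop(), the match = "first" setting, and
the three requested properties are all assumed for the example.

```r
# Hypothetical helper illustrating the two-step lookup; not part of the
# documented function. Requires the webchem package and network access.
two_step_lookup <- function(iupac_names, domain = "compound", verbose = TRUE) {
  # Step 1: resolve each name to a CID (keeping only the first match).
  cids <- webchem::get_cid(iupac_names, from = "name", domain = domain,
                           match = "first", verbose = verbose)
  found <- as.data.frame(cids)
  found <- found[!is.na(found$cid), , drop = FALSE]
  if (nrow(found) == 0) return(NULL)  # nothing could be resolved

  # Step 2: fetch properties for the resolved CIDs (assumed property set).
  props <- webchem::pc_prop(
    found$cid,
    properties = c("MolecularFormula", "MolecularWeight", "IUPACName"),
    verbose = verbose
  )
  if (!is.data.frame(props)) return(NULL)  # the property query failed

  # Join both results on the CID, one row per resolved name.
  names(found)[names(found) == "query"] <- "iupac_name"
  found$cid <- as.character(found$cid)
  props$CID <- as.character(props$CID)
  merge(found, props, by.x = "cid", by.y = "CID")
}

# Example call (performs live PubChem queries):
# two_step_lookup(c("formaldehyde", "benzene"), verbose = FALSE)
```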
When many requests are sent in rapid succession, the PubChem server may
deny access. Setting a pause between requests (using the delay
argument) can help prevent this issue.
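The effect of a delay can be pictured as a loop that pauses between
successive requests. The sketch below is an assumption about the general
technique, not the function's internal code: query_with_delay and its fetch
argument are hypothetical names, and toupper stands in for a real PubChem
request so the example runs offline.

```r
# Hypothetical helper: call `fetch` once per query, sleeping `delay`
# seconds between consecutive calls.
query_with_delay <- function(queries, fetch, delay = 0) {
  results <- vector("list", length(queries))
  for (i in seq_along(queries)) {
    # Pause before every request except the first one.
    if (i > 1 && delay > 0) Sys.sleep(delay)
    results[[i]] <- fetch(queries[[i]])
  }
  names(results) <- queries
  results
}

# Offline demonstration with a stand-in for the real request.
demo <- query_with_delay(c("ethanol", "benzene"), toupper, delay = 0.1)

# With the documented function, a delay is requested through its argument
# (the value 2 is an arbitrary choice for illustration):
# extr_chem_info(iupac_names = c("formaldehyde", "benzene"), delay = 2)
```

A larger delay makes long batches slower but reduces the chance that the
server rejects requests.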