reutils provides support for interacting with NCBI databases such as PubMed,
GenBank, or GEO via the Entrez Programming Utilities
(Please check the relevant
Each of these tools corresponds to an R function in the reutils
package described below.
The output returned by the EUtils is typically in XML format. To gain access to this output you have several options:
Use the content(as = "xml") method to extract the output as an XMLInternalDocument object and process it further using the
facilities provided by the XML package.

Use the content(as = "parsed") method to extract the output
into data.frames. Note that this is currently only implemented for docsums returned by esummary, uilists returned
by esearch, and the output returned by einfo.

Use XPath expressions with the #xmlValue, #xmlAttr, or #xmlName methods
built into eutil objects to access specific nodes in the XML output.

The Entrez Programming Utilities can also generate output in other formats,
such as plain-text Fasta or GenBank files for sequence databases,
or the MedLine format for the literature database. The type of output is
generally controlled by setting the retmode and rettype arguments
when calling an EUtil.

esearch: Search and retrieve primary UIDs for use
with esummary, elink, or efetch. esearch additionally returns term translations and optionally
stores results for future use in the user's Web Environment.

esummary: Retrieve document summaries from
a list of primary UIDs (provided as a character vector or as an esearch object).

egquery: Provides Entrez database counts in XML
for a single search term using a Global Query.

einfo: Retrieve field names, term counts, last
update, and available updates for each database.

efetch: Retrieve data records in a specified
format corresponding to a list of primary UIDs or from the user's Web
Environment in the Entrez History server.

elink: Returns a list of UIDs (and relevancy
scores) from a target database that are related to a list of UIDs in
the same database or in another Entrez database.

epost: Uploads primary UIDs to the user's Web
Environment on the Entrez history server for subsequent use with esummary, elink, or efetch.

espell: Provide spelling suggestions.

ecitmatch: Retrieves PubMed IDs (PMIDs) that
correspond to a set of input citation strings.

content: Extract the content of a request from the eutil object returned by any of the above functions.
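
The functions listed above can be combined with content() to choose how the output is returned. The following is only a minimal sketch, assuming reutils is installed and the NCBI services are reachable. The helper names db_info and fetch_fasta are hypothetical (they are not part of reutils), and the search term in the usage part is purely illustrative.

```r
library(reutils)

# Hypothetical helper (not part of reutils): query einfo for one database
# and return the result as parsed R data structures instead of raw XML.
db_info <- function(db = "pubmed") {
  info <- einfo(db)
  content(info, as = "parsed")
}

# Hypothetical helper (not part of reutils): search a sequence database and
# download the hits as plain-text FASTA by setting rettype and retmode.
fetch_fasta <- function(term, db = "protein", retmax = 5) {
  hits <- esearch(term, db = db, retmax = retmax)
  recs <- efetch(hits, rettype = "fasta", retmode = "text")
  content(recs, as = "text")
}

# Usage (needs network access, so it only runs in an interactive session).
# The search term is an illustrative assumption.
if (interactive()) {
  fasta <- fetch_fasta("insulin[Title] AND human[Organism]", retmax = 2)
  cat(fasta)
}
```

Wrapping the calls in small functions keeps the choice of rettype and retmode in one place. Whether this is preferable to calling esearch and efetch directly depends on how often the same kind of query is repeated.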
reutils uses four options to configure behaviour:
reutils.email: NCBI requires that users of their API provide an
email address with each call to Entrez. If you are going to perform a lot
of queries, consider setting reutils.email to your email address in
your .Rprofile file.

reutils.show.headlines: By default, efetch objects containing text data show only the first 12 lines. This is quite handy
if you have downloaded a fairly large genome in GenBank file format. This
can be changed by setting the global option reutils.show.headlines to
another numeric value or NULL.

reutils.verbose.queries: If you perform many queries interactively,
you might want to get messages announcing the queries you run. You can do so by setting
the option reutils.verbose.queries to TRUE.

reutils.test.remote: Unit tests that require online access to NCBI
services are disabled by default, as they cannot be guaranteed to be
available/working under all circumstances. Set the option
reutils.test.remote to TRUE to run the full suite of tests.

# Upload the PMIDs for this search to the History server
# ('query' must already hold an Entrez search term; it is not defined in this section)
pmids <- esearch(query, "pubmed", usehistory = TRUE)
pmids

# Fetch the records
articles <- efetch(pmids)

# Use XPath expressions with the #xmlValue() or #xmlAttr() methods to directly
# extract specific data from the XML records stored in the 'efetch' object.
titles <- articles$xmlValue("//ArticleTitle")
abstracts <- articles$xmlValue("//AbstractText")

#
# combine epost with esummary/efetch
#
# Download protein records corresponding to a list of GI numbers.
uid <- c("194680922", "50978626", "28558982", "9507199", "6678417")

# post the GI numbers to the Entrez history server
p <- epost(uid, "protein")

# retrieve docsums with esummary
docsum <- content(esummary(p, version = "1.0"), "parsed")
docsum

# download FASTAs as 'text' with efetch
prot <- efetch(p, retmode = "text", rettype = "fasta")
prot

# retrieve the content from the efetch object
fasta <- content(prot)
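
The package options described earlier are ordinary R options, so they can presumably be set with options() either in a session or in your .Rprofile file. A minimal sketch follows. The email address is a placeholder and the value 20 is an arbitrary choice for illustration.

```r
# Set reutils options for the current session (or place this in .Rprofile).
options(
  reutils.email = "your.name@example.org",  # placeholder, replace with your own address
  reutils.show.headlines = 20,              # arbitrary example value (default is 12 lines)
  reutils.verbose.queries = TRUE            # announce each query as it is run
)

# Check what is currently set
getOption("reutils.show.headlines")
```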