reutils (version 0.2.2)

reutils: An interface to the NCBI Entrez Programming Utilities.

Description

reutils provides support for interacting with NCBI databases such as PubMed, GenBank, or GEO via the Entrez Programming Utilities (EUtils).

Please check the relevant usage guidelines when using these services. Note that Entrez server requests are subject to frequency limits.
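
One way to stay under a request-rate limit when looping over many queries is to space the calls out. The sketch below uses base R only; throttled_lapply is a hypothetical helper, not part of reutils, and the default of three calls per second reflects NCBI's commonly cited guideline for requests without an API key, which should be verified against the current usage guidelines.

```r
# Hypothetical helper (not part of reutils): apply `fun` to each element of
# `inputs`, waiting so that at most `per_second` calls are started per second.
throttled_lapply <- function(inputs, fun, per_second = 3) {
  min_gap <- 1 / per_second      # minimum spacing between call starts, in seconds
  last_start <- -Inf             # start time of the previous call
  lapply(inputs, function(x) {
    wait <- min_gap - (as.numeric(Sys.time()) - last_start)
    if (wait > 0) Sys.sleep(wait)
    last_start <<- as.numeric(Sys.time())
    fun(x)
  })
}
```

With reutils loaded, a call such as throttled_lapply(terms, function(t) esearch(t, "pubmed")) would then issue one search per element of a character vector terms without bursting.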

Arguments

Main functions

  • esearch: Search and retrieve primary UIDs for use with esummary, elink, or efetch. esearch additionally returns term translations and optionally stores results for future use in the user's Web Environment.
  • esummary: Retrieve document summaries from a list of primary UIDs (provided as a character vector or as an esearch object).
  • egquery: Provides Entrez database counts in XML for a single search term using a Global Query.
  • einfo: Retrieve field names, term counts, last update, and available updates for each database.
  • efetch: Retrieve data records in a specified format corresponding to a list of primary UIDs or from the user's Web Environment in the Entrez History server.
  • elink: Returns a list of UIDs (and relevancy scores) from a target database that are related to a list of UIDs in the same database or in another Entrez database.
  • epost: Uploads primary UIDs to the user's Web Environment on the Entrez History server for subsequent use with esummary, elink, or efetch.
  • espell: Provide spelling suggestions.
  • ecitmatch: Retrieves PubMed IDs (PMIDs) that correspond to a set of input citation strings.
  • content: Extract the content of a request from the eutil object returned by any of the above functions.

Package options

reutils uses four options to configure behaviour:
  • reutils.email: NCBI requires that a user of their API provides an email address with a call to Entrez. If you are going to perform a lot of queries consider setting reutils.email to your email address in your .Rprofile file.
  • reutils.show.headlines: By default, efetch objects containing text data show only the first 12 lines. This is handy if you have downloaded a fairly large genome in GenBank file format. The limit can be changed by setting the global option reutils.show.headlines to another numeric value or to NULL.
  • reutils.verbose.queries: If you perform many queries interactively you might want to get messages announcing the queries you run. You can do so by setting the option reutils.verbose.queries to TRUE.
  • reutils.test.remote: Unit tests that require online access to NCBI services are disabled by default, as the services cannot be guaranteed to be available and working under all circumstances. Set the option reutils.test.remote to TRUE to run the full suite of tests.
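
As a minimal sketch, the options above could be set with base R's options(), for example in your .Rprofile. The email address is a placeholder and the chosen values are only illustrative.

```r
# Candidate lines for ~/.Rprofile; replace the placeholder address with your own.
options(
  reutils.email           = "your.name@example.org",
  reutils.show.headlines  = 20,    # show the first 20 lines of text records
  reutils.verbose.queries = TRUE   # print a message for each query that is run
)
```

The current value of any of these can be checked with getOption(), e.g. getOption("reutils.email").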

Details

With nine E-Utilities, NCBI provides a programmatic interface to the Entrez query and database system for searching and retrieving requested data.

Each of these tools corresponds to an R function in the reutils package, as listed under Main functions.

The output returned by the EUtils is typically in XML format. To gain access to this output you have several options:

  1. Use the content(as = "xml") method to extract the output as an XMLInternalDocument object and process it further using the facilities provided by the XML package.
  2. Use the content(as = "parsed") method to extract the output into data.frames. Note that this is currently only implemented for docsums returned by esummary, uilists returned by esearch, and the output returned by einfo.
  3. Access specific nodes in the XML tree using XPath expressions with the reference class methods #xmlValue, #xmlAttr, or #xmlName built into eutil objects.

The Entrez Programming Utilities can also generate output in other formats, such as plain-text FASTA or GenBank files for sequence databases, or the MEDLINE format for the literature database. The type of output is generally controlled by setting the retmode and rettype arguments when calling an EUtil.
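
The three access routes can be compared side by side on a single request. The following is a sketch only: it assumes reutils is installed and NCBI is reachable, inspect_einfo is a hypothetical helper name, and the XPath "//DbName" assumes the einfo XML contains a DbName element.

```r
# Hypothetical helper (not part of reutils). Defining it needs no network
# access; calling it requires the reutils package and a connection to NCBI.
inspect_einfo <- function(db = "pubmed") {
  info <- reutils::einfo(db)
  list(
    xml    = reutils::content(info, as = "xml"),     # XMLInternalDocument
    parsed = reutils::content(info, as = "parsed"),  # parsed R structure
    name   = info$xmlValue("//DbName")               # XPath lookup on the XML
  )
}
```

Calling inspect_einfo("pubmed") would return the same einfo response in all three forms, which makes it easy to see which one suits a given task.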

Examples

    #
    # combine esearch and efetch
    #
    # Download PubMed records that are indexed in MeSH for both 'Chlamydia' and 
    # 'genome' and were published in 2013.
    query <- "Chlamydia[mesh] and genome[mesh] and 2013[pdat]"
    
    # Upload the PMIDs for this search to the History server
    pmids <- esearch(query, "pubmed", usehistory = TRUE)
    pmids
    
    ## Not run: 
    # # Fetch the records
    # articles <- efetch(pmids)
    # 
    # # Use XPath expressions with the #xmlValue() or #xmlAttr() methods to directly
    # # extract specific data from the XML records stored in the 'efetch' object.
    # titles <- articles$xmlValue("//ArticleTitle")
    # abstracts <- articles$xmlValue("//AbstractText")
    # 
    # #
    # # combine epost with esummary/efetch
    # #
    # # Download protein records corresponding to a list of GI numbers.
    # uid <- c("194680922", "50978626", "28558982", "9507199", "6678417")
    # 
    # # post the GI numbers to the Entrez history server
    # p <- epost(uid, "protein")
    # 
    # # retrieve docsums with esummary
    # docsum <- content(esummary(p, version = "1.0"), "parsed")
    # docsum
    # 
    # # download FASTAs as 'text' with efetch
    # prot <- efetch(p, retmode = "text", rettype = "fasta")
    # prot
    # 
    # # retrieve the content from the efetch object
    # fasta <- content(prot)
    # ## End(Not run)
    