reutils (version 0.1.1)

efetch: Downloading full records

Description

efetch performs calls to the NCBI EFetch utility to retrieve data records in the requested format for an NCBI Accession Number, one or more primary UIDs, or for a set of UIDs stored in the user's web environment.

Usage

efetch(uid, db = NULL, rettype = NULL, retmode = NULL, retstart = NULL,
  retmax = NULL, querykey = NULL, webenv = NULL, strand = NULL,
  seqstart = NULL, seqstop = NULL, complexity = NULL)
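The retstart and retmax arguments in this signature make it possible to page through a large UID set rather than requesting every record at once. The following is a minimal sketch of that pattern. page_offsets() and fetch_all() are hypothetical helpers, not part of reutils, and the sketch assumes retstart is forwarded to EFetch unchanged, where the first record has index 0.

```r
# Hypothetical helpers (not part of reutils) for paging with retstart/retmax.

# Split `count` records into pages of at most `batch` records.
# Assumes a 0-based retstart, as in the NCBI E-utilities.
page_offsets <- function(count, batch = 500) {
  n_pages <- ceiling(count / batch)
  starts <- (seq_len(n_pages) - 1) * batch
  data.frame(retstart = starts, retmax = pmin(batch, count - starts))
}

# Call `fetch_page(retstart, retmax)` once per page and collect the results.
fetch_all <- function(count, batch, fetch_page) {
  pages <- page_offsets(count, batch)
  Map(fetch_page, pages$retstart, pages$retmax)
}

# Example with a stub in place of a network call
pages <- fetch_all(1234, 500, function(s, m) c(start = s, n = m))
```

With a real search, fetch_page could be something like function(s, m) efetch(uid, retstart = s, retmax = m), where uid is an esearch object and count is the number of records that search found; check the batch size against NCBI's usage guidelines before relying on it.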

Arguments

uid
(Required) A list of UIDs provided either as a character vector, as an esearch object, or by reference to a web environment and a query key obtained directly from previous calls to esearch.
db
(Required if uid is a character vector of UIDs) Database from which to retrieve records. See http://www.ncbi.nlm.nih.gov/books/NBK25497/table/chapter2.chapter2_table1/?report=objectonly for the supported databases.
rettype
A character string specifying the retrieval type, such as 'abstract' or 'medline' for PubMed, 'gp' or 'fasta' for Protein, and 'gb' or 'fasta' for Nuccore. See http://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.chapter4_table1/?report
retmode
A character string specifying the data mode of the records returned, such as 'text' or 'xml'. See http://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.chapter4_table1/?report=objectonly for the available values for each database.
retstart
Numeric index of the first record to be retrieved.
retmax
Total number of records from the input set to be retrieved.
querykey
An integer specifying which of the UID lists attached to a user's Web Environment will be used as input to efetch. (Usually obtained directly from objects returned by a previous call to esearch.)
webenv
A character string specifying the Web Environment that contains the UID list. (Usually obtained directly from objects returned by a previous call to esearch or epost.)
strand
Strand of DNA to retrieve. (1: plus strand, 2: minus strand)
seqstart
First sequence base to retrieve.
seqstop
Last sequence base to retrieve.
complexity
Data content to return. (0: entire data structure, 1: bioseq, 2: minimal bioseq-set, 3: minimal nuc-prot, 4: minimal pub-set)
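Each argument listed here corresponds to an EFetch URL parameter. Most keep the same name; the exceptions are uid (sent as id), querykey (query_key), webenv (WebEnv), seqstart (seq_start) and seqstop (seq_stop). The sketch below illustrates that mapping by assembling a request URL with base R only. efetch_url() is a hypothetical helper and not how reutils builds its requests internally, and the endpoint https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi is the current E-utilities address rather than one taken from this page.

```r
# Hypothetical helper (not part of reutils): build an EFetch URL from
# arguments named like those of efetch(), using E-utilities parameter names.
efetch_url <- function(uid = NULL, db = NULL, rettype = NULL, retmode = NULL,
                       retstart = NULL, retmax = NULL, querykey = NULL,
                       webenv = NULL, strand = NULL, seqstart = NULL,
                       seqstop = NULL, complexity = NULL) {
  base <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
  params <- list(
    db = db,
    id = if (!is.null(uid)) paste(uid, collapse = ",") else NULL,
    rettype = rettype, retmode = retmode,
    retstart = retstart, retmax = retmax,
    query_key = querykey, WebEnv = webenv,
    strand = strand, seq_start = seqstart, seq_stop = seqstop,
    complexity = complexity
  )
  # Drop arguments left as NULL so only supplied parameters appear.
  params <- Filter(Negate(is.null), params)
  # Percent-encode each value (a comma in an ID list becomes %2C).
  values <- vapply(params,
                   function(v) URLencode(as.character(v), reserved = TRUE),
                   character(1))
  paste0(base, "?", paste(names(params), values, sep = "=", collapse = "&"))
}

efetch_url("195055", "protein", "gp")
efetch_url("195055", "protein", "fasta", strand = 2, seqstart = 100, seqstop = 200)
```

Nothing is sent over the network here; the returned string is only useful for comparing against the URL of a real request.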

Value

  • An efetch object.

Details

See the official online documentation for NCBI's EUtilities (http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch) for additional information.

See http://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.chapter4_table1/?report=objectonly for the default values of rettype and retmode, as well as a list of the databases available to the EFetch utility.

See Also

content, getUrl, getError, database, retmode, rettype.

Examples

## From Protein, retrieve a raw GenPept record and write it to a file.
p <- efetch("195055", "protein", "gp")

write(content(p, "text"), file="~/AAD15290.gp")

## Get accessions for a list of GenBank IDs (GIs)
acc <- efetch(c("1621261", "89318838", "68536103", "20807972", "730439"), "protein",
              rettype="acc")
acc
acc <- strsplit(content(acc), "\n")[[1]]
acc
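The strsplit() call above splits only on "\n", so it does not guard against blank lines or "\r\n" line endings in the returned text. A slightly more defensive variant is sketched below. split_lines() is a hypothetical helper, and it is demonstrated on a made-up string rather than on real EFetch output.

```r
# Hypothetical helper: split plain-text EFetch output (e.g. rettype = "acc")
# into one element per non-empty line.
split_lines <- function(txt) {
  lines <- unlist(strsplit(txt, "\r?\n"))  # handle both \n and \r\n endings
  lines <- trimws(lines)                   # drop surrounding whitespace
  lines[nzchar(lines)]                     # discard empty lines
}

# Made-up input, for illustration only
split_lines("ACC_1.1\r\nACC_2.1\n\n")
```

Applied to the example above, this would presumably be called as split_lines(content(acc)).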

## Get GIs from a list of accession numbers
gi <- efetch(c("CAB02640.1", "EAS10332.1", "YP_250808.1", "NP_623143.1", "P41007.1"),
              "protein", "uilist")
gi

## We can conveniently extract the UIDs using the eutil method xmlValue(xpath)
gi$xmlValue("/IdList/Id")

## Or we can extract the contents of the efetch query using the function content()
## and use the XML package to retrieve the UIDs
doc <- content(gi)
XML::xpathSApply(doc, "/IdList/Id", XML::xmlValue)
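If the XML package is not installed, the Id values of a small uilist response can be pulled out with base R regular expressions. This is only a rough sketch for quick inspection: ids_from_idlist() is a hypothetical helper, it assumes the response is available as a single character string whose Id elements carry no attributes or nested tags, and a real XML parser, as in the example above, remains the robust choice.

```r
# Hypothetical helper: extract the text of every <Id> element with regexes.
# Not a real XML parser; assumes no attributes or nested tags inside <Id>.
ids_from_idlist <- function(xml_text) {
  hits <- regmatches(xml_text, gregexpr("<Id>[^<]*</Id>", xml_text))[[1]]
  gsub("</?Id>", "", hits)  # strip the opening and closing tags
}

# Made-up input, for illustration only
ids_from_idlist("<IdList><Id>111</Id><Id>222</Id></IdList>")
```

With the gi object above, the raw text would presumably come from content(gi, "text"), mirroring the content(p, "text") call in the first example.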

## Get the scientific name for an organism starting with the NCBI taxon id.
tx <- efetch("527031", "taxonomy")
tx

## Convenience accessor for XML nodes of interest using XPath
## Extract the TaxIds of the Lineage
tx["//LineageEx/Taxon/TaxId"]

## Use an XPath expression to extract the scientific name.
tx$xmlValue("/TaxaSet/Taxon/ScientificName")
