
taxize (version 0.7.9)

scrapenames: Resolve names using Global Names Recognition and Discovery.

Description

Uses the Global Names Recognition and Discovery service, see http://gnrd.globalnames.org/.

Note: this function sometimes returns data and sometimes does not. The API that this function relies on is extremely buggy.

Usage

scrapenames(url = NULL, file = NULL, text = NULL, engine = NULL, unique = NULL, verbatim = NULL, detect_language = NULL, all_data_sources = NULL, data_source_ids = NULL, ...)

Arguments

url
An encoded URL for a web page, PDF, Microsoft Office document, or image file; see examples
file
When using multipart/form-data as the content-type, a file may be sent. This should be a path to your file on your machine.
text
(character) Text content; best used with a POST request, see examples
engine
(optional) (integer) Default: 0. Either 1 for TaxonFinder, 2 for NetiNeti, or 0 for both. If absent, both engines are used.
unique
(optional) (logical) If TRUE (default), response has unique names without offsets.
verbatim
(optional) (logical) If TRUE (default: FALSE), the response excludes verbatim strings.
detect_language
(optional) (logical) When TRUE (default), NetiNeti is not used if the language of the incoming text is determined not to be English. When FALSE, NetiNeti will be used if requested.
all_data_sources
(optional) (logical) Resolve found names against all available data sources.
data_source_ids
(optional) (character) Pipe-separated list of data source ids to resolve found names against. See the list of data sources at http://resolver.globalnames.org/data_sources.
...
Further named arguments passed on to GET, e.g., curl options; see examples

Value

A list of length two: the first element is metadata, the second is the data as a data.frame.

Details

Exactly one of url, file, or text must be specified.
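This one-of-three constraint can be sketched in base R. The helper below is purely illustrative (its name and internals are an assumption, not taxize's actual implementation): it errors unless exactly one of the three inputs is supplied.

```r
# Hypothetical input check mirroring the constraint described above
# (an illustration, not taxize's actual internals): exactly one of
# url, file, or text must be non-NULL.
pick_input <- function(url = NULL, file = NULL, text = NULL) {
  supplied <- !vapply(list(url = url, file = file, text = text),
                      is.null, logical(1))
  if (sum(supplied) != 1) {
    stop("One (and only one) of url, file, or text must be given")
  }
  names(supplied)[supplied]
}

pick_input(text = "A spider named Pardosa moesta Banks, 1892")  # "text"
```

Supplying zero inputs, or more than one, fails the check, which matches the behavior documented here.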

Examples

## Not run: 
# Get data from a website using its URL
scrapenames(url = 'http://en.wikipedia.org/wiki/Araneae')
scrapenames(url = 'http://en.wikipedia.org/wiki/Animalia')
scrapenames(url = 'http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0095068')
scrapenames(url = 'http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0080498')
scrapenames(url = 'http://ucjeps.berkeley.edu/cgi-bin/get_JM_treatment.pl?CARYOPHYLLACEAE')

# Scrape names from a pdf at a URL
url <- 'http://www.plosone.org/article/fetchObject.action?uri=
info%3Adoi%2F10.1371%2Fjournal.pone.0058268&representation=PDF'
scrapenames(url = sub('\n', '', url))

# With arguments
scrapenames(url = 'http://www.mapress.com/zootaxa/2012/f/z03372p265f.pdf', unique = TRUE)
scrapenames(url = 'http://en.wikipedia.org/wiki/Araneae', data_source_ids = c(1, 169))

# Get data from a file
speciesfile <- system.file("examples", "species.txt", package = "taxize")
scrapenames(file = speciesfile)

nms <- paste0(names_list("species"), collapse = "\n")
file <- tempfile(fileext = ".txt")
writeLines(nms, file)
scrapenames(file = file)

# Get data from a text string
scrapenames(text = 'A spider named Pardosa moesta Banks, 1892')

# use curl options
library("httr")
scrapenames(text = 'A spider named Pardosa moesta Banks, 1892', config = verbose())
## End(Not run)
