rgbif (version 0.6.2)

occ_search: Search for GBIF occurrences.

Description

Note that you can pass a vector of values to any one of the taxonKey, datasetKey, and catalogNumber parameters in a function call, but you cannot pass vectors of length greater than one to more than one of these parameters at the same time.
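The one-vector-at-a-time rule can be checked before a call is made. The following is a minimal sketch, not rgbif's own implementation; `check_single_vector` is a hypothetical helper name introduced only for illustration.

```r
# Hypothetical helper (not part of rgbif): allow at most one argument
# with more than one value, mirroring the rule described above.
check_single_vector <- function(...) {
  args <- list(...)
  # Names of the arguments that carry more than one value
  multi <- names(args)[vapply(args, length, integer(1)) > 1]
  if (length(multi) > 1) {
    stop("Only one parameter may have length > 1; got: ",
         paste(multi, collapse = ", "))
  }
  invisible(multi)
}

# One vector-valued parameter: passes
ok <- check_single_vector(taxonKey = c(2482598, 2492010), datasetKey = NULL)

# Two vector-valued parameters: raises an error, caught here for illustration
res <- tryCatch(
  check_single_vector(taxonKey = c(2482598, 2492010),
                      catalogNumber = c("49366", "Bird.27847588")),
  error = function(e) conditionMessage(e)
)
```

A check like this fails early and names the offending parameters, which may be easier to debug than an error raised deeper in the call.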

Hierarchies: Hierarchies are returned with each occurrence object. There is no option not to return them from the API. However, within the occ_search function you can select whether to return just hierarchies, just data, all of data, hierarchies, and metadata, or just metadata. If all hierarchies are the same, only one is returned.

Data: By default only a minimal set of fields is returned: name (the species name), key, decimalLatitude, and decimalLongitude. Set fields='all' if you want more data.
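As the Examples note, field selection is done in R rather than on GBIF's side. A rough sketch of that kind of column selection follows; `select_fields` is a hypothetical helper, not the package's actual code, and the toy data frame holds made-up values standing in for real occurrence records.

```r
# Hypothetical helper (not rgbif's code): keep only the requested columns,
# silently skipping any that are absent from the data.
select_fields <- function(df, fields) {
  keep <- intersect(fields, names(df))
  df[, keep, drop = FALSE]
}

# Toy data standing in for occurrence records (values are made up)
records <- data.frame(
  name = c("Helianthus annuus", "Helianthus annuus"),
  decimalLatitude = c(34.1, 35.2),
  decimalLongitude = c(-118.3, -117.9),
  basisOfRecord = c("HUMAN_OBSERVATION", "PRESERVED_SPECIMEN"),
  stringsAsFactors = FALSE
)

# "protocol" is not a column of the toy data, so it is dropped from the result
trimmed <- select_fields(records, c("name", "basisOfRecord", "protocol"))
names(trimmed)
```

Using `drop = FALSE` keeps the result a data.frame even when a single field is requested.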

Nerds: You can pass options not defined in this function into the call to the GBIF API, to control things about the call itself, using the callopts parameter. See the example below that passes in the verbose() function from httr to get details on the HTTP call.

Scientific names vs. taxon keys: In the previous GBIF API and the version of rgbif that wrapped that API, you could search the equivalent of this function with a species name, which was convenient. However, names are messy. So it makes sense to first sort out exactly which species keys you want, and then get your occurrence data with this function. GBIF has since added a scientificName parameter that allows searches by scientific name in this function; these searches include synonym taxa.

WKT: Examples of valid WKT objects:

  • 'POLYGON((30.1 10.1, 10 20, 20 60, 60 60, 30.1 10.1))'
  • 'POINT(30.1 10.1)'
  • 'LINESTRING(3 4,10 50,20 25)'
  • 'LINEARRING ???' - It is unclear how to specify this type.
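A WKT polygon like the first example can be assembled from the corners of a bounding box. This is a minimal sketch; `bbox_to_wkt` is a hypothetical helper, and it assumes the usual WKT "x y" coordinate order (longitude first, then latitude).

```r
# Hypothetical helper: build a closed WKT POLYGON ring from a bounding box.
# Assumes WKT's usual "x y" order, i.e. longitude first, then latitude.
bbox_to_wkt <- function(lon_min, lat_min, lon_max, lat_max) {
  corners <- rbind(
    c(lon_min, lat_min),
    c(lon_max, lat_min),
    c(lon_max, lat_max),
    c(lon_min, lat_max),
    c(lon_min, lat_min)  # repeat the first point to close the ring
  )
  # Join each "x y" pair with a space, then join the pairs with ", "
  ring <- paste(corners[, 1], corners[, 2], collapse = ", ")
  paste0("POLYGON((", ring, "))")
}

wkt <- bbox_to_wkt(10, 20, 30, 40)
wkt
```

The resulting string could then be passed as the geometry argument.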

Range queries: A range query is as it sounds: you query on a range of values defined by a lower and an upper limit. Do a range query by specifying the lower and upper limits in a single comma-separated character string, like depth='50,100'. It would be more R-like to specify the range as a vector like c(50,100), but that syntax is already used to run many searches, one for each element of the vector, so range queries have to be written differently. The following parameters support range queries.

  • decimalLatitude
  • decimalLongitude
  • depth
  • elevation
  • eventDate
  • lastInterpreted
  • month
  • year
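Because a range has to be a single character string rather than a vector, it can help to build that string from two limits. The sketch below uses a hypothetical helper, `range_query`, which is not part of rgbif.

```r
# Hypothetical helper: collapse a lower and an upper limit into the single
# comma-separated string that a range query expects, e.g. "50,100".
range_query <- function(lower, upper) {
  stopifnot(length(lower) == 1, length(upper) == 1)
  paste(lower, upper, sep = ",")
}

depth_range <- range_query(50, 100)
year_range  <- range_query(1999, 2000)

# Dates are formatted as yyyy-MM-dd before being combined
date_range <- range_query(format(as.Date("2014-04-01"), "%Y-%m-%d"),
                          format(as.Date("2014-04-30"), "%Y-%m-%d"))
```

The strings could then be supplied as, for example, depth=depth_range or lastInterpreted=date_range. Very large numeric limits may be converted to scientific notation by paste(), so formatting them explicitly first may be safer.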

Usage

occ_search(taxonKey = NULL, scientificName = NULL, country = NULL,
  publishingCountry = NULL, hasCoordinate = NULL, typeStatus = NULL,
  recordNumber = NULL, lastInterpreted = NULL, continent = NULL,
  geometry = NULL, collectorName = NULL, basisOfRecord = NULL,
  datasetKey = NULL, eventDate = NULL, catalogNumber = NULL,
  year = NULL, month = NULL, decimalLatitude = NULL,
  decimalLongitude = NULL, elevation = NULL, depth = NULL,
  institutionCode = NULL, collectionCode = NULL, spatialIssues = NULL,
  search = NULL, callopts = list(), limit = 20, start = NULL,
  fields = "minimal", return = "all")

Arguments

taxonKey
A taxon key from the GBIF backbone. All included and synonym taxa are included in the search, so a search for Aves with taxonKey=212 (i.e. /occurrence/search?taxonKey=212) will match all birds, no matter which species. You can pass many keys by passing
scientificName
A scientific name from the GBIF backbone. All included and synonym taxa are included in the search.
datasetKey
The occurrence dataset key (a uuid)
catalogNumber
An identifier of any form assigned by the source within a physical collection or digital dataset for the record, which may not be unique, but should be fairly unique in combination with the institution and collection code.
collectorName
The person who recorded the occurrence.
collectionCode
An identifier of any form assigned by the source to identify the physical collection or digital dataset uniquely within the context of an institution.
institutionCode
An identifier of any form assigned by the source to identify the institution the record belongs to. Not guaranteed to be unique.
country
The 2-letter country code (as per ISO-3166-1) of the country in which the occurrence was recorded. See here http://en.wikipedia.org/wiki/ISO_3166-1_alpha-2
basisOfRecord
Basis of record, as defined in our BasisOfRecord enum here http://bit.ly/19kBGhG. Acceptable values are:
  • FOSSIL_SPECIMEN An occurrence record describing a fossilized specimen.
  • HUMAN_OBSERVATION An occurrence record describing an
eventDate
Occurrence date in ISO 8601 format: yyyy, yyyy-MM, yyyy-MM-dd, or MM-dd.
year
The 4 digit year. A year of 98 will be interpreted as AD 98.
month
The month of the year, starting with 1 for January.
search
Query terms. The value for this parameter can be a simple word or a phrase.
decimalLatitude
Latitude in decimals between -90 and 90 based on WGS 84. Supports range queries.
decimalLongitude
Longitude in decimals between -180 and 180 based on WGS 84. Supports range queries.
publishingCountry
The 2-letter country code (as per ISO-3166-1) of the country of the organization that published the occurrence record.
elevation
Elevation in meters above sea level.
depth
Depth in meters relative to elevation. For example 10 meters below a lake surface with given elevation.
geometry
Searches for occurrences inside a polygon described in Well Known Text (WKT) format. A WKT shape is written as either POINT, LINESTRING, LINEARRING or POLYGON. Example of a polygon: ((30.1 10.1, 10 20, 20 40, 40 40, 30.1 10.1)) would be queried as http://
spatialIssues
(logical) Includes/excludes occurrence records which contain spatial issues (as determined in our record interpretation), i.e. spatialIssues=TRUE returns only those records with spatial issues while spatialIssues=FALSE includes only records without spatia
hasCoordinate
(logical) Return only occurrence records with lat/long data (TRUE) or all records (FALSE, default).
typeStatus
Type status of the specimen. One of many options. See ?typestatus
recordNumber
Number recorded by collector of the data, different from GBIF record number. See http://rs.tdwg.org/dwc/terms/#recordNumber for more info
lastInterpreted
Date the record was last modified in GBIF, in ISO 8601 format: yyyy, yyyy-MM, yyyy-MM-dd, or MM-dd. Supports range queries.
continent
Continent. One of africa, antarctica, asia, europe, north_america (North America includes the Caribbean and reaches down to include Panama), oceania, or south_america.
fields
(character) Default ('minimal') will return just taxon name, key, latitude, and longitude. 'all' returns all fields. Or specify each field you want returned by name, e.g. fields = c('name','latitude','elevation').
return
One of data, hier, meta, or all. If data, a data.frame with the data. hier returns the classifications in a list for each record. meta returns the metadata for the entire call. all gives all data back in a list.
callopts
Pass on options to httr::GET for more refined control of http calls, and error handling
limit
Number of records to return
start
Record number to start at
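The limit and start arguments together allow paging through a large result set. The following sketch computes the start values for successive pages; `page_starts` is a hypothetical helper, and it assumes that start is a zero-based offset, which is not stated on this page.

```r
# Hypothetical helper: starting offsets for fetching `total` records in
# pages of `page_size`. Assumes `start` is a zero-based offset.
page_starts <- function(total, page_size) {
  stopifnot(total > 0, page_size > 0)
  seq(0, total - 1, by = page_size)
}

# Offsets for 12 records fetched 5 at a time: three pages
starts <- page_starts(12, 5)
starts
```

Each value could then be passed as start, with limit set to the page size, and the resulting data.frames combined afterwards.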

Value

  • A data.frame or list

References

http://www.gbif.org/developer/summary

Examples

# Search by species name, using name_suggest first to get key
key <- name_suggest(q='Helianthus annuus', rank='species')$key[1]
occ_search(taxonKey=key, limit=2)

# Return 20 results, this is the default by the way
occ_search(taxonKey=key, limit=20)

# Return just metadata for the search
occ_search(taxonKey=key, return='meta')

# Instead of getting a taxon key first, you can search for a name directly
## However, note that using this approach (with scientificName="...")
## you are getting synonyms too. The results for the scientificName and
## taxonKey parameters are the same in this case, but for some names
## they may differ
occ_search(scientificName = 'Ursus americanus')
key <- name_backbone(name = 'Ursus americanus', rank='species')$usageKey
occ_search(taxonKey = key)

# Search by dataset key
occ_search(datasetKey='7b5d6a48-f762-11e1-a439-00145eb45e9a', return='data')

# Search by catalog number
occ_search(catalogNumber="49366")
occ_search(catalogNumber=c("49366","Bird.27847588"))

# Get all data, not just lat/long and name
occ_search(taxonKey=key, fields='all')

# Or get specific fields. Note that this isn't done on GBIF's side of things. This
# is done in R, but before you get the return object, so other fields are garbage
# collected
occ_search(taxonKey=key, fields=c('name','basisOfRecord','protocol'))

# Use paging parameters (limit and start) to page. Note the different results
# for the two queries below.
occ_search(datasetKey='7b5d6a48-f762-11e1-a439-00145eb45e9a',start=10,limit=5,
   return="data")
occ_search(datasetKey='7b5d6a48-f762-11e1-a439-00145eb45e9a',start=20,limit=5,
   return="data")

# Many dataset keys
occ_search(datasetKey=c("50c9509d-22c7-4a22-a47d-8c48425ef4a7",
   "7b5d6a48-f762-11e1-a439-00145eb45e9a"))

# Occurrence data: lat/long data, and associated metadata with occurrences
## If return='data' the output is a data.frame of all data together
## for easy manipulation
occ_search(taxonKey=key, return='data')

# Taxonomic hierarchy data
## If return='hier' the output is a list of the hierarchy for each record
occ_search(taxonKey=key, return='hier')

# Search by collector name
occ_search(collectorName="smith")

# Many collector names
occ_search(collectorName=c("smith","BJ Stacey"))

# Pass in curl options for extra fun
library(httr)
occ_search(taxonKey=key, limit=20, return='hier', callopts=verbose())

# Search for many species
splist <- c('Cyanocitta stelleri', 'Junco hyemalis', 'Aix sponsa')
keys <- sapply(splist, function(x) name_suggest(x)$key[1], USE.NAMES=FALSE)
occ_search(taxonKey=keys, limit=5, return='data')

# Search on latitude and longitude
occ_search(search="kingfisher", decimalLatitude=50, decimalLongitude=-10)

# Search on a bounding box (in well known text format)
occ_search(geometry='POLYGON((30.1 10.1, 10 20, 20 40, 40 40, 30.1 10.1))')
key <- name_suggest(q='Aesculus hippocastanum')$key[1]
occ_search(taxonKey=key, geometry='POLYGON((30.1 10.1, 10 20, 20 40, 40 40, 30.1 10.1))')

# Search on country
occ_search(country='US')

# Get only occurrences with lat/long data
occ_search(taxonKey=key, hasCoordinate=TRUE)

# Get only occurrences that were recorded as living specimens
occ_search(taxonKey=key, basisOfRecord="LIVING_SPECIMEN", hasCoordinate=TRUE)

# Get occurrences for a particular eventDate
occ_search(taxonKey=key, eventDate="2013")
occ_search(taxonKey=key, year="2013")
occ_search(taxonKey=key, month="6")

# Get occurrences based on depth
key <- name_backbone(name='Salmo salar', kingdom='animals')$speciesKey
occ_search(taxonKey=key, depth="5")

# Get occurrences based on elevation
key <- name_backbone(name='Puma concolor', kingdom='animals')$speciesKey
occ_search(taxonKey=key, elevation=50, hasCoordinate=TRUE)

# Get occurrences based on institutionCode
occ_search(institutionCode="TLMF")
occ_search(institutionCode=c("TLMF","ArtDatabanken"))

# Get occurrences based on collectionCode
occ_search(collectionCode="Floristic Databases MV - Higher Plants")
occ_search(collectionCode=c("Floristic Databases MV - Higher Plants","Artport"))

# Get only those occurrences with spatial issues
occ_search(taxonKey=key, spatialIssues=TRUE)

# Search using a query string
occ_search(search="kingfisher")

# Range queries
## See the Description section for parameters that support range queries
occ_search(depth='50,100') # a depth range, with lower and upper limits in one character string
occ_search(depth=c(50,100)) # not a range search: runs two searches, one for each depth

## Range search with year
occ_search(year='1999,2000')

## Range search with latitude
occ_search(decimalLatitude='29.59,29.6')

# Search by specimen type status
## Look for possible values of the typeStatus parameter in the typestatus dataset
occ_search(typeStatus = 'allotype', fields = c('name','typeStatus'))

# Search by specimen record number
## This is the record number of the person/group that submitted the data, not GBIF's numbers
## You can see that many different groups have record number 1, so not super helpful
occ_search(recordNumber = 1, fields = c('name','recordNumber','recordedBy'))

# Search by last time interpreted: Date the record was last modified in GBIF
## The lastInterpreted parameter accepts ISO 8601 format dates, including
## yyyy, yyyy-MM, yyyy-MM-dd, or MM-dd. Range queries are accepted for lastInterpreted
occ_search(lastInterpreted = '2014-04-02', fields = c('name','lastInterpreted'))

# Search by continent
## One of africa, antarctica, asia, europe, north_america, oceania, or south_america
occ_search(continent = 'south_america', return = 'meta')
occ_search(continent = 'africa', return = 'meta')
occ_search(continent = 'oceania', return = 'meta')
occ_search(continent = 'antarctica', return = 'meta')
# Passing multiple values to two different parameters at once is not allowed
occ_search(taxonKey=c(2482598,2492010), collectorName=c("smith","BJ Stacey"))

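# Get a lot of data, here 1500 records for Helianthus annuus
## Look up the key again, since key was reassigned to other species above
key <- name_suggest(q='Helianthus annuus', rank='species')$key[1]
out <- occ_search(taxonKey=key, limit=1500, return="data")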
nrow(out)
