rgbif (version 0.9.2)

occ_search: Search for GBIF occurrences

Description

Search for GBIF occurrences

Usage

occ_search(taxonKey = NULL, scientificName = NULL, country = NULL,
  publishingCountry = NULL, hasCoordinate = NULL, typeStatus = NULL,
  recordNumber = NULL, lastInterpreted = NULL, continent = NULL,
  geometry = NULL, recordedBy = NULL, basisOfRecord = NULL,
  datasetKey = NULL, eventDate = NULL, catalogNumber = NULL,
  year = NULL, month = NULL, decimalLatitude = NULL,
  decimalLongitude = NULL, elevation = NULL, depth = NULL,
  institutionCode = NULL, collectionCode = NULL,
  hasGeospatialIssue = NULL, issue = NULL, search = NULL,
  mediaType = NULL, limit = 500, start = 0, fields = "all",
  return = "all", ...)

## S3 method for class 'gbif': print(x, ..., n = 10)

Arguments

taxonKey
A taxon key from the GBIF backbone. All included and synonym taxa are included in the search, so a search for Aves with taxonKey=212 (i.e. /occurrence/search?taxonKey=212) will match all birds, no matter which species. You can pass many keys by passing
scientificName
A scientific name from the GBIF backbone. All included and synonym taxa are included in the search.
country
The 2-letter country code (as per ISO-3166-1) of the country in which the occurrence was recorded. See http://en.wikipedia.org/wiki/ISO_3166-1_alpha-2
publishingCountry
The 2-letter country code (as per ISO-3166-1) of the country of the organization that published the occurrence record.
hasCoordinate
(logical) Return only occurrence records with lat/long data (TRUE) or all records (FALSE, default).
typeStatus
Type status of the specimen. One of many options. See ?typestatus
recordNumber
Number recorded by collector of the data, different from GBIF record number. See http://rs.tdwg.org/dwc/terms/#recordNumber for more info
lastInterpreted
Date the record was last modified in GBIF, in ISO 8601 format: yyyy, yyyy-MM, yyyy-MM-dd, or MM-dd. Supports range queries of the form 'smaller,larger' (e.g., '1990,1991' works, whereas '1991,1990' does not).
continent
Continent. One of africa, antarctica, asia, europe, north_america (North America includes the Caribbean and reaches down to include Panama), oceania, or south_america.
geometry
Searches for occurrences inside a polygon described in Well Known Text (WKT) format. The WKT shape is written as either POINT, LINESTRING, LINEARRING or POLYGON. Example of a polygon: ((30.1 10.1, 10 20, 20 40, 40 40, 30.1 10.1)) would be queried as http://
recordedBy
The person who recorded the occurrence.
basisOfRecord
Basis of record, as defined in our BasisOfRecord enum here http://gbif.github.io/gbif-api/apidocs/org/gbif/api/vocabulary/BasisOfRecord.html Acceptable values are:
  • FOSSIL_SPECIMEN An occurrence record describing a fossilized specimen.
datasetKey
The occurrence dataset key (a uuid)
eventDate
Occurrence date in ISO 8601 format: yyyy, yyyy-MM, yyyy-MM-dd, or MM-dd. Supports range queries of the form 'smaller,larger' (e.g., '1990,1991' works, whereas '1991,1990' does not).
catalogNumber
An identifier of any form assigned by the source within a physical collection or digital dataset for the record. It may not be unique on its own, but should be fairly unique in combination with the institution and collection code.
year
The 4 digit year. A year of 98 will be interpreted as AD 98. Supports range queries of the form 'smaller,larger' (e.g., '1990,1991' works, whereas '1991,1990' does not).
month
The month of the year, starting with 1 for January. Supports range queries of the form 'smaller,larger' (e.g., '1,2' works, whereas '2,1' does not).
decimalLatitude
Latitude in decimals between -90 and 90 based on WGS 84. Supports range queries of the form 'smaller,larger' (e.g., '25,30' works, whereas '30,25' does not).
decimalLongitude
Longitude in decimals between -180 and 180 based on WGS 84. Supports range queries (e.g., '-0.4,-0.2', whereas '-0.2,-0.4' wouldn't work).
elevation
Elevation in meters above sea level. Supports range queries of the form 'smaller,larger' (e.g., '5,30' works, whereas '30,5' does not).
depth
Depth in meters relative to elevation. For example, 10 meters below a lake surface with a given elevation. Supports range queries of the form 'smaller,larger' (e.g., '5,30' works, whereas '30,5' does not).
institutionCode
An identifier of any form assigned by the source to identify the institution the record belongs to. Not guaranteed to be unique.
collectionCode
An identifier of any form assigned by the source to identify the physical collection or digital dataset uniquely within the context of an institution.
hasGeospatialIssue
(logical) Includes/excludes occurrence records which contain spatial issues (as determined in our record interpretation), i.e. hasGeospatialIssue=TRUE returns only those records with spatial issues while hasGeospatialIssue=FALSE
issue
(character) One or more of many possible issues with each occurrence record. See Details. Issues passed to this parameter filter results by the issue.
search
Query terms. The value for this parameter can be a simple word or a phrase.
mediaType
Media type. Default is NULL, so no filtering on mediatype. Options: NULL, 'MovingImage', 'Sound', and 'StillImage'.
limit
Number of records to return. Default: 500. Note that the per request maximum is 300, but since we set it at 500 for the function, we do two requests to get you the 500 records (if there are that many). Note that there is a hard maximum of 200,000, which i
start
Record number to start at. Use in combination with limit to page through results. Note that we do the paging internally for you, but you can manually set the start parameter
fields
(character) Default ('all') returns all fields. 'minimal' returns just taxon name, key, latitude, and longitude. Or specify each field you want returned by name, e.g. fields = c('name','latitude','elevation').
return
One of data, hier, meta, or all. If data, a data.frame with the data. hier returns the classifications in a list for each record. meta returns the metadata for the entire call. all gives all data back in a list.
...
Further named parameters, such as query, path, etc, passed on to modify_url within GET call. Unnamed parameters will be comb
x
Output from a call to occ_search
n
Number of rows of the data to print.
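
A minimal sketch of how the internal paging described under limit and start could work, assuming the 300-record per-request maximum stated there. page_plan is a hypothetical helper written for illustration; it is not part of rgbif, and the package's actual implementation may differ.

```r
# Hypothetical helper (not an rgbif function): split a requested number of
# records into per-request pages, assuming a 300-record cap per request.
page_plan <- function(limit, start = 0, per_request = 300) {
  stopifnot(limit > 0, start >= 0, per_request > 0)
  n_requests <- ceiling(limit / per_request)
  starts <- seq(from = start, by = per_request, length.out = n_requests)
  # The last page only asks for the records that are still missing
  sizes <- pmin(per_request, start + limit - starts)
  data.frame(start = starts, limit = sizes)
}

page_plan(500)
```

With the default limit of 500 this gives two requests: 300 records starting at 0, then 200 records starting at 300.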

Value

  • A data.frame or list

Details

Note that you can pass in a vector to one of the taxonKey, datasetKey, and catalogNumber parameters in a function call, but you cannot pass a vector of length > 1 to more than one of these parameters at the same time.
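
The restriction above can be checked before a query is sent. The sketch below is only an illustration: check_one_vector and multi_value_args are hypothetical helpers, not rgbif functions, and the error message is not the one rgbif produces.

```r
# Hypothetical pre-flight check (not an rgbif function): allow at most one
# named argument that carries more than one value.
multi_value_args <- function(...) {
  args <- list(...)
  names(args)[lengths(args) > 1]
}

check_one_vector <- function(...) {
  multi <- multi_value_args(...)
  if (length(multi) > 1) {
    stop("Only one parameter may have more than one value; got: ",
         paste(multi, collapse = ", "))
  }
  invisible(TRUE)
}

# Passes: only taxonKey has more than one value
check_one_vector(taxonKey = c(2482598, 2492010), recordedBy = "smith")
```

Calling check_one_vector with two multi-value arguments, for example taxonKey = c(2482598, 2492010) together with recordedBy = c("smith", "BJ Stacey"), stops with an error.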

Hierarchies: hierarchies are returned with each occurrence object. There is no option not to return them from the API. However, within the occ_search function you can select whether to return just hierarchies, just data, all of data, hierarchies and metadata, or just metadata. If all hierarchies are the same we just return one for you.

Data: By default all data fields are returned (fields = 'all'). Set fields = 'minimal' to get just the taxon name, key, latitude, and longitude, or pass a character vector of field names to get only those fields.

Nerds: You can pass parameters not defined in this function into the call to the GBIF API, to control things about the call itself, using the ... argument. See an example below that passes in the verbose function to get details on the http call.

Scientific names vs. taxon keys: In the previous GBIF API and the version of rgbif that wrapped that API, you could search the equivalent of this function with a species name, which was convenient. However, names are messy, right? So it sorta makes sense to sort out the species key numbers you want exactly, and then get your occurrence data with this function. GBIF has added a parameter scientificName to allow searches by scientific names in this function - which includes synonym taxa. Note that if you do use the scientificName parameter, we will check internally whether it is a synonym of an accepted name, and if it is, we'll search on the accepted name. If you want to force searching by a synonym, do so by finding the GBIF identifier first with any of the name_* functions, then pass that ID to the taxonKey parameter.

WKT: Examples of valid WKT objects:

  • 'POLYGON((30.1 10.1, 10 20, 20 60, 60 60, 30.1 10.1))'
  • 'POINT(30.1 10.1)'
  • 'LINESTRING(3 4,10 50,20 25)'
  • 'LINEARRING': how to specify this shape is not documented here.
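
As an illustration of the POLYGON form, the sketch below builds a closed WKT polygon from the four numbers of a bounding box. bbox_to_wkt is a hypothetical helper, not an rgbif function; the (min longitude, min latitude, max longitude, max latitude) argument order is an assumption, and this is not necessarily how rgbif converts a bounding box internally.

```r
# Hypothetical helper (not an rgbif function): turn a bounding box into a
# closed WKT polygon. Assumes longitude is x and latitude is y.
bbox_to_wkt <- function(minx, miny, maxx, maxy) {
  stopifnot(minx < maxx, miny < maxy)
  # Walk the four corners and repeat the first one to close the ring
  xs <- c(minx, maxx, maxx, minx, minx)
  ys <- c(miny, miny, maxy, maxy, miny)
  paste0("POLYGON((", paste(xs, ys, collapse = ","), "))")
}

wkt <- bbox_to_wkt(-125.0, 38.4, -121.8, 40.9)
wkt
# occ_search(geometry = wkt, limit = 20)  # needs rgbif and network access
```

The first and last coordinate pairs are identical, which is what makes the polygon closed.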

Range queries: A range query is as it sounds - you query on a range of values defined by a lower and upper limit. Do a range query by specifying the lower and upper limit in a single character string like depth='50,100'. It would be more R-like to specify the range in a vector like c(50,100), but that syntax is already used to run many searches, one for each element in the vector, so range queries have to be written differently. The following parameters support range queries.

  • decimalLatitude
  • decimalLongitude
  • depth
  • elevation
  • eventDate
  • lastInterpreted
  • month
  • year
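
Because the lower limit must come first, it can help to build the range string with a small function that checks the order. range_query below is a hypothetical helper, not part of rgbif.

```r
# Hypothetical helper (not an rgbif function): build a 'lower,upper' range
# string and refuse limits given in the wrong order.
range_query <- function(lower, upper) {
  stopifnot(length(lower) == 1, length(upper) == 1, lower <= upper)
  paste(lower, upper, sep = ",")
}

range_query(50, 100)
# occ_search(depth = range_query(50, 100), limit = 20)  # needs rgbif and network access
```

The result is one character string, so it triggers a single range search rather than one search per value.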

Issue: The options for the issue parameter (from http://gbif.github.io/gbif-api/apidocs/org/gbif/api/vocabulary/OccurrenceIssue.html):

  • BASIS_OF_RECORD_INVALID The given basis of record is impossible to interpret or seriously different from the recommended vocabulary.
  • CONTINENT_COUNTRY_MISMATCH The interpreted continent and country do not match up.
  • CONTINENT_DERIVED_FROM_COORDINATES The interpreted continent is based on the coordinates, not the verbatim string information.
  • CONTINENT_INVALID Uninterpretable continent values found.
  • COORDINATE_INVALID Coordinate value given in some form but GBIF is unable to interpret it.
  • COORDINATE_OUT_OF_RANGE Coordinate has invalid lat/lon values out of their decimal max range.
  • COORDINATE_REPROJECTED The original coordinate was successfully reprojected from a different geodetic datum to WGS84.
  • COORDINATE_REPROJECTION_FAILED The given decimal latitude and longitude could not be reprojected to WGS84 based on the provided datum.
  • COORDINATE_REPROJECTION_SUSPICIOUS Indicates successful coordinate reprojection according to provided datum, but which results in a datum shift larger than 0.1 decimal degrees.
  • COORDINATE_ROUNDED Original coordinate modified by rounding to 5 decimals.
  • COUNTRY_COORDINATE_MISMATCH The interpreted occurrence coordinates fall outside of the indicated country.
  • COUNTRY_DERIVED_FROM_COORDINATES The interpreted country is based on the coordinates, not the verbatim string information.
  • COUNTRY_INVALID Uninterpretable country values found.
  • COUNTRY_MISMATCH Interpreted country for dwc:country and dwc:countryCode contradict each other.
  • DEPTH_MIN_MAX_SWAPPED Set if supplied min>max
  • DEPTH_NON_NUMERIC Set if depth is a non numeric value
  • DEPTH_NOT_METRIC Set if supplied depth is not given in the metric system, for example using feet instead of meters
  • DEPTH_UNLIKELY Set if depth is larger than 11,000 m or negative.
  • ELEVATION_MIN_MAX_SWAPPED Set if supplied min > max elevation
  • ELEVATION_NON_NUMERIC Set if elevation is a non numeric value
  • ELEVATION_NOT_METRIC Set if supplied elevation is not given in the metric system, for example using feet instead of meters
  • ELEVATION_UNLIKELY Set if elevation is above the troposphere (17 km) or more than 11 km below sea level (Mariana Trench).
  • GEODETIC_DATUM_ASSUMED_WGS84 Indicates that the interpreted coordinates are assumed to be based on the WGS84 datum, as the datum was either not indicated or not interpretable.
  • GEODETIC_DATUM_INVALID The geodetic datum given could not be interpreted.
  • IDENTIFIED_DATE_INVALID The date given for dwc:dateIdentified is invalid and cannot be interpreted at all.
  • IDENTIFIED_DATE_UNLIKELY The date given for dwc:dateIdentified is in the future or before Linnean times (1700).
  • MODIFIED_DATE_INVALID A (partial) invalid date is given for dc:modified, such as a non existing date, invalid zero month, etc.
  • MODIFIED_DATE_UNLIKELY The date given for dc:modified is in the future or predates unix time (1970).
  • MULTIMEDIA_DATE_INVALID An invalid date is given for dc:created of a multimedia object.
  • MULTIMEDIA_URI_INVALID An invalid uri is given for a multimedia object.
  • PRESUMED_NEGATED_LATITUDE Latitude appears to be negated, e.g. 32.3 instead of -32.3
  • PRESUMED_NEGATED_LONGITUDE Longitude appears to be negated, e.g. 32.3 instead of -32.3
  • PRESUMED_SWAPPED_COORDINATE Latitude and longitude appear to be swapped.
  • RECORDED_DATE_INVALID A (partial) invalid date is given, such as a non existing date, invalid zero month, etc.
  • RECORDED_DATE_MISMATCH The recording date specified as the eventDate string and the individual year, month, day are contradicting.
  • RECORDED_DATE_UNLIKELY The recording date is highly unlikely, falling either into the future or represents a very old date before 1600 that predates modern taxonomy.
  • REFERENCES_URI_INVALID An invalid uri is given for dc:references.
  • TAXON_MATCH_FUZZY Matching to the taxonomic backbone can only be done using a fuzzy, non exact match.
  • TAXON_MATCH_HIGHERRANK Matching to the taxonomic backbone can only be done on a higher rank and not the scientific name.
  • TAXON_MATCH_NONE Matching to the taxonomic backbone cannot be done because there was no match at all, or several matches with too little information to keep them apart (homonyms).
  • TYPE_STATUS_INVALID The given type status is impossible to interpret or seriously different from the recommended vocabulary.
  • ZERO_COORDINATE Coordinate is the exact 0/0 coordinate, often indicating a bad null coordinate.

Counts: Records are counted slightly differently here than in occ_count. To match the result of occ_count with isGeoreferenced=TRUE, use hasCoordinate=TRUE and hasGeospatialIssue=FALSE in this function.

References

http://www.gbif.org/developer/occurrence#search

See Also

downloads, occ_data

Examples

# Search by species name, using name_suggest() first to get the key
(key <- name_suggest(q='Helianthus annuus', rank='species')$key[1])
occ_search(taxonKey=key, limit=2)

# Return 20 results (the default limit is 500)
occ_search(taxonKey=key, limit=20)

# Return just metadata for the search
occ_search(taxonKey=key, limit=100, return='meta')

# Instead of getting a taxon key first, you can search for a name directly
## However, note that using this approach (with scientificName="...")
## you are getting synonyms too. The results for using scientificName and
## taxonKey parameters are the same in this case, but I wouldn't be surprised if for some
## names they return different results
occ_search(scientificName = 'Ursus americanus', config=httr::verbose())
key <- name_backbone(name = 'Ursus americanus', rank='species')$usageKey
occ_search(taxonKey = key)

# Search by dataset key
occ_search(datasetKey='7b5d6a48-f762-11e1-a439-00145eb45e9a', return='data', limit=20)

# Search by catalog number
occ_search(catalogNumber="49366", limit=20)
occ_search(catalogNumber=c("49366","Bird.27847588"), limit=20)

# Get all data, not just lat/long and name
occ_search(taxonKey=key, fields='all', limit=20)

# Or get specific fields. Note that this isn't done on GBIF's side of things. This
# is done in R, but before you get the return object, so other fields are garbage
# collected
occ_search(taxonKey=key, fields=c('name','basisOfRecord','protocol'), limit=20)

# Use paging parameters (limit and start) to page. Note the different results
# for the two queries below.
occ_search(datasetKey='7b5d6a48-f762-11e1-a439-00145eb45e9a',start=10,limit=5,
   return="data")
occ_search(datasetKey='7b5d6a48-f762-11e1-a439-00145eb45e9a',start=20,limit=5,
   return="data")

# Many dataset keys
occ_search(datasetKey=c("50c9509d-22c7-4a22-a47d-8c48425ef4a7",
   "7b5d6a48-f762-11e1-a439-00145eb45e9a"), limit=20)

# Occurrence data: lat/long data, and associated metadata with occurrences
## If return='data' the output is a data.frame of all data together
## for easy manipulation
occ_search(taxonKey=key, return='data', limit=20)

# Taxonomic hierarchy data
## If return='hier' the output is a list of the hierarchy for each record
occ_search(taxonKey=key, return='hier', limit=10)

# Search by recorder
occ_search(recordedBy="smith", limit=20)

# Many collector names
occ_search(recordedBy=c("smith","BJ Stacey"), limit=20)

# Pass in curl options for extra fun
library('httr')
occ_search(taxonKey=key, limit=20, return='hier', config=verbose())
# occ_search(taxonKey=key, limit=20, return='hier', config=progress())
# occ_search(taxonKey=key, limit=20, return='hier', config=timeout(1))

# Search for many species
splist <- c('Cyanocitta stelleri', 'Junco hyemalis', 'Aix sponsa')
keys <- sapply(splist, function(x) name_suggest(x)$key[1], USE.NAMES=FALSE)
occ_search(taxonKey=keys, limit=5, return='data')

# Search using a synonym name
#  Note that you'll see a message printing out that the accepted name will be used
occ_search(scientificName = 'Pulsatilla patens', fields = c('name','scientificName'), limit=5)

# Search on latitude and longitude
occ_search(search="kingfisher", decimalLatitude=50, decimalLongitude=-10)

# Search on a bounding box
## in well known text format
occ_search(geometry='POLYGON((30.1 10.1, 10 20, 20 40, 40 40, 30.1 10.1))', limit=20)
key <- name_suggest(q='Aesculus hippocastanum')$key[1]
occ_search(taxonKey=key, geometry='POLYGON((30.1 10.1, 10 20, 20 40, 40 40, 30.1 10.1))',
   limit=20)
## or using bounding box, converted to WKT internally
occ_search(geometry=c(-125.0,38.4,-121.8,40.9), limit=20)

# Search on country
occ_search(country='US', fields=c('name','country'), limit=20)
isocodes[grep("France", isocodes$name),"code"]
occ_search(country='FR', fields=c('name','country'), limit=20)
occ_search(country='DE', fields=c('name','country'), limit=20)
occ_search(country=c('US','DE'), fields=c('name','country'), limit=20)

# Get only occurrences with lat/long data
occ_search(taxonKey=key, hasCoordinate=TRUE, limit=20)

# Get only occurrences that were recorded as living specimens
occ_search(taxonKey=key, basisOfRecord="LIVING_SPECIMEN", hasCoordinate=TRUE, limit=20)

# Get occurrences for a particular eventDate
occ_search(taxonKey=key, eventDate="2013", limit=20)
occ_search(taxonKey=key, year="2013", limit=20)
occ_search(taxonKey=key, month="6", limit=20)

# Get occurrences based on depth
key <- name_backbone(name='Salmo salar', kingdom='animals')$speciesKey
occ_search(taxonKey=key, depth="5", limit=20)

# Get occurrences based on elevation
key <- name_backbone(name='Puma concolor', kingdom='animals')$speciesKey
occ_search(taxonKey=key, elevation=50, hasCoordinate=TRUE, limit=20)

# Get occurrences based on institutionCode
occ_search(institutionCode="TLMF", limit=20)
occ_search(institutionCode=c("TLMF","ArtDatabanken"), limit=20)

# Get occurrences based on collectionCode
occ_search(collectionCode="Floristic Databases MV - Higher Plants", limit=20)
occ_search(collectionCode=c("Floristic Databases MV - Higher Plants","Artport"))

# Get only those occurrences with spatial issues
occ_search(taxonKey=key, hasGeospatialIssue=TRUE, limit=20)

# Search using a query string
occ_search(search="kingfisher", limit=20)

# Range queries
## See Details for parameters that support range queries
occ_search(depth='50,100') # this is a range depth, with lower/upper limits in character string
occ_search(depth=c(50,100)) # this is not a range search, but does two searches for each depth

## Range search with year
occ_search(year='1999,2000', limit=20)

## Range search with latitude
occ_search(decimalLatitude='29.59,29.6')

# Search by specimen type status
## Look for possible values of the typeStatus parameter looking at the typestatus dataset
occ_search(typeStatus = 'allotype', fields = c('name','typeStatus'))

# Search by specimen record number
## This is the record number of the person/group that submitted the data, not GBIF's numbers
## You can see that many different groups have record number 1, so not super helpful
occ_search(recordNumber = 1, fields = c('name','recordNumber','recordedBy'))

# Search by last time interpreted: Date the record was last modified in GBIF
## The lastInterpreted parameter accepts ISO 8601 format dates, including
## yyyy, yyyy-MM, yyyy-MM-dd, or MM-dd. Range queries are accepted for lastInterpreted
occ_search(lastInterpreted = '2014-04-02', fields = c('name','lastInterpreted'))

# Search by continent
## One of africa, antarctica, asia, europe, north_america, oceania, or south_america
occ_search(continent = 'south_america', return = 'meta')
occ_search(continent = 'africa', return = 'meta')
occ_search(continent = 'oceania', return = 'meta')
occ_search(continent = 'antarctica', return = 'meta')

# Search for occurrences with images
occ_search(mediaType = 'StillImage', return='media')
occ_search(mediaType = 'MovingImage', return='media')
occ_search(mediaType = 'Sound', return='media')

# Query based on issues - see Details for options
## one issue
occ_search(taxonKey=1, issue='DEPTH_UNLIKELY', fields =
   c('name','key','decimalLatitude','decimalLongitude','depth'))
## two issues
occ_search(taxonKey=1, issue=c('DEPTH_UNLIKELY','COORDINATE_ROUNDED'))
# Show all records in the Arizona State Lichen Collection that can't be matched to the GBIF
# backbone properly:
occ_search(datasetKey='84c0e1a0-f762-11e1-a439-00145eb45e9a',
   issue=c('TAXON_MATCH_NONE','TAXON_MATCH_HIGHERRANK'))

# Parsing output by issue
(res <- occ_search(geometry='POLYGON((30.1 10.1, 10 20, 20 40, 40 40, 30.1 10.1))', limit = 50))
## what do issues mean, can print whole table, or search for matches
head(gbif_issues())
gbif_issues()[ gbif_issues()$code %in% c('cdround','cudc','gass84','txmathi'), ]
## or parse issues in various ways
### remove data rows with certain issue classes
library('magrittr')
res %>% occ_issues(gass84)
### split issues into separate columns
res %>% occ_issues(mutate = "split")
### expand issues to more descriptive names
res %>% occ_issues(mutate = "expand")
### split and expand
res %>% occ_issues(mutate = "split_expand")
### split, expand, and remove an issue class
res %>% occ_issues(-cudc, mutate = "split_expand")

# If you pass multiple values to two different parameters at once you get an error
# occ_search(taxonKey=c(2482598,2492010), recordedBy=c("smith","BJ Stacey"))

# Get a lot of data, here 1500 records for Helianthus annuus
# out <- occ_search(taxonKey=key, limit=1500, return="data")
# nrow(out)

# If you pass in an invalid polygon you get hopefully informative errors

### the WKT string is fine, but GBIF says bad polygon
wkt <- 'POLYGON((-178.59375 64.83258989321493,-165.9375 59.24622380205539,
-147.3046875 59.065977905449806,-130.78125 51.04484764446178,-125.859375 36.70806354647625,
-112.1484375 23.367471303759686,-105.1171875 16.093320185359257,-86.8359375 9.23767076398516,
-82.96875 2.9485268155066175,-82.6171875 -14.812060061226388,-74.8828125 -18.849111862023985,
-77.34375 -47.661687803329166,-84.375 -49.975955187343295,174.7265625 -50.649460483096114,
179.296875 -42.19189902447192,-176.8359375 -35.634976650677295,176.8359375 -31.835565983656227,
163.4765625 -6.528187613695323,152.578125 1.894796132058301,135.703125 4.702353722559447,
127.96875 15.077427674847987,127.96875 23.689804541429606,139.921875 32.06861069132688,
149.4140625 42.65416193033991,159.2578125 48.3160811030533,168.3984375 57.019804336633165,
178.2421875 59.95776046458139,-179.6484375 61.16708631440347,-178.59375 64.83258989321493))'

# occ_search(geometry = gsub("\n", '', wkt))

### unable to parse due to last number pair needing two numbers, not one
# wkt <- 'POLYGON((-178.5 64.8,-165.9 59.2,-147.3 59.0,-130.7 51.0,-125.8))'
# occ_search(geometry = wkt)

### unable to parse due to unclosed string
# wkt <- 'POLYGON((-178.5 64.8,-165.9 59.2,-147.3 59.0,-130.7 51.0))'
# occ_search(geometry = wkt)
### another of the same
# wkt <- 'POLYGON((-178.5 64.8,-165.9 59.2,-147.3 59.0,-130.7 51.0,-125.8 36.7))'
# occ_search(geometry = wkt)

### returns no results
# wkt <- 'LINESTRING(3 4,10 50,20 25)'
# occ_search(geometry = wkt)

### Apparently a point is allowed, but haven't successfully retrieved data, so returns nothing
# wkt <- 'POINT(45 -122)'
# occ_search(geometry = wkt)
