spocc (version 0.5.0)

occ: Search for species occurrence data across many data sources.

Description

Search for a single species name or many, across a single data source or many.

Usage

occ(query = NULL, from = "gbif", limit = 500, start = NULL,
  page = NULL, geometry = NULL, has_coords = NULL, ids = NULL,
  callopts = list(), gbifopts = list(), bisonopts = list(),
  inatopts = list(), ebirdopts = list(), ecoengineopts = list(),
  antwebopts = list(), vertnetopts = list(), idigbioopts = list())

Arguments

query
(character) One to many scientific names. See Details for what parameter in each data source we query.
from
(character) Data source to get data from, any combination of gbif, bison, inat, ebird, ecoengine, antweb, vertnet and/or idigbio
limit
(numeric) Number of records to return. This is passed across all sources. To specify different limits for each source, use the options for each source (gbifopts, bisonopts, inatopts, ebirdopts, ecoengineopts, and antwebopts). See Details for more. Default
start, page
(integer) Record to start at or page to start at. See Paging in Details for how these parameters are used internally. Optional
geometry
(character or numeric) One of a Well Known Text (WKT) object or a vector of length 4 specifying a bounding box. This parameter searches for occurrences inside a box given as a bounding box or polygon described in WKT format. A WKT shape written as 'POLYGON
has_coords
(logical) Only return occurrences that have lat/long data. This works for gbif, ecoengine, antweb, inat, idigbio, and vertnet, but is ignored for ebird and bison data sources. For those two sources, you can easily remove records without lat/long data afterwards.
ids
Taxonomic identifiers. This can be a list of length 1 to many. See examples for usage. Currently, identifiers are supported only when the 'from' parameter is 'gbif' or 'bison'. If this parameter is used, the query parameter cannot be used - if it is, a warning is t
callopts
Options passed on to GET, e.g., for debugging curl calls, setting timeouts, etc. This parameter is ignored for sources: antweb, inat.
gbifopts
(list) List of named options to pass on to occ_search. See also occ_options.
bisonopts
(list) List of named options to pass on to bison. See also occ_options.
inatopts
(list) List of named options to pass on to internal function get_inat_obs.
ebirdopts
(list) List of named options to pass on to ebirdregion or ebirdgeo. See also occ_options.
ecoengineopts
(list) List of named options to pass on to ee_observations. See also occ_options.
antwebopts
(list) List of named options to pass on to aw_data. See also occ_options.
vertnetopts
(list) List of named options to pass on to searchbyterm. See also occ_options.
idigbioopts
(list) List of named options to pass on to idig_search_records. See also occ_options.

Inputs

All inputs to occ are one of:
  • scientific name
  • taxonomic id
  • geometry as bounds, WKT, or Spatial classes
To search by common name, first use occ_names to find scientific names or taxonomic IDs, then feed those to this function. Or use the taxize package to get names and/or IDs to use here.

Using the query parameter

When you use the query parameter, we pass your search terms on to parameters within functions that query data sources you specify. Those parameters are:
  • rgbif - scientificName in the occ_search function - API parameter: same as the occ parameter
  • rebird - species in the ebirdregion or ebirdgeo functions, depending on whether you set method="ebirdregion" or method="ebirdgeo" - API parameters: sci for both ebirdregion and ebirdgeo
  • ecoengine - scientific_name in the ee_observations function - API parameter: same as occ parameter
  • rbison - species or scientificName in the bison or bison_solr functions, respectively. If you don't pass anything to the geometry parameter we use bison_solr, and if you do we use bison - API parameters: same as occ parameters
  • AntWeb - scientific_name or genus in the aw_data function, depending on whether a binomial or single name is passed - API parameter: species for scientific_name and genus for genus
  • rvertnet - taxon in the vertsearch function - API parameter: q
  • ridigbio - scientificname in the idig_search_records function - API parameter: scientificname
  • inat - internal function - API parameter: q
If you have questions about how each of those parameters behaves with respect to the terms you pass to it, look up the documentation for those functions, or get in touch at the development repository https://github.com/ropensci/spocc/issues

iDigBio notes

When searching iDigBio, note that by default we set fields = "all", so that we return a richer suite of fields than the ridigbio R client gives by default. You can change this by passing a fields parameter in idigbioopts with the specific fields you want.

Ecoengine notes

When searching ecoengine, you can leave the page argument blank to get a single page. Otherwise use page ranges or simply "all" to request all available pages. Note however that this may hang your call if the request is simply too large.

limit parameter

The limit parameter is set to a default of 500 (see Usage). This means that you will get up to 500 results back for each data source you ask for data from. If there are no results for a particular source, you'll get zero back; if there are 8 results for a particular source, you'll get 8 back. If there are 501 results for a particular source, you'll get 500 back. You can always ask for more or fewer by setting the limit parameter to any number. If you want to request a different number for each source, pass the appropriate parameter to each data source via the respective options parameter for each data source.

WKT

WKT objects are strings of coordinate pairs that define a shape. Many classes of shapes are supported, including POLYGON, POINT, and MULTIPOLYGON. Within each defined shape, define all vertices of the shape with a coordinate like 30.1 10.1, the first of which is the longitude (x), the second the latitude (y). This matches the examples below, where WKT coordinates such as -111.06 38.84 and bounding boxes both put longitude first.

Examples of valid WKT objects:

  • 'POLYGON((30.1 10.1, 10 20, 20 60, 60 60, 30.1 10.1))'
  • 'POINT(30.1 10.1)'
  • 'LINESTRING(3 4,10 50,20 25)'
  • 'MULTIPOINT((3.5 5.6),(4.8 10.5))'
  • 'MULTILINESTRING((3 4,10 50,20 25),(-5 -8,-10 -8,-15 -4))'
  • 'MULTIPOLYGON(((1 1,5 1,5 5,1 5,1 1),(2 2,2 3,3 3,3 2,2 2)),((6 3,9 2,9 4,6 3)))'
  • 'GEOMETRYCOLLECTION(POINT(4 6),LINESTRING(4 6,7 10))'

Only POLYGON objects are currently supported.

Getting WKT polygons or bounding boxes: we will soon introduce a function to help you select a bounding box, but for now you can use a few sites on the web.

  • Bounding box - http://boundingbox.klokantech.com/
  • Well known text - http://arthur-e.github.io/Wicket/sandbox-gmaps3.html
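
To make the relationship between a bounding box and a WKT polygon concrete, here is a minimal base R sketch. The helper name bbox_to_wkt is hypothetical and is not part of spocc (the package does its own conversion via bbox2wkt, as noted in the Examples). It assumes the bounding box order [min-longitude, min-latitude, max-longitude, max-latitude] used throughout this page.

```r
# Hypothetical helper, for illustration only (not the spocc implementation).
# Turns c(min_lon, min_lat, max_lon, max_lat) into a closed WKT POLYGON ring.
bbox_to_wkt <- function(bbox) {
  stopifnot(is.numeric(bbox), length(bbox) == 4)
  min_lon <- bbox[1]; min_lat <- bbox[2]
  max_lon <- bbox[3]; max_lat <- bbox[4]
  # Walk the four corners and repeat the first one to close the ring
  corners <- rbind(
    c(min_lon, min_lat),
    c(max_lon, min_lat),
    c(max_lon, max_lat),
    c(min_lon, max_lat),
    c(min_lon, min_lat)
  )
  # Each vertex is written as "longitude latitude"
  pairs <- paste(corners[, 1], corners[, 2])
  paste0("POLYGON((", paste(pairs, collapse = ","), "))")
}

wkt <- bbox_to_wkt(c(-125.0, 38.4, -121.8, 40.9))
print(wkt)
```

The resulting string could then be supplied as the geometry argument for sources that accept WKT.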

geometry parameter

The behavior of the occ function with respect to the geometry parameter varies depending on the inputs to the query parameter. Here are the options:
  • geometry (single), no query - If a single bounding box/WKT string is passed in, and no query, a single query is made against each data source.
  • geometry (many), no query - If many bounding boxes/WKT strings are passed in, we do a separate query for each bounding box/WKT string against each data source.
  • geometry (single), query - If a single bounding box/WKT string is passed in, and a single query, we do a single query against each data source.
  • geometry (many), query - If many bounding boxes/WKT strings are passed in, and a single query, we do a separate query for each bounding box/WKT string with the same queried name against each data source.
  • geometry (single), many query - If a single bounding box/WKT string is passed in, and many names to query, we do a separate query for each name, using the same geometry, for each data source.
  • geometry (many), many query - If many bounding boxes/WKT strings are passed in, and many names to query, this poses a problem for all data sources, none of which accept many bounding boxes or WKT strings. So, in this scenario, we loop over each name and each geometry query, and then re-combine by queried name, so that you get back a single group of data for each name.
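
The "many geometry, many query" case can be pictured with the following base R sketch. It is only an illustration of the loop-and-recombine idea described above, not the code occ actually runs; fetch_one is a hypothetical stand-in for a single request to one data source and simply returns a one-row data.frame.

```r
# Hypothetical stand-in for one request (one name, one geometry) to a data source.
fetch_one <- function(name, bbox) {
  data.frame(name = name,
             bbox = paste(bbox, collapse = ","),
             stringsAsFactors = FALSE)
}

names_to_query <- c("Danaus plexippus", "Accipiter striatus")
geometries <- list(c(-125.0, 38.4, -121.8, 40.9),
                   c(-115.0, 22.4, -111.8, 30.9))

# Loop over each name, and within each name over each geometry,
# then bind the per-geometry pieces back together by queried name.
by_name <- lapply(names_to_query, function(nm) {
  pieces <- lapply(geometries, function(g) fetch_one(nm, g))
  do.call(rbind, pieces)
})
names(by_name) <- names_to_query

by_name[["Danaus plexippus"]]
```

The result holds one group of rows per queried name, regardless of how many geometries were searched.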

Geometry options by data provider

wkt & bbox allowed, see WKT section above
  • gbif
  • bison

bbox only

  • ecoengine
  • inat
  • idigbio

No spatial search allowed

  • antweb
  • ebird
  • vertnet

Paging

  • gbif - Responds to start. Default: 0
  • ecoengine - Responds to page. Default: 1
  • antweb - Responds to start. Default: 0
  • bison - Responds to start. Default: 0
  • inat - Responds to page. Default: 1
  • ebird - No paging, both start and page ignored.
  • vertnet - No paging implemented here, both start and page ignored. VertNet does have a form of paging, but it uses a cursor, and can't easily be included here via parameters. However, rvertnet does paging internally for you. For example, the max records per request for VertNet is 1000; if you request 2000 records, we'll do the first request, and do the second request for you automatically.
  • idigbio - Responds to start. Default: 0
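
For sources that respond to start, successive batches can be requested by stepping start forward by limit each time. The sketch below only computes those offsets; batch_starts is a hypothetical helper that is not part of spocc, and it assumes a zero-based start, as in the defaults listed above.

```r
# Hypothetical helper: zero-based start offsets for consecutive
# batches of `limit` records each.
batch_starts <- function(n_batches, limit) {
  stopifnot(n_batches >= 1, limit >= 1)
  seq(from = 0, by = limit, length.out = n_batches)
}

starts <- batch_starts(n_batches = 3, limit = 5)
print(starts)

# Each offset would then be passed as `start`, e.g. (needs network access):
# occ(query = 'Accipiter striatus', from = 'gbif', limit = 5, start = starts[2])
```

For sources that respond to page instead (ecoengine, inat), you would step page by 1 from its default of 1 rather than computing record offsets.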

BEWARE

In cases where you request data from multiple providers, especially when including GBIF, there could be duplicate records since many providers' data eventually ends up with GBIF. See spocc_duplicates for more.

Details

The occ function is an opinionated wrapper around the rgbif, rbison, rinat, rebird, AntWeb, ecoengine, rvertnet and ridigbio packages to allow data access from a single access point. We take care of making sure you get useful objects out at the cost of flexibility/options - although you can still set options for each of the packages via the gbifopts, bisonopts, inatopts, ebirdopts, ecoengineopts, vertnetopts, antwebopts and idigbioopts parameters.

Examples

# Single data sources
(res <- occ(query = 'Accipiter striatus', from = 'gbif', limit = 5))
res$gbif
(res <- occ(query = 'Accipiter', from = 'ecoengine', limit = 50))
res$ecoengine
(res <- occ(query = 'Accipiter striatus', from = 'ebird', limit = 50))
res$ebird
(res <- occ(query = 'Danaus plexippus', from = 'inat', limit = 50))
res$inat
(res <- occ(query = 'Bison bison', from = 'bison', limit = 50))
res$bison
(res <- occ(query = 'Bison bison', from = 'vertnet', limit = 5))
res$vertnet
res$vertnet$data$Bison_bison
occ2df(res)

# Paging
one <- occ(query = 'Accipiter striatus', from = 'gbif', limit = 5)
two <- occ(query = 'Accipiter striatus', from = 'gbif', limit = 5, start = 5)
one$gbif
two$gbif

# Restrict to records with coordinates
occ(query = "Acer", from = "idigbio", limit = 5, has_coords = TRUE)

# Data from AntWeb
# By species
(by_species <- occ(query = "linepithema humile", from = "antweb", limit = 10))
# or by genus
(by_genus <- occ(query = "acanthognathus", from = "antweb"))

occ(query = 'Setophaga caerulescens', from = 'ebird', ebirdopts = list(region='US'))
occ(query = 'Spinus tristis', from = 'ebird', ebirdopts =
   list(method = 'ebirdgeo', lat = 42, lng = -76, dist = 50))

# idigbio data
## scientific name search
occ(query = "Acer", from = "idigbio", limit = 5)
occ(query = "Acer", from = "idigbio", idigbioopts = list(offset = 5, limit  = 3))
## geo search
bounds <- c(-120, 40, -100, 45)
occ(from = "idigbio", geometry = bounds, limit = 10)
## just class arachnida, spiders
occ(idigbioopts = list(rq = list(class = 'arachnida')), from = "idigbio", limit = 10)
## search certain recordsets
sets <- c("1ffce054-8e3e-4209-9ff4-c26fa6c24c2f",
    "8dc14464-57b3-423e-8cb0-950ab8f36b6f", 
    "26f7cbde-fbcb-4500-80a9-a99daa0ead9d")
occ(idigbioopts = list(rq = list(recordset = sets)), from = "idigbio", limit = 10)

# You can pass on limit param to all sources even though it's a different param in that source
## ecoengine example
res <- occ(query = 'Accipiter striatus', from = 'ecoengine', ecoengineopts=list(limit = 5))
res$ecoengine
## This is particularly useful when you want to set different limit for each source
(res <- occ(query = 'Accipiter striatus', from = c('gbif','ecoengine'),
   gbifopts=list(limit = 10), ecoengineopts=list(limit = 5)))

# Many data sources
(out <- occ(query = 'Pinus contorta', from=c('gbif','bison','vertnet'), limit=10))

## Select individual elements
out$gbif
out$gbif$data
out$vertnet

## Coerce to combined data.frame, selects minimal set of
## columns (name, lat, long, provider, date, occurrence key)
occ2df(out)

# Pass in limit parameter to all sources. This limits the number of occurrences
# returned to 10, in this example, for all sources, in this case gbif and inat.
occ(query='Pinus contorta', from=c('gbif','inat'), limit=10)

# Geometry
## Pass in geometry parameter to all sources. This constrains the search to the
## specified polygon for all sources, gbif and bison in this example.
## Check out http://arthur-e.github.io/Wicket/sandbox-gmaps3.html to get a WKT string
occ(query='Accipiter', from='gbif',
   geometry='POLYGON((30.1 10.1, 10 20, 20 60, 60 60, 30.1 10.1))')
occ(query='Helianthus annuus', from='bison', limit=50,
   geometry='POLYGON((-111.06 38.84, -110.80 39.37, -110.20 39.17, -110.20 38.90,
                      -110.63 38.67, -111.06 38.84))')

## Or pass in a bounding box, which is automatically converted to WKT (required by GBIF)
## via the bbox2wkt function. The format of a bounding box is
## [min-longitude, min-latitude, max-longitude, max-latitude].
occ(query='Accipiter striatus', from='gbif', geometry=c(-125.0,38.4,-121.8,40.9))

## Bounding box constraint with ecoengine
## Use this website: http://boundingbox.klokantech.com/ to quickly grab a bbox.
## Just set the format on the bottom left to CSV.
occ(query='Accipiter striatus', from='ecoengine', limit=10,
   geometry=c(-125.0,38.4,-121.8,40.9))

## lots of results, can see how many by indexing to meta
res <- occ(query='Accipiter striatus', from='gbif',
   geometry='POLYGON((-69.9 49.2,-69.9 29.0,-123.3 29.0,-123.3 49.2,-69.9 49.2))')
res$gbif

## You can pass in geometry to each source separately via their opts parameter, at
## least those that support it. Note that if you use rinat directly, the order is
## latitude first and longitude second, but here it is longitude first, for
## consistency across the spocc package
bounds <- c(-125.0,38.4,-121.8,40.9)
occ(query = 'Danaus plexippus', from="inat", geometry=bounds)

## Passing geometry with multiple sources
occ(query = 'Danaus plexippus', from=c("inat","gbif","ecoengine"), geometry=bounds)

## Using geometry only for the query
### A single bounding box
occ(geometry = bounds, from = "gbif", limit=50)
### Many bounding boxes
occ(geometry = list(c(-125.0,38.4,-121.8,40.9), c(-115.0,22.4,-111.8,30.9)), from = "gbif")

## Many geometry and many names
res <- occ(query = c('Danaus plexippus', 'Accipiter striatus'),
   geometry = list(c(-125.0,38.4,-121.8,40.9), c(-115.0,22.4,-111.8,30.9)), from = "bison")
res

## Geometry only with WKT
wkt <- 'POLYGON((-98.9 44.2,-89.1 36.6,-116.7 37.5,-102.5 39.6,-98.9 44.2))'
occ(from = "gbif", geometry = wkt, limit = 10)

# Specify many data sources, another example
ebirdopts = list(region = 'US'); gbifopts  =  list(country = 'US')
out <- occ(query = 'Setophaga caerulescens', from = c('gbif','inat','bison','ebird'),
    gbifopts = gbifopts, ebirdopts = ebirdopts, limit=20)
occ2df(out)

# Pass in many species names, combine just the data into a single data.frame, and
# show the first six rows
spnames <- c('Accipiter striatus', 'Setophaga caerulescens', 'Spinus tristis')
(out <- occ(query = spnames, from = 'gbif', gbifopts = list(hasCoordinate = TRUE), limit=25))
df <- occ2df(out)
head(df)

# no query, geometry, or ids passed
## many dataset keys to gbif
dsets <- c("14f3151a-e95d-493c-a40d-d9938ef62954", "f934f8e2-32ca-46a7-b2f8-b032a4740454")
occ(limit = 20, from = "gbif", gbifopts = list(datasetKey = dsets))
## class name to idigbio
occ(limit = 20, from = "idigbio", idigbioopts = list(rq = list(class = 'arachnida')))
## limit to ecoengine
occ(from = "ecoengine", ecoengineopts = list(limit = 3))

# taxize integration
## You can pass in taxonomic identifiers
library("taxize")
(ids <- get_ids(names=c("Chironomus riparius","Pinus contorta"), db = c('itis','gbif')))
occ(ids = ids[[1]], from='bison', limit=20)
occ(ids = ids, from=c('bison','gbif'), limit=20)

(ids <- get_ids(names="Chironomus riparius", db = 'gbif'))
occ(ids = ids, from='gbif', limit=20)

(ids <- get_gbifid("Chironomus riparius"))
occ(ids = ids, from='gbif', limit=20)

(ids <- get_tsn('Accipiter striatus'))
occ(ids = ids, from='bison', limit=20)

# SpatialPolygons/SpatialPolygonsDataFrame integration
library("sp")
## Single polygon in SpatialPolygons class
one <- Polygon(cbind(c(91,90,90,91), c(30,30,32,30)))
spone = Polygons(list(one), "s1")
sppoly = SpatialPolygons(list(spone), as.integer(1))
out <- occ(geometry = sppoly, limit=50)
out$gbif$data

## Two polygons in SpatialPolygons class
one <- Polygon(cbind(c(-121.0,-117.9,-121.0,-121.0), c(39.4, 37.1, 35.1, 39.4)))
two <- Polygon(cbind(c(-123.0,-121.2,-122.3,-124.5,-123.5,-124.1,-123.0),
                     c(44.8,42.9,41.9,42.6,43.3,44.3,44.8)))
spone = Polygons(list(one), "s1")
sptwo = Polygons(list(two), "s2")
sppoly = SpatialPolygons(list(spone, sptwo), 1:2)
out <- occ(geometry = sppoly, limit=50)
out$gbif$data

## Two polygons in SpatialPolygonsDataFrame class
sppoly_df <- SpatialPolygonsDataFrame(sppoly, data.frame(a=c(1,2), b=c("a","b"), c=c(TRUE,FALSE),
   row.names=row.names(sppoly)))
out <- occ(geometry = sppoly_df, limit=50)
out$gbif$data

# curl debugging
library('httr')
occ(query = 'Accipiter striatus', from = 'gbif', limit=10, callopts=verbose())
# occ(query = 'Accipiter striatus', from = 'ebird', limit=10, callopts=verbose())
occ(query = 'Accipiter striatus', from = 'bison', limit=10, callopts=verbose())
occ(query = 'Accipiter striatus', from = 'ecoengine', limit=10, callopts=verbose())
occ(query = 'Accipiter striatus', from = c('ebird','bison'), limit=10, callopts=verbose())
# occ(query = 'Accipiter striatus', from = 'ebird', limit=10, callopts=timeout(seconds = 0.1))
occ(query = 'Accipiter striatus', from = 'inat', callopts=verbose())
## notice that callopts is ignored when from='antweb'
occ(query = 'linepithema humile', from = 'antweb', callopts=verbose())

########## More thorough data source specific examples
# idigbio
## scientific name search
res <- occ(query = "Acer", from = "idigbio", limit = 5)
res$idigbio

## geo search
### bounding box
bounds <- c(-120, 40, -100, 45)
occ(from = "idigbio", geometry = bounds, limit = 10)
### wkt
# wkt <- 'POLYGON((-69.9 49.2,-69.9 29.0,-123.3 29.0,-123.3 49.2,-69.9 49.2))'
wkt <- 'POLYGON((-98.9 44.2,-89.1 36.6,-116.7 37.5,-102.5 39.6,-98.9 44.2))'
occ(from = "idigbio", geometry = wkt, limit = 10)

## limit fields returned
occ(query = "Acer", from = "idigbio", limit = 5,
   idigbioopts = list(fields = "scientificname"))

## offset
occ(query = "Acer", from = "idigbio", limit = 5,
   idigbioopts = list(offset = 10))

## sort
occ(query = "Acer", from = "idigbio", limit = 5,
   idigbioopts = list(sort = TRUE))$idigbio
occ(query = "Acer", from = "idigbio", limit = 5,
   idigbioopts = list(sort = FALSE))$idigbio

## more complex queries
### parameters passed to "rq", get combined with the name queried
occ(query = "Acer", from = "idigbio", limit = 5,
   idigbioopts = list(rq = list(basisofrecord="fossilspecimen")))$idigbio

#### NOTE: no support for multipolygons yet
## WKTs are more flexible than bounding boxes. You can pass in a WKT with multiple
## polygons like so (you can use POLYGON or MULTIPOLYGON) when specifying more than one
## polygon. Note how each polygon is in its own set of parentheses.
# occ(query='Accipiter striatus', from='gbif',
#    geometry='MULTIPOLYGON(((30 10, 10 20, 20 60, 60 60, 30 10)),
#                           ((30 10, 10 20, 20 60, 60 60, 30 10)))')
