One Health VBD Hub (ohvbd)

Introduction

ohvbd is an R package for retrieving (and parsing) data from a network of disease vector data sources.

This package was developed as part of the One Health Vector-Borne Diseases Hub.

Databases

ohvbd allows for searching and retrieving data from several data sources, including VecTraits, VecDyn, GBIF, and AREAdata.

Installation

You can install the stable version of ohvbd from CRAN:

install.packages("ohvbd")

Alternatively, you can install the development version of ohvbd from GitHub, which includes any new or experimental features:

# install.packages("devtools")
devtools::install_github("fwimp/ohvbd")

The vignettes are all available online, but if you would like to build them locally, add build_vignettes = TRUE into your install_github() command. However, we do not recommend doing this due to the number of extra R packages utilised in the vignettes.

Basic usage

ohvbd has been designed to make finding and retrieving data on disease vectors simple and straightforward.

Typically it uses a “piped”-style approach to find, get, and filter data from the supported databases. However, it aims to provide the data to you “as-is”, leaving further downstream analysis and filtering to you.

A basic pipeline for finding and retrieving data on Ixodes ricinus from the VecTraits database looks something like this:

library(ohvbd)

df <- search_hub("Ixodes ricinus") |>
  filter_db("vt") |>
  fetch() |>
  glean()
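
For readers new to the native pipe, the sketch below shows that a piped chain is equivalent to running each stage separately and keeping the intermediate results. The `toy_*` functions are hypothetical stand-ins written for this illustration; they are not the ohvbd functions and do not contact any database.

```r
# Hypothetical stand-ins for the four pipeline stages (NOT the ohvbd functions).
toy_search <- function(query) {
  data.frame(id = c(1, 2, 3), db = c("vt", "vd", "vt"), query = query)
}
toy_filter_db <- function(hits, db) hits[hits$db == db, , drop = FALSE]
toy_fetch <- function(hits) {
  lapply(hits$id, function(i) list(id = i, value = i * 10))
}
toy_glean <- function(resps) do.call(rbind, lapply(resps, as.data.frame))

# Piped form: each result becomes the first argument of the next call.
piped <- toy_search("Ixodes ricinus") |>
  toy_filter_db("vt") |>
  toy_fetch() |>
  toy_glean()

# Stepwise form: same calls, but every intermediate object is kept.
hits <- toy_search("Ixodes ricinus")
vt_hits <- toy_filter_db(hits, "vt")
resps <- toy_fetch(vt_hits)
stepwise <- toy_glean(resps)

identical(piped, stepwise)
```

The stepwise form can be handy when debugging, since each intermediate object can be inspected before moving on.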

Latest release patch notes

ohvbd 1.0.0

Major API change

  • extract_ functions are now glean_.
    • This means that if tidyverse is loaded after ohvbd, there are no direct namespace collisions.

Full list of function name changes:

  • extract() -> glean()
  • extract_ad() -> glean_ad()
  • extract_gbif() -> glean_gbif()
  • extract_vd() -> glean_vd()
  • extract_vt() -> glean_vt()
  • fetch_extract_vd_chunked() -> fetch_glean_vd_chunked()
  • fetch_extract_vt_chunked() -> fetch_glean_vt_chunked()
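
If you have existing scripts that use the old names, a small helper along the following lines may ease migration. This is a minimal sketch in base R, not a tool shipped with ohvbd: `migrate_calls()` is a hypothetical helper, and the mapping is copied from the list above. It deliberately skips namespace-qualified calls (such as `tidyr::extract()`), so calls written as `ohvbd::extract()` would need to be updated by hand.

```r
# Old -> new function names, copied from the list above.
renames <- c(
  extract = "glean",
  extract_ad = "glean_ad",
  extract_gbif = "glean_gbif",
  extract_vd = "glean_vd",
  extract_vt = "glean_vt",
  fetch_extract_vd_chunked = "fetch_glean_vd_chunked",
  fetch_extract_vt_chunked = "fetch_glean_vt_chunked"
)

# Hypothetical helper: rewrite calls to the old names in lines of R code.
migrate_calls <- function(code_lines, renames) {
  for (old in names(renames)) {
    # The lookbehind skips names that are part of a longer identifier or
    # are namespace-qualified (e.g. my_extract( or tidyr::extract( ).
    pattern <- paste0("(?<![A-Za-z0-9_.:])", old, "\\(")
    code_lines <- gsub(pattern, paste0(renames[[old]], "("), code_lines, perl = TRUE)
  }
  code_lines
}

old_code <- c(
  "df <- ids |> fetch() |> extract()",
  "x <- fetch_extract_vt_chunked(ids)",
  "y <- tidyr::extract(df, col, into = \"a\")"
)
new_code <- migrate_calls(old_code, renames)
print(new_code)
```

Review the rewritten lines before saving them, as a text-based rename cannot tell an ohvbd call from an unrelated function that happens to share a name.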

New functions & arguments:

  • ohvbd now interfaces with GBIF for occurrence data.
    • New *_gbif functions (e.g. fetch_gbif()) allow for retrieving and extracting data from GBIF.
    • A GBIF account and the rgbif package are required to retrieve data from GBIF.
    • The account details must also be set up as shown in the rgbif documentation.
  • New tee() command allows one to extract data from the middle of a pipeline and save it to an environment.
    • This is not only useful for ohvbd workflows: it can be used in any base R pipeline (|>). It has not been tested in magrittr pipelines but should work as-is.
  • New filter_db() command allows for filtering hub search results down to those from a single database.
  • check_db_status() now returns (invisibly) whether all databases are up or not.
  • New fetch_citations() and fetch_citations_* commands provide an interface to attempt to retrieve citations from a vectorbyte dataset.
    • By default, this may redownload some or all of the data if the required columns are not already present.
  • New force_db() function enables one to force ohvbd to consider a particular object as having a particular provenance.
  • New simplify argument to search_hub() makes hub searches return an ohvbd.ids object if only one database was searched for. This behaviour is on by default.
    • To match this, filter_db() will now transparently return ohvbd.ids objects if it gets them.
  • New taxonomy argument to search_hub() allows for filtering searches by GBIF backbone IDs.
  • New match_species() function allows for quick and flexible matching of species names to their GBIF backbone IDs.
  • New match_countries() function allows for matching of country names to WKT polygons via naturalearth.
  • New ohvbd_db(), has_db(), and is_from() functions allow for quick testing of object provenance (according to ohvbd).
  • New get_default_ohvbd_cache() function allows for custom functions that interface with cached ohvbd data files.
  • New list_ohvbd_cache() and clean_ohvbd_cache() functions enable better interactive cache management.
    • As a result, clean_ad_cache() has been removed as it is now unnecessary.
  • search_x_smart() functions can now take "tags" as a search field, enabling support for tagged datasets.
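
To illustrate the idea behind tee() described in the list above, here is a minimal base R sketch of a tee-like function. It is not ohvbd's implementation, and its signature is an assumption made for this example: `tee_sketch()`, its `name` argument, and its `envir` argument are all hypothetical. Consult the tee() help page for the real interface.

```r
# Hypothetical tee-like helper (NOT ohvbd's tee()): store a copy of the value
# flowing through the pipeline under `name` in `envir`, then pass it on unchanged.
tee_sketch <- function(x, name, envir = parent.frame()) {
  assign(name, x, envir = envir)
  x
}

# A dedicated environment to hold the mid-pipeline snapshot.
snapshots <- new.env()

result <- c(3, 1, 2) |>
  sort() |>
  tee_sketch("sorted", envir = snapshots) |>
  rev()

print(snapshots$sorted) # the value captured in the middle of the pipeline
print(result)           # the value at the end of the pipeline
```

Because the helper returns its input untouched, it can be dropped into or removed from a pipeline without changing the final result.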

Other:

  • Entire code base is now continuously formatted using Air v0.7.1.
  • Examples are no longer wrapped in \dontrun{} so they should be runnable from an installed version of the package.
  • A good chunk of the functional logic of ohvbd is now covered with unit tests (using the vcr package).
  • fetch_vd() no longer tries to retrieve ids with no pages of data.
  • Functions that interface with vectorbyte databases no longer recommend using set_ohvbd_compat() as unexpected SSL errors should break pipelines by default.
    • These errors are no longer expected to occur when interfacing with vectorbyte.
  • Running fetch() on an ohvbd.hub.search or glean() on an ohvbd.ids object now provides a hint that you may have forgotten something.
    • Occasionally users would forget a fetch() command and run search_hub() |> glean(), which previously did not give an interpretable error.
  • Vignettes now use vcr to massively reduce their build time. This should only matter to developers of ohvbd, or users who download from GitHub and build the vignettes themselves.
  • ohvbd.ids() now warns you and fixes the problem if you provide ids with duplicate values.
  • glean_vt() and glean_vd() now force the inclusion of the dataset ID when filtering columns (using the cols argument).
    • This is intended to encourage you to preserve at least one means of retrieving citation data later.
  • WKT parsing and formatting is now significantly more robust.
  • Cached AREAdata now includes the cache timestamp as an attribute rather than a separate variable in the cache file.
  • glean_ad() now correctly returns a matrix even when there is only 1 row or column.
  • gadm spatial files are now cached as GeoPackage files rather than shapefiles, leading to a >50% speedup in loading! (Thanks to @josiah.rs on Bluesky for the suggestion!)
  • fetch_vd_counts() is now significantly faster, more robust, and temporarily caches data.
    • You will see particular improvements if you are trying to retrieve more than about 10 ids in one go or if you are repeatedly running the same download code in the same day.
    • This speedup also applies to fetch_vd() under the hood, particularly if you are running it multiple times in a day.
  • Explicit term checking (such as in fetch_ad() for metrics and search_vt_smart() for operators and fields) is now fuzzy, allowing for a small amount of deviation from the actual term name.
  • assoc_ad() now tries to guess LatLong column names if none (or the wrong ones) are provided.
  • Errors in internal functions now make it more clear which user-facing functions they originate from.
  • Multiple functions now default to NULL rather than NA for default missing values (except date arguments to AD-related functions, where NA is more reasonable in the grand scheme).
  • fetch_ad() now caches and tries to read from cache by default.
    • Generally speaking unless exceedingly up-to-date data is required, this will be the best for most people.
    • If you do require guaranteed new data, it’s worth setting refresh_cache = TRUE or use_cache = FALSE (depending on if you want to replace your existing cache or not).
  • All downloaders that can potentially cache data also attach the download time if not loading from cache.
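
As a rough illustration of the fuzzy term checking mentioned in the list above, the following sketch matches a possibly misspelled term against a set of valid terms using edit distance. It is not how ohvbd implements the check: `fuzzy_match_term()`, the `max_dist` threshold, and the example terms are all hypothetical and chosen only for this demonstration.

```r
# Hypothetical fuzzy matcher (NOT ohvbd's internal logic): return the valid
# term closest to `term`, or fail if nothing is within `max_dist` edits.
fuzzy_match_term <- function(term, valid_terms, max_dist = 2) {
  # Case-insensitive edit distance from `term` to every valid term.
  dists <- utils::adist(tolower(term), tolower(valid_terms))[1, ]
  best <- which.min(dists)
  if (dists[best] > max_dist) {
    stop(
      "No term close to '", term, "'. Valid terms: ",
      paste(valid_terms, collapse = ", ")
    )
  }
  valid_terms[best]
}

# Illustrative terms only; these are not a list of real ohvbd options.
valid <- c("temperature", "population", "forecast")

fuzzy_match_term("temprature", valid) # one missing letter
fuzzy_match_term("Population", valid) # differs only in case
```

A small distance threshold keeps the matching forgiving of typos while still rejecting terms that are not close to anything valid.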

See changelog for patch notes for all versions.

Monthly Downloads

496

Version

1.0.0

License

GPL (>= 3)

Maintainer

Francis Windram

Last Published

February 9th, 2026

Functions in ohvbd (1.0.0)

fetch_glean_vd_chunked

Fetch and parse multiple VecDyn datasets by ID in chunks
fetch_glean_vt_chunked

Get and parse multiple VecTraits datasets by ID in chunks
find_vt_ids

Get current IDs in VecTraits
fetch_citations_vt

Retrieve citations for vectraits data
find_vd_missing

Find the ids of any resps that contain a count of 0 from a list of them
format_time_overlap_bar

Format and print date overlaps
get_default_ohvbd_cache

Get ohvbd cache locations
glean_ad

Extract data from AREAdata datasets
generate_vt_template

Generate a vectraits template from a short set of survey responses
glean_vd

Parse data from requests to VecDyn
is_from

Test whether an object is considered to be from a particular database
fetch_vd

Fetch VecDyn dataset/s by ID
is_cached

Check whether an object has been loaded from cache by ohvbd
glean

Extract specified data from a set of responses
get_curl_err

Extract curl errors from httr2 error objects
fetch_vd_counts

Fetch VecDyn dataset length by ID
glean_gbif

Parse data from requests to GBIF
has_db

Test whether an object has provenance information
search_vd

Search VecDyn by keyword
search_hub

Search vbdhub.org
glean_vt

Parse data from requests to VecTraits
ohvbd-package

ohvbd: One Health VBD Hub
ohvbd_db

Database provenance
ohvbd_dryrun

Option: dry runs of ohvbd searches
match_species

Match species names to their GBIF backbone ids
find_vd_ids

Get current IDs in VecDyn
list_ohvbd_cache

List all ohvbd cached files
find_vd_columns

Get current columns in VecDyn datasets
match_countries

Match country names to their equivalent naturalearth WKT polygons
ohvbd.ids

Create a new ohvbd ID vector
progress_style

Reimplementation of cli's progress style shim
read_ad_cache

Read AREAdata from cache file
ohvbd_attrs

Internal attributes
search_vd_smart

Search VecDyn using the explorer's filters
vb_basereq

Generate a base request object for the vectorbyte databases
search_vt

Search VecTraits by keyword
set_ohvbd_compat

Set ohvbd compatibility mode to TRUE
search_vt_smart

Search VecTraits using the explorer's filters
set_default_ohvbd_cache

Set the default ohvbd cache location
write_ad_cache

Write data from AREAdata to cache file
vt_error_body

Retrieve and format error message from failed vt calls
vd_error_body

Retrieve and format error message from failed vd calls
vd_make_req

Create a query for a given VD id at a given page
spatvect_to_multipolygon

Encode spatvector as WKT, and convert to multipolygon if needed
wkt_to_multipolygon

wkt_to_multipolygon
tee

Tee a pipeline to extract the data at a given point
vd_extraction_helper

Extract a single vd response, including consistent data
space_collapse

Collapse a list of character strings to a JS space-separated single string
ad_basereq

Generate a base request string for the AREAdata database
check_ohvbd_config

Print current ohvbd configuration variables
assoc_gadm

Associate other data sources with gadm IDs
clean_ohvbd_cache

Delete files from ohvbd cache directories
clean_ad_cache

Delete all rda files from ohvbd AREAdata cache (Deprecated)
check_db_status

Check whether databases are currently online
assoc_ad

Associate other data sources with AREAdata data
check_provenance

Check whether an object is from a given database and complete appropriate messaging
extract

Extract specified data from a set of responses (Deprecated)
fetch

Fetch specified data from a set of ids
fetch_citations_vd

Retrieve citations for vecdyn data
coercedate

Try to coerce a date even when bits are missing
.match_term

Fuzzy match a term (case-insensitive) to a list of final terms through a translation enum.
convert_place_togid

Convert a vector of place names to their equivalent at a different gid level
default_progress_style

Reimplementation of cli's default progress style
.get_vb_req_id

Extract request ids from httr2 response objects
fetch_vt

Fetch VecTraits dataset/s by ID
fetch_vd_meta

Fetch VecDyn metadata table
fetch_ad

Fetch AREAdata dataset
find_vb_404s

Find the ids of any resps that contain 404 errors from a list of them
force_db

Force an object to appear to come from a specific database
filter_db

Filter hub search results by database
force_multipolygon

Force a polygon WKT into multipolygon form
fetch_gbif

Fetch GBIF dataset/s by ID
fetch_citations

Try to find the relevant citations for a dataset
fetch_gadm_sfs

Fetch gadm mapping shapefiles
.format_dir_as_tree

Format directory as df ready for tree plotting