BOLDconnectR is a package for retrieving, transforming and analyzing data from the Barcode Of Life Data Systems (BOLD) database. It provides access to both public and private user data in the Barcode Core Data Model (BCDM) format. Data include information on the taxonomy, geography, collection, identification and DNA barcode sequence of every submission. The manual is currently hosted here (https://github.com/boldsystems-central/BOLDconnectR_examples/blob/main/BOLDconnectR_1.0.0.pdf).

BOLDconnectR requires R version 4.0 or above. The versions of its dependencies have been set so that they work with R >= 4.0. In addition, a few suggested packages are not required to download and install the package, but they are essential for a couple of its functions to work. The names and exact versions of the dependencies and suggestions are given here (https://github.com/boldsystems-central/BOLDconnectR/blob/main/DESCRIPTION). More details on the suggested packages are provided below.
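The version requirement can be checked before installing. The following is a minimal base R sketch, not part of BOLDconnectR; the variable names `min_r` and `r_ok` are chosen here for illustration.

```r
# Minimum R version stated in the text above.
min_r <- "4.0.0"

# getRversion() returns a version object that can be compared with a string.
r_ok <- getRversion() >= min_r

if (!r_ok) {
  warning("BOLDconnectR needs R >= ", min_r,
          "; this session runs R ", as.character(getRversion()))
}
```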

Installation

The package can be installed using the install_github function from the devtools package (which must be installed before installing BOLDconnectR).


devtools::install_github("https://github.com/boldsystems-central/BOLDconnectR")
library(BOLDconnectR)

BOLDconnectR has 11 functions currently:

  1. bold.fields.info
  2. bold.apikey
  3. bold.fetch
  4. bold.public.search
  5. bold.full.search
  6. bold.data.summarize
  7. bold.analyze.align
  8. bold.analyze.tree
  9. bold.analyze.diversity
  10. bold.analyze.map
  11. bold.export

Note on Suggested packages

Function 6 (bold.data.summarize) requires the Biostrings package to be installed and loaded in the R session beforehand in order to generate the barcode_summary. Function 7 (bold.analyze.align) requires the msa and Biostrings packages to be installed and loaded in the R session beforehand. Function 8 (bold.analyze.tree) uses the output generated by function 7. msa and Biostrings can be installed using the BiocManager package.


if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

BiocManager::install("msa")
BiocManager::install("Biostrings")

library(msa)
library(Biostrings)
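Because the suggested packages are optional at install time, it can help to check for them before calling the functions that need them. The sketch below uses only base R; `missing_packages` is a hypothetical helper written for this example and is not a BOLDconnectR function.

```r
# Return the subset of `pkgs` whose namespace cannot be loaded
# in this R installation.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Suggested packages named in the note above.
suggested <- c("msa", "Biostrings")
to_install <- missing_packages(suggested)

if (length(to_install) > 0) {
  message("Missing: ", paste(to_install, collapse = ", "),
          "; these can be installed with BiocManager::install()")
}
```

An empty result means bold.analyze.align should find both packages once they are attached with library().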

Note on API key

The function bold.fetch requires an API key internally to access and download all public and private user data. The API key needed to retrieve BOLD records is found in the BOLD ‘Workbench’ (https://bench.boldsystems.org/index.php/Login/page?destination=MAS_Management_UserConsole). After logging in, navigate to ‘Your Name’ (located at the top left-hand side of the window) and click ‘Edit User Preferences’. The API key is in the ‘User Data’ section. Please note that an API key is available in the Workbench only to users who have uploaded at least 10,000 records to BOLD. The key can be saved in the R session using the bold.apikey() function. API keys are regenerated periodically, and the new key appears in the user's Workbench account. Using an old key results in an HTTP 401 error.

# Substitute '00000000-0000-0000-0000-000000000000' with your key
# bold.apikey('00000000-0000-0000-0000-000000000000')
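To avoid pasting the key into scripts, one option is to read it from an environment variable. This is a sketch under two assumptions that do not come from the package documentation: the variable name `BOLD_API_KEY` is made up for this example, and the key is assumed to follow the 8-4-4-4-12 layout of the placeholder above.

```r
# Read the API key from an environment variable and check its shape.
# "BOLD_API_KEY" is a hypothetical variable name chosen for this sketch.
read_bold_key <- function(var = "BOLD_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  # Assumption: keys use the 8-4-4-4-12 layout of the placeholder key.
  pattern <- "^[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}$"
  if (!grepl(pattern, key)) {
    stop("Environment variable ", var,
         " is unset or not in the expected format")
  }
  key
}

# Usage (needs the package installed and a valid key in the environment):
# BOLDconnectR::bold.apikey(read_bold_key())
```

Since keys are regenerated periodically, keeping the key outside the script means only the environment variable has to be updated after an HTTP 401 error.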

Basic usage of BOLDconnectR functionality

The API key function must be run before using the fetch function (please see above).

Fetch data

BCDM_data<-bold.fetch(get_by = "processid",
                      identifiers = test.data$processid)
#> Initiating download
#>  Downloading data in a single batch  
#> Download complete & BCDM dataframe generated

knitr::kable(head(BCDM_data,4))
The printed table has one row per record and the following columns: processid, record_id, insdc_acs, sampleid, specimenid, taxid, short_note, identification_method, museumid, fieldid, collection_code, processid_minted_date, inst, funding_src, sex, life_stage, reproduction, habitat, collectors, site_code, specimen_linkout, collection_event_id, sampling_protocol, tissue_type, collection_date_start, collection_time, associated_taxa, associated_specimens, voucher_type, notes, taxonomy_notes, collection_notes, geoid, marker_code, kingdom, phylum, class, order, family, subfamily, tribe, genus, species, subspecies, identification, identification_rank, species_reference, identified_by, sequence_run_site, nuc, nuc_basecount, sequence_upload_date, bin_uri, bin_created_date, elev, depth, coord, coord_source, coord_accuracy, elev_accuracy, depth_accuracy, realm, biome, ecoregion, region, sector, site, country_iso, country.ocean, province.state, bold_recordset_code_arr, collection_date_end.

Selected columns of the four records (the nuc column, which holds the full barcode sequence, and the remaining fields are omitted here for width):

|processid    |insdc_acs |sampleid       |species          |nuc_basecount |bin_uri      |coord           |province.state |
|:------------|:---------|:--------------|:----------------|:-------------|:------------|:---------------|:--------------|
|SSWLD6460-13 |KM825932  |BIOUG06662-C01 |Eris militaris   |589           |BOLD:AAA5654 |49.065,-113.778 |Alberta        |
|SPITO327-14  |KP654265  |BIOUG12602-G11 |Phidippus audax  |588           |BOLD:AAC6891 |43.933,-79.928  |Ontario        |
|ARONT071-09  |GU682836  |09ONTGAB-183   |Naphrys pulex    |658           |BOLD:AAC2433 |43.691,-80.414  |Ontario        |
|SPIRU1237-11 |KF368796  |BIOUG00629-G03 |Sittisax ranieri |658           |BOLD:AAC2061 |58.772,-93.843  |Manitoba       |

Similarly, sampleids, dataset_codes or project_codes can be used to fetch data. The data can also be filtered on different parameters such as Geography, Attributions and DNA Sequence information using the _filt arguments available in the function.
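Filtering can also be done after download, on the returned data frame. The sketch below is a client-side alternative in base R and does not use the package's own filter arguments; the small `records` data frame is rebuilt by hand from a few columns of the four records shown above so that the example runs without an API key.

```r
# A few BCDM columns from the four example records, typed in by hand.
records <- data.frame(
  processid      = c("SSWLD6460-13", "SPITO327-14", "ARONT071-09", "SPIRU1237-11"),
  species        = c("Eris militaris", "Phidippus audax",
                     "Naphrys pulex", "Sittisax ranieri"),
  bin_uri        = c("BOLD:AAA5654", "BOLD:AAC6891", "BOLD:AAC2433", "BOLD:AAC2061"),
  nuc_basecount  = c(589, 588, 658, 658),
  province.state = c("Alberta", "Ontario", "Ontario", "Manitoba"),
  stringsAsFactors = FALSE
)

# Keep Ontario records whose sequence has at least 650 bases.
ontario_full <- subset(records,
                       province.state == "Ontario" & nuc_basecount >= 650)
ontario_full$processid
```

Filtering at fetch time reduces the amount of data downloaded; filtering afterwards, as here, keeps the full download available for other subsets.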

Summarize downloaded data

Downloaded data can then be summarized in different ways. Options currently include a concise summary of all the data, detailed taxonomic counts, data completeness and a barcode-based summary.

BCDM_data_summary<-bold.data.summarize(bold_df = BCDM_data,
                               summary_type = "concise_summary")

BCDM_data_summary$concise_summary
#>                        Category   Value
#> 1                 Total_records    1336
#> 2     Total_records_w_sequences    1336
#> 3                Unique_species      80
#> 4                   Unique_BINs     117
#> 5              Unique_countries       1
#> 6             Unique_institutes       6
#> 7          Unique_identified_by       7
#> 8  Unique_specimen_depositories       3
#> 9                Unique_markers  COI-5P
#> 10        Amplicon_length_range 508-658

A concise summary providing a high-level overview of the data.
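To make the categories concrete, a few of them can be recomputed by hand from the BCDM columns. This is an illustrative base R sketch, not the implementation used by bold.data.summarize; it assumes that Unique_species, Unique_BINs and Amplicon_length_range are derived from the species, bin_uri and nuc_basecount columns, and it uses values from the four example records rather than the full data set.

```r
# Three BCDM columns from the four example records, typed in by hand.
records <- data.frame(
  species       = c("Eris militaris", "Phidippus audax",
                    "Naphrys pulex", "Sittisax ranieri"),
  bin_uri       = c("BOLD:AAA5654", "BOLD:AAC6891", "BOLD:AAC2433", "BOLD:AAC2061"),
  nuc_basecount = c(589, 588, 658, 658),
  stringsAsFactors = FALSE
)

# Values are stored as text because the length range is a "min-max" string.
concise <- data.frame(
  Category = c("Total_records", "Unique_species",
               "Unique_BINs", "Amplicon_length_range"),
  Value = c(
    nrow(records),
    length(unique(records$species)),
    length(unique(records$bin_uri)),
    paste(range(records$nuc_basecount), collapse = "-")
  ),
  stringsAsFactors = FALSE
)
concise
```

Real data usually contain missing values, so a fuller version would drop NA entries before counting unique species or BINs.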

Export the downloaded data

Downloaded data can also be exported to the local machine, either as a flat file or as a FASTA file for use with third-party sequence analysis tools. The flat file contents can be modified as per user requirements (entire data, specific presets or individual fields).

# Preset dataframe
# bold.export(bold_df = BCDM_data,
#             export_type = "preset_df",
#             presets = 'taxonomy',
#             cols_for_fas_names = NULL,
#             export = "file_path_with_intended_name.csv")

# Unaligned fasta file
# bold.export(bold_df = BCDM_data,
#             export_type = "fas",
#             cols_for_fas_names = c("bin_uri","genus","species"),
#             export = "file_path_with_intended_name.fas")
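The idea behind cols_for_fas_names is that each FASTA header is built from the chosen columns. The base R sketch below illustrates that idea only; `write_fasta` is a hypothetical helper, the `|` separator is an assumption rather than the format bold.export necessarily writes, and the sequences are short toy placeholders, not real barcodes.

```r
# Write one FASTA entry per row: a ">" header built from `name_cols`,
# followed by the sequence stored in the `nuc` column.
write_fasta <- function(df, name_cols, path, sep = "|") {
  # Paste the chosen columns together row by row.
  headers <- do.call(paste, c(unname(as.list(df[name_cols])), sep = sep))
  # Interleave header and sequence lines (rbind fills column by column).
  lines <- as.vector(rbind(paste0(">", headers), df$nuc))
  writeLines(lines, path)
  invisible(path)
}

# Toy input: real identifiers from the example records, placeholder sequences.
toy <- data.frame(
  bin_uri = c("BOLD:AAA5654", "BOLD:AAC6891"),
  genus   = c("Eris", "Phidippus"),
  species = c("Eris militaris", "Phidippus audax"),
  nuc     = c("ACGTACGT", "TTGACCAA"),
  stringsAsFactors = FALSE
)

out <- write_fasta(toy, c("bin_uri", "genus", "species"),
                   tempfile(fileext = ".fas"))
readLines(out)
```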

Other functions

The package also has functions for sequence alignment, NJ clustering, biodiversity analysis and occurrence mapping using the downloaded BCDM data. Some of these functions also output objects that are commonly used by other R packages (e.g. an ‘sf’ data frame, or an occurrence matrix for ‘vegan’ and ‘betapart’). Please go through the help manual (link provided above) for detailed usage of all the functions of BOLDconnectR, with examples.
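An occurrence matrix of the kind mentioned above has sites in rows and BINs in columns. The following base R sketch shows the shape of such a matrix; it is not the package's own routine. Treating province.state as the site is an assumption, and the fifth row is an invented duplicate added only so that one cell has a count above 1.

```r
# Site and BIN per record; the last row is a made-up repeat for illustration.
occ <- data.frame(
  site    = c("Alberta", "Ontario", "Ontario", "Manitoba", "Ontario"),
  bin_uri = c("BOLD:AAA5654", "BOLD:AAC6891", "BOLD:AAC2433",
              "BOLD:AAC2061", "BOLD:AAC6891"),
  stringsAsFactors = FALSE
)

# Abundance matrix: number of records per site (rows) and BIN (columns).
abundance <- unclass(table(occ$site, occ$bin_uri))

# Incidence matrix: 1 where a BIN occurs at a site, 0 otherwise.
incidence <- (abundance > 0) * 1L

abundance
incidence
```

Matrices in this layout (sites by taxa) are the usual input for community ecology functions in packages such as vegan and betapart.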

BOLDconnectR can retrieve data very fast (~100k records in a minute on a fast wired connection).

Citation: Padhye SM, Ballesteros-Mejia CL, Agda TJA, Agda JRA, Ratnasingham S. BOLDconnectR: An R package for streamlined retrieval, transformation and analysis of BOLD DNA barcode data (MS in prep).

Install

install.packages('BOLDconnectR')

Monthly Downloads: 279

Version: 1.0.0

License: MIT + file LICENSE

Maintainer: Sameer Padhye

Last Published: September 17th, 2025

Functions in BOLDconnectR (1.0.0)

presets: Define and select column presets from the BCDM data

taxon_hierarchy_count: Helper functions for creating different data summaries

test.data: Canadian spider data by Blagoev et al. (2015)

bold.fetch.filters: Filters for specific parameters to customize the search for private data

bold.analyze.diversity: Create a biodiversity profile of the retrieved data

bold.analyze.tree: Analyze and visualize the multiple sequence alignment

bold.export: Export files generated by BOLDconnectR

bold.analyze.align: Transform and align the sequence data retrieved from BOLD

bold.fetch: Retrieve data from the BOLD database

bold.apikey: Set the BOLD private data API key

bold.analyze.map: Visualize BIN occurrence data on maps

bold.data.summarize: Generate specific summaries from the downloaded BCDM data

base_url_parse: Helper functions for bold.public.search

bold.full.search: Search user-based (private) and publicly available data on the BOLD database

id.files: Helper functions for fetching data

bold.fields.info: Retrieve metadata of the BOLD data fields

gen.msa.res: Helper functions for sequence-based data

post.api.res.fetch: Helper functions: retrieve the data using the POST API and convert it into a data frame

richness_profile: Helper functions for generating a diversity profile

bold.public.search: Search publicly available data on the BOLD database

globals: Global variables

gen.comm.mat: Create a community matrix based on BIN abundances/incidences