
BOLDconnectR (version 1.0.0)

bold.public.search: Search publicly available data on the BOLD database

Description

Retrieves record IDs (processids) for publicly available data based on a search by taxonomy, geography, institutes, BIN URIs, or dataset/project codes.

Usage

bold.public.search(
  taxonomy = NULL,
  geography = NULL,
  bins = NULL,
  institutes = NULL,
  dataset_codes = NULL,
  project_codes = NULL
)

Value

A data frame containing all processids and marker codes that match the search query.
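The returned data frame mainly serves as an index of processids, so a typical next step is to collect the unique ids and then retrieve full records with bold.fetch. The sketch below is a hypothetical helper, `extract_processids`. It assumes the ids are in a column named `processid`, so check `names()` on your own result first. The commented-out `bold.fetch` call, including its `get_by` and `identifiers` arguments, is also an assumption. Verify it against that function's help page.

```r
# Hypothetical helper: collect unique, non-missing processids from a
# bold.public.search() result. ASSUMPTION: the ids are in a column named
# "processid"; pass a different id_col if your result differs.
extract_processids <- function(result, id_col = "processid") {
  if (!is.data.frame(result)) {
    stop("`result` must be a data frame")
  }
  if (!id_col %in% names(result)) {
    stop("column '", id_col, "' not found; available columns: ",
         paste(names(result), collapse = ", "))
  }
  ids <- as.character(result[[id_col]])
  # Drop missing and empty ids, keep the first occurrence of each
  unique(ids[!is.na(ids) & nzchar(ids)])
}

# Usage (requires network access; the bold.fetch arguments are assumed):
# bold.data <- bold.public.search(taxonomy = list("Panthera leo"))
# ids <- extract_processids(bold.data)
# bold.records <- bold.fetch(get_by = "processid", identifiers = ids)
```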

Arguments

taxonomy

A list of one or more character strings specifying taxonomic names at any hierarchical level. Default value is NULL.

geography

A list of one or more character strings specifying any country, province/state, region, sector or site names or codes. Default value is NULL.

bins

A list of one or more character strings specifying BIN ids. Default value is NULL.

institutes

A list of one or more character strings specifying institutes. Default value is NULL.

dataset_codes

A list of one or more character strings specifying dataset codes. Default value is NULL.

project_codes

A list of one or more character strings specifying project codes. Default value is NULL.
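Each argument above takes a list of strings, so values stored in a character vector, factor or data frame column must be converted first. The hypothetical helper below, `as_query_list`, shows one way to do this. It trims whitespace, drops empty or missing entries and removes duplicates before calling `as.list`. Removing duplicates is a choice made here to keep queries short. It is not something bold.public.search is known to require.

```r
# Hypothetical helper: convert a character vector, factor, list or data
# frame column into the list-of-strings shape expected by the arguments.
as_query_list <- function(x) {
  if (is.null(x)) {
    return(NULL)                     # keep the documented default
  }
  if (is.list(x)) {
    x <- unlist(x, use.names = FALSE)
  }
  x <- trimws(as.character(x))       # factors become their labels
  x <- unique(x[!is.na(x) & nzchar(x)])
  if (length(x) == 0) {
    stop("no non-empty search terms supplied")
  }
  as.list(x)
}

# Usage:
# taxa <- as_query_list(c("Panthera leo", "Panthera uncia", "Panthera leo"))
# bold.data <- bold.public.search(taxonomy = taxa)
```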

Details

bold.public.search searches publicly available data on BOLD and retrieves the associated processids and marker codes. The full BCDM data can then be retrieved by passing these processids to the bold.fetch function.

Search parameters can include one or a combination of taxonomy, geography, BIN URIs, institutes, dataset codes or project codes. Each input should be provided as a separate list (e.g. taxonomy = list("Panthera", "Poecilia"), geography = list("India")). A data frame column can also be used as an input via the '$' operator (e.g. df$column_name). In that case, use as.list instead of list (e.g. taxonomy = as.list(df$column_name), geography = as.list(df$column_name)).

The character length of a search query should also be considered: the function will not be able to retrieve records if the query exceeds the predetermined web URL length limit of 2048 characters.

For multi-parameter searches (e.g. taxonomy + geography + bins; see the example: Taxonomy + Geography + BIN id), it is important to combine the parameters logically to ensure accurate, non-empty results. Misspelled queries, or queries for which no public data exists on BOLD at the time the function is executed, will result in an error. This applies to all search parameters.

There is a hard limit of 1 million downloaded records per search. Download speeds for very large requests using BIN URIs, dataset_codes or project_codes will be throttled, so fetching the data will take longer. Download speed also depends on the user's internet connection and computer specifications.
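Queries longer than 2048 characters fail, so it can help to estimate the size of a search before sending it. The sketch below percent-encodes each term with `utils::URLencode` and adds up the lengths. The `base_url` and `sep` values are placeholders, because the exact request URL that the package builds is not described here. Treat the result as a rough estimate rather than the length BOLD will actually see.

```r
# Rough pre-check of query size against the 2048-character URL limit.
# ASSUMPTION: base_url and sep are placeholders; the package's real request
# URL format is not documented here, so the result is only an estimate.
estimate_query_length <- function(..., base_url = "https://example.org/query?q=",
                                  sep = ";") {
  # Flatten every supplied search list (taxonomy, geography, ...) into one vector
  terms <- as.character(unlist(list(...), use.names = FALSE))
  # Percent-encode each term as it would appear in a URL (space -> %20)
  encoded <- vapply(terms, utils::URLencode, character(1),
                    reserved = TRUE, USE.NAMES = FALSE)
  nchar(base_url) + nchar(paste(encoded, collapse = sep))
}

# TRUE when the estimated length stays within the limit
query_fits <- function(..., limit = 2048) {
  estimate_query_length(...) <= limit
}

# Usage: a short query like this one fits easily
query_fits(taxonomy = list("Panthera uncia"), geography = list("India", "Sri Lanka"))
```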

Examples

# \donttest{

# Taxonomy
bold.data <- bold.public.search(taxonomy = list("Panthera leo"))

# Result
head(bold.data, 10)

# Taxonomy and Geography
bold.data.taxo.geo <- bold.public.search(taxonomy = list("Panthera uncia"),
                                         geography = list("India"))

# Result
head(bold.data.taxo.geo, 10)

# Input as a data frame column
df_test <- data.frame(taxon_name = c("Panthera uncia"),
                      locations = c("India", "Sri Lanka"))

# Search using the data frame columns (taxon_name is recycled to both rows)
bold.data.taxo.geo.df.col <- bold.public.search(taxonomy = as.list(df_test$taxon_name),
                                                geography = as.list(df_test$locations))

# Result
head(bold.data.taxo.geo.df.col, 10)

# }
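If a list of terms is too long for a single query, one workaround is to search in batches and combine the results. This batching approach is not part of BOLDconnectR. The function `search_in_batches`, its `batch_size` default of 50 and its `arg` parameter are all hypothetical choices made for illustration. Each batch runs as a separate search, so the 1 million record limit applies to each one. Batches that raise an error (for example, because of a misspelled name) are skipped with a warning, so check the warnings before trusting the combined result.

```r
# Hypothetical wrapper (not part of BOLDconnectR): split many search terms
# into batches, run one search per batch and row-bind the results.
# search_fun defaults to bold.public.search; pass a stub to test offline.
search_in_batches <- function(terms, batch_size = 50,
                              search_fun = bold.public.search,
                              arg = "taxonomy") {
  terms <- unique(as.character(terms))
  # Group consecutive terms into batches of at most batch_size
  batches <- split(terms, ceiling(seq_along(terms) / batch_size))
  results <- lapply(batches, function(batch) {
    args <- stats::setNames(list(as.list(batch)), arg)
    # A batch with no public records raises an error; warn and skip it
    tryCatch(do.call(search_fun, args), error = function(e) {
      warning("batch skipped: ", conditionMessage(e), call. = FALSE)
      NULL
    })
  })
  results <- Filter(Negate(is.null), results)
  if (length(results) == 0) {
    return(NULL)
  }
  # Combine batches, drop duplicate rows and reset row names
  out <- unique(do.call(rbind, results))
  rownames(out) <- NULL
  out
}

# Usage (requires network access):
# taxa <- c("Panthera leo", "Panthera uncia", "Panthera onca")
# combined <- search_in_batches(taxa, batch_size = 2)
```

Passing a stub function as `search_fun` allows you to check the batching logic without network access.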
