biosurvey: Tools for Biological Survey Planning

Claudia Nunez-Penichet, Marlon E. Cobos, Jorge Soberon, Tomer Gueta, Narayani Barve, Vijay Barve, Adolfo G. Navarro-Siguenza, A. Townsend Peterson

This repository is for the project “Biological Survey Planning Considering Hutchinson’s Duality”, developed during Google Summer of Code (GSoC) 2020 (see details at the end).

Package description

The biosurvey R package implements multiple tools to select sampling sites for biodiversity inventory, increasing effectiveness by considering the relationship between environmental and geographic conditions in a region. The functions are grouped in three main modules: 1) Data preparation; 2) Selection of sampling sites; and 3) Tools for testing effectiveness (Fig. 1). Data are prepared in ways that avoid the need for more data in subsequent analyses and allow users to focus on critical methodological decisions when selecting sampling sites. Various algorithms for selecting sampling sites are available, and options for considering pre-selected sites (known to be important for biodiversity monitoring) are included. Visualization is a critical component of this set of tools, and most results can be plotted to help users understand their implications. The options for selecting sampling sites included here differ from other implementations in that they consider the environmental and geographic structure of a region to suggest sampling sites that could increase the effectiveness of efforts dedicated to inventorying biodiversity.

Installing the package

Note: An internet connection is required to install the package.

To install the latest release of biosurvey use the following line of code:

# Installing from CRAN
install.packages("biosurvey")

The development version of biosurvey can be installed using the code below.

# Installing and loading packages
if(!require(remotes)){
  install.packages("remotes")
}

# To install the package use
remotes::install_github("claununez/biosurvey")

# To install the package and its vignettes use (if needed use: force = TRUE)  
remotes::install_github("claununez/biosurvey", build_vignettes = TRUE)

If you have any problems installing the development version from GitHub, restart your R session, close any other RStudio sessions you have open, and try again. If you are asked to update packages during installation, do so unless you need a specific version of one or more of those packages. If any package gives an error when updating, install it on its own using install.packages(), then try installing biosurvey again.

To load the package use:

# Load biosurvey
library(biosurvey)

biosurvey functions and vignettes

To check all functions in the package use:

help(biosurvey)

If the package was installed with its vignettes you can see all options with:

vignette(package = "biosurvey")

To check vignettes you can use:

  • vignette("biosurvey_preparing_data").- For a guide on how to prepare data for analysis.
  • vignette("biosurvey_selecting_sites").- For a guide on how to select sampling sites.
  • vignette("biosurvey_selection_with_preselected_sites").- For a guide on how to select sampling sites when some sites have been preselected.
  • vignette("biosurvey_testing_module").- For a guide on how to use the testing module.

Workflow description

Initial data

As shown in Fig. 1, to use biosurvey and select sites for biodiversity inventory you need:

  • Environmental variables.- These variables must be in raster format (e.g., GTiff, BIL, ASCII). To load these variables to your R environment, you can use the function stack from the package raster.
  • Region of interest.- As your analyses will be focused on a region, a spatial polygon of that area is needed. Common formats for this polygon include Shapefile, GeoPackage, and GeoJSON. To load this information to your R environment you can use the function readOGR from the rgdal package.
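
The loading step described above can be sketched as follows. The file paths are hypothetical placeholders, and the calls are guarded so that nothing is read unless the packages and files are actually present. Note that rgdal has since been retired from CRAN, so it may not be installable on recent R setups.

```r
# Hypothetical locations; replace with the paths to your own files
variable_dir <- "path/to/variables"     # folder with GeoTIFF layers (assumed)
region_file  <- "path/to/region.gpkg"   # polygon of the region (assumed)

# Environmental variables: all .tif files in the folder as one RasterStack
variable_files <- list.files(variable_dir, pattern = "\\.tif$",
                             full.names = TRUE)
if (requireNamespace("raster", quietly = TRUE) && length(variable_files) > 0) {
  variables <- raster::stack(variable_files)
}

# Region of interest: a spatial polygon read with rgdal
if (requireNamespace("rgdal", quietly = TRUE) && file.exists(region_file)) {
  region <- rgdal::readOGR(dsn = region_file)
}
```

All layers in the folder are assumed to share the same extent and resolution, which stack requires.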

Additionally, other data can be used to make sampling site selection more effective. The functions that help to prepare the data for analysis also allow users to include:

  • Sites selected a priori.- A data.frame of sites that researchers consider important to be included in any set of localities to be sampled. These site(s) will be included (by force) in any of the results obtained in later analyses. Using these sites represents a good opportunity to consider areas that are well known and should be included to monitor biodiversity changes in a region.
  • A mask to restrict analyses to smaller areas.- A spatial polygon that reduces the region of interest to areas considered more relevant for analysis. Examples of such masks include areas with natural vegetation, areas that are accessible, and regions with particular vegetation cover types. This mask is usually in the same format as the region of interest.

If enough good-quality data on species distributions are available, the effectiveness of sampling sites can be analyzed. The data used to prepare information for such analyses can be of different types:

  • Spatial polygons of species distributions
  • Raster layers defining suitable and unsuitable areas
  • Geographic points of species occurrences

Data preparation

To use biosurvey efficiently, the first step is to prepare an object containing all the information to be used in the following analyses. This can be done using the function prepare_master_matrix(). After that, the function make_blocks() can be used to partition the environmental space of the region of interest. To explore what your data look like, use the functions explore_data_EG() and plot_blocks_EG().
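
As a minimal sketch of the blocking step, the code below uses the example master_matrix object bundled with the package (m_matrix) instead of preparing one from raw data. The argument names and the presence of the columns PC1 and PC2 in the example object are recalled from the package documentation and should be checked with ?make_blocks. The code only runs if biosurvey is installed.

```r
if (requireNamespace("biosurvey", quietly = TRUE)) {
  library(biosurvey)

  # Example master_matrix object bundled with the package
  data("m_matrix", package = "biosurvey")

  # Partition environmental space into a 10 x 10 grid of blocks, using the
  # first two principal components as axes (column names are an assumption)
  m_blocks <- make_blocks(m_matrix, variable_1 = "PC1", variable_2 = "PC2",
                          n_cols = 10, n_rows = 10,
                          block_type = "equal_area")

  # Short overview of the resulting object
  print(m_blocks)
}
```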

Selection of sites for biodiversity inventory

After preparing data, distinct functions can be used to select sampling sites:

  • random_selection().- Random selection of sites to be sampled in a survey.
  • uniformG_selection().- Selection of sites with the goal of maximizing uniformity of points in geographic space.
  • uniformE_selection().- Selection of sites with the goal of maximizing uniformity of points in environmental space.
  • EG_selection().- Selection of sites with the goal of maximizing uniformity of points in environmental space, while considering geographic patterns of the data.

To see what your selected sites look like, use the functions plot_sites_EG(), plot_sites_E(), and plot_sites_G().
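
A minimal sketch of two of the selection functions listed above, again using the bundled example object m_matrix. The argument names and values shown are recalled from the package documentation and are assumptions to verify with ?random_selection and ?uniformG_selection. The code only runs if biosurvey is installed.

```r
if (requireNamespace("biosurvey", quietly = TRUE)) {
  library(biosurvey)

  # Example master_matrix object bundled with the package
  data("m_matrix", package = "biosurvey")

  # Five alternative random sets of 20 sites each
  r_selection <- random_selection(m_matrix, n_sites = 20, n_samplings = 5)

  # One set of about 20 sites spread as evenly as possible in geography
  g_selection <- uniformG_selection(m_matrix, expected_points = 20,
                                    max_n_samplings = 1, replicates = 5)

  # Short overview of the selection results
  print(g_selection)
}
```

Selections in environmental space (uniformE_selection() and EG_selection()) work on a blocked object, so make_blocks() would need to be run first.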

Evaluation of sampling site effectiveness

After selecting sampling sites, and if enough high-quality data are available, functions from the testing module of this package can be used to explore which sets of selected sites perform better. Explore the following functions to prepare your data and assess how well your selected sites represent the existing biodiversity:

  • prepare_base_PAM().- Prepares a presence-absence matrix (PAM) from species distributional data; all sites (rows) will have a value for presence or absence of species (columns).
  • PAM_indices().- Calculates a set of biodiversity indices using values contained in the presence-absence matrix.
  • plot_PAM_geo().- Plot of PAM indices in geography.
  • subset_PAM().- Subsets a base_PAM object according to sites selected previously that are contained in a master_selection object.
  • selected_sites_SAC().- Creates species accumulation curves for each set of selected sites contained in elements of PAM_subset.
  • plot_SAC().- Creates species accumulation curve plots for selected sites.
  • compare_SAC().- Creates comparative plots of two species accumulation curves from information contained in lists obtained with the function selected_sites_SAC().
  • selected_sites_DI().- Computes dissimilarity indices among sites selected and among sets of selected sites, based on the communities of species represented in such units.
  • plot_DI().- Creates matrix-like plots of dissimilarities found among communities of species in distinct sites selected or sets of sites selected.
  • DI_dendrogram().- Plots dissimilarities within and among sets of selected sites as a dendrogram.
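
To make the quantities behind these functions concrete, here is a small base-R illustration with a toy presence-absence matrix. It is a conceptual sketch only, not the package's implementation: the data are invented, and the accumulation and dissimilarity calculations are deliberately the simplest possible versions.

```r
# Toy presence-absence matrix: 4 sites (rows) x 5 species (columns)
pam <- matrix(c(1, 1, 0, 0, 0,
                1, 0, 1, 0, 0,
                0, 0, 1, 1, 0,
                1, 0, 0, 0, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("site", 1:4), paste0("sp", 1:5)))

# Richness per site and range size (number of occupied sites) per species
richness   <- rowSums(pam)
range_size <- colSums(pam)

# Species accumulation in the given site order:
# cumulative number of distinct species recorded after each site
accumulation <- sapply(seq_len(nrow(pam)), function(i) {
  sum(colSums(pam[seq_len(i), , drop = FALSE]) > 0)
})
names(accumulation) <- rownames(pam)

# Jaccard dissimilarity between two sites: 1 - shared species / total species
jaccard <- function(a, b) 1 - sum(a & b) / sum(a | b)
d12 <- jaccard(pam["site1", ], pam["site2", ])
```

Here every site holds two species, the accumulation curve rises from 2 to 5 species across the four sites, and sites 1 and 2 share one of their three combined species.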

GSoC project description

Student: Claudia Nuñez-Penichet

GSoC Mentors: Narayani Barve, Vijay Barve, Tomer Gueta

Complete list of authors: Claudia Nunez-Penichet, Marlon E. Cobos, Jorge Soberon, Tomer Gueta, Narayani Barve, Vijay Barve, Adolfo G. Navarro-Siguenza, A. Townsend Peterson

Motivation:

Given the increasing intensity of threats to biodiversity worldwide, one of the challenges in biodiversity conservation is to complete inventories of existing species at distinct scales. Species distributions depend on the relationships between accessible areas, environmental conditions, and biotic interactions. Because a survey system aims only to register the species present in a region, biotic interactions can be overlooked in this case. However, the relationship between environmental conditions and the geographic configuration of an area is of crucial importance when trying to identify key sites for biodiversity surveys. The diverse R packages available for selecting survey sites do not implement such considerations; they are limited to random selection of sampling sites or to analyses that detect potential sampling sites based on the environmental similarity between sampled and unsampled areas. Given the need for more solutions, the biosurvey package aims to consider the relationship between environmental and geographic conditions in a region when designing survey systems that allow sampling of most of its biodiversity.

Status of the project

At the moment we have completed the three main modules of the package. We have made modifications to the original list of products, which have helped us to improve the package functionality. The package is fully functional and available on CRAN.

All commits made can be seen at the complete list of commits.

Version: 0.1.1

License: GPL-3

Maintainer: Claudia Nuñez-Penichet

Last published: September 15th, 2021

Functions in biosurvey (0.1.1)

  • PAM_subset.- Constructor of S3 objects of class PAM_subset
  • PAM_from_table.- Creates presence-absence matrix from a data.frame
  • EG_selection.- Selection of survey sites maximizing uniformity in environmental space considering geographic structure
  • b_pam.- Example of object obtained from using the function base_PAM
  • base_PAM.- Constructor of S3 objects of class base_PAM
  • PAM_indices.- Biodiversity indices derived from PAM
  • assign_blocks.- Helper to assign block numbers to data according to variables and limits
  • PAM_CS.- Constructor of S3 objects of class PAM_CS
  • DI_dendrogram.- Plot dissimilarities within and among sets of selected sites as a dendrogram
  • find_clusters.- Detection of clusters in 2D spaces
  • compare_SAC.- Comparative plots of species accumulation curves
  • block_sample.- Selection of blocks in environmental space
  • closest_to_centroid.- Detection of the closest points to the centroid of a cloud of points
  • biosurvey.- biosurvey: Tools for Biological Survey Planning
  • dis_loop.- Helper to calculate dissimilarities in loop
  • match_rformat.- Helper function to find raster extension
  • find_modes.- Find modes in a multimodal distribution
  • mx.- Example of spatial polygon for a region of interest
  • master_selection.- Constructor of S3 objects of class master_selection
  • master_matrix.- Constructor of S3 objects of class master_matrix
  • plot_blocks_EG.- Representation of environmental blocks in geography and environment
  • plot_sites_EG.- Representation of sites selected to be surveyed
  • plot_PAM_CS.- Representations of diversity and dispersion indices
  • plot_DI.- Plotting dissimilarity indices within and among sets of selected sites
  • explore_data_EG.- Plots to explore environmental factors in environmental and geographic space
  • grid_from_region.- Creates grid for a given geographic region
  • files_2data.- Creates a data.frame of species' references from files in a directory
  • legend_bar.- Helper to add a bar image legend to plots
  • distance_filter.- Helper to filter sets of sites by median distance among all points
  • make_blocks.- Creates a block-like regionalization of environmental space
  • m_selection.- Example of a master_selection object from using functions for selecting sites
  • preselected_dist_mask.- Helper to create objects to detect points too close to preselected sites
  • dist_list.- A list of vectors of distances
  • preselected.- Example of a data.frame of preselected sites
  • random_selection.- Random selection of survey sites
  • sp_occurrences.- Occurrence records for the species Parides gundlachianus
  • refill_PAM_indices.- Helper to refill a list of PAM indices with new or more results
  • spdf_2data.- Creates a data.frame of species' references from SpatialPolygonsDataFrame
  • summary.- Summary of attributes and results
  • subset_PAM.- Subset PAM according to selected sites
  • unimodal_test.- Unimodality test for list of one or multiple sets of values
  • variables.- Example of variables to be used for preparing a master matrix
  • selected_sites_DI.- Dissimilarity indices from PAM_subset
  • prepare_PAM_CS.- Preparing data for new range-diversity plot
  • selected_sites_SAC.- Species accumulation curves from PAM_subset
  • selected_sites_PAM.- Helper to subset PAM according to selected sites
  • point_thinning.- Helps in thinning points either in geographic or environmental space
  • rlist_2data.- Creates a data.frame of species' references from a list of raster layers
  • point_sample.- Sample points from a 2D environmental space
  • species_data.- Example of species ranges as SpatialPolygonsDataFrame
  • prepare_base_PAM.- Presence-absence matrix (PAM) linked to a spatial grid
  • point_sample_cluster.- Sample points from a 2D environmental space potentially disjoint in geography
  • prepare_master_matrix.- Prepare a base object to perform further analyses
  • uniformE_selection.- Selection of survey sites maximizing uniformity in environmental space
  • stack_2data.- Creates a data.frame of species' references from RasterStack
  • uniformG_selection.- Selection of survey sites maximizing uniformity in geography
  • m_matrix_pre.- Example of a master_matrix object containing preselected sites
  • plot_PAM_geo.- Plot of PAM indices in geography
  • m_matrix.- Example of a master_matrix object with no preselected sites
  • plot_SAC.- Plotting lists of species accumulation curves
  • sp_layers.- Example of stack of layers of suitable and unsuitable conditions for species
  • purplow.- Simple color palettes
  • print.- Print a short version of elements in master and base objects
  • sp_data.- Example of a data.frame of species found in distinct positions
  • wgs84_2aed_laea.- Project spatial points from geographic coordinates