hcr_resampling: Heterogeneity-constrained random resampling (HCR)

Description

Performs heterogeneity-constrained random (HCR) resampling (Lengyel, Chytrý & Tichý, 2011) of community data. Within each stratum (e.g., grid cell), many random subsets of plots are evaluated, and the subset that best combines a high mean dissimilarity with a low variance of dissimilarities is retained. Optionally, the number of plots per stratum is adapted to the stratum’s mean pairwise dissimilarity (\(\beta\)-diversity).

Usage

hcr_resampling(
  data_wide,
  transform = c("none", "sqrt", "log1p", "binary"),
  score_dist = "bray",
  beta_dist = c("bray", "jaccard"),
  adaptive_n = TRUE,
  n_plots = NA,
  min_plots = 10,
  max_plots = 100,
  min_stratum_n = 10,
  trials = 1000,
  group_vec = NULL,
  group_limits = NULL,
  write_csv = NULL,
  progress = interactive(),
  seed = NULL
)

Value

A data.frame with the columns sample_id and selected (0/1). It carries the attributes selected_rows (a logical vector) and params.

Arguments

data_wide

A data-frame-like object with the following columns:

column 1: sample ids
column 2: strata
columns 3...n: species.
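
As a rough illustration of this layout, the following base R sketch builds a small input table. The plot names, grid cells, species names, and cover values are all made up for the example.

```r
# Hypothetical toy data: 6 plots in 2 grid cells, 3 species (cover values).
data_wide <- data.frame(
  sample_id        = paste0("plot_", 1:6),                # column 1: sample ids
  stratum          = rep(c("cell_A", "cell_B"), each = 3), # column 2: strata
  Festuca_rubra    = c(10, 0, 5, 0, 20, 1),               # columns 3..n: species
  Poa_pratensis    = c(0, 15, 5, 3, 0, 0),
  Trifolium_repens = c(2, 2, 0, 0, 8, 40)
)

# Everything from column 3 onwards forms the species matrix.
species <- as.matrix(data_wide[, -(1:2)])
rownames(species) <- data_wide$sample_id
```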

transform

One of c("none","sqrt","log1p","binary"). If "binary", values become 0/1 and vegan::vegdist(binary = TRUE) is used.
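
The four options can be pictured with a small helper like the one below. This is only a sketch: apply_transform is a hypothetical name, not a function of the package, and treating every value greater than zero as a presence is an assumption.

```r
# Illustrative helper (hypothetical name), not the package's internal code.
apply_transform <- function(x, transform = c("none", "sqrt", "log1p", "binary")) {
  transform <- match.arg(transform)
  switch(transform,
    none   = x,
    sqrt   = sqrt(x),
    log1p  = log1p(x),
    binary = (x > 0) * 1  # assumption: any value above zero counts as present
  )
}

# Two plots (rows) by three species (columns).
cover <- matrix(c(0, 4, 9, 0, 1, 16), nrow = 2, byrow = TRUE)
apply_transform(cover, "sqrt")
apply_transform(cover, "binary")
```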

score_dist

Dissimilarity method for trial scoring; any method accepted by vegan::vegdist (e.g., "bray", "jaccard", "hellinger", "euclidean", "canberra", "gower", "kulczynski", "morisita", "horn", "mountford", "raup", "binomial", "chao", "cao", …).

beta_dist

One of c("bray","jaccard") for per-stratum mean dissimilarity used to calculate the adaptive number of plots. With transform="binary", "bray" equals Sørensen.
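
The equivalence of Bray-Curtis and Sørensen on presence/absence data can be checked by hand. The sketch below uses two made-up plots and plain base R formulas instead of vegan::vegdist; the helper names are hypothetical.

```r
# Bray-Curtis dissimilarity between two abundance vectors.
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Sørensen dissimilarity from presence/absence counts.
sorensen <- function(x, y) {
  n_shared <- sum(x > 0 & y > 0)   # species present in both plots
  n_only_x <- sum(x > 0 & y == 0)  # species present only in x
  n_only_y <- sum(x == 0 & y > 0)  # species present only in y
  (n_only_x + n_only_y) / (2 * n_shared + n_only_x + n_only_y)
}

# Two hypothetical plots with binary (0/1) species records.
plot1 <- c(1, 1, 0, 1, 0)
plot2 <- c(1, 0, 1, 1, 1)

bray_curtis(plot1, plot2)  # 3/7
sorensen(plot1, plot2)     # 3/7
```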

adaptive_n

Logical. If TRUE, adapt the number of plots per stratum from beta_mean * max_plots bounded to [min_plots, max_plots]; if FALSE, use fixed n_plots.

n_plots

Fixed number of plots per stratum when adaptive_n=FALSE. If NA, defaults to max_plots (capped at stratum size).

min_plots, max_plots

Global default minimum and maximum number of plots per stratum. Group-specific values can be supplied via group_limits.

min_stratum_n

Minimum stratum size for resampling. Strata with fewer plots than this are selected in full (no resampling).

trials

Number of random trials per stratum (default 1000).

group_vec

Optional vector (length nrow(data_wide)) assigning each sample to a higher-level group (e.g., country, region). Used only if adaptive_n=TRUE.

group_limits

Optional data.frame with group-specific limits. The first column must contain group names; it must also contain numeric columns named "min_plots" and "max_plots". Other columns are ignored.

write_csv

Optional file path to which a CSV with the columns sample_id and selected is written. If NULL, no file is written.

progress

Show a text progress bar (default: interactive()).

seed

Optional integer seed for reproducibility of random subset trials.

Author

Friedemann von Lampe

Details

The algorithm follows Lengyel, Chytrý & Tichý (2011) and is based on the JUICE implementation (Tichý, 2002). For speed, the distance matrix of each stratum is computed once and reused across all trials, which makes large numbers of trials feasible (default trials = 1000).

Within each stratum, candidate subsets are scored on their pairwise dissimilarities computed with score_dist: subsets with a high mean dissimilarity and a low variance of dissimilarities are preferred.
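
The scoring of one stratum can be sketched in base R as follows. This is not the package's code: the data are simulated, Bray-Curtis is computed by hand rather than with vegan::vegdist, and the two criteria are combined with the rank-sum rule described by Lengyel et al. (2011), which is assumed here to be what the function does.

```r
set.seed(1)

# Hypothetical stratum: 12 plots x 5 species with simulated abundances.
stratum <- matrix(rpois(60, lambda = 3), nrow = 12)

# Bray-Curtis distance matrix, computed once for the whole stratum:
# Manhattan distance divided by the summed totals of the two plots.
totals <- rowSums(stratum)
d_full <- as.matrix(dist(stratum, method = "manhattan")) /
  outer(totals, totals, "+")

n_select <- 5    # plots to keep in this stratum
trials   <- 200  # number of random candidate subsets

# Draw candidate subsets and score each from the precomputed matrix.
subsets <- replicate(trials, sort(sample(nrow(stratum), n_select)),
                     simplify = FALSE)
scores <- t(vapply(subsets, function(idx) {
  d <- d_full[idx, idx]
  d <- d[lower.tri(d)]  # each pair counted once
  c(mean = mean(d), variance = var(d))
}, c(mean = 0, variance = 0)))

# Rank-sum rule: rank 1 goes to the highest mean and to the lowest variance.
rank_sum <- rank(-scores[, "mean"]) + rank(scores[, "variance"])
best <- subsets[[which.min(rank_sum)]]
best  # row indices of the retained plots
```

Because only d_full[idx, idx] is read inside the loop, the cost of each trial does not depend on the number of species.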

If adaptive_n = TRUE (default), the target number of plots per stratum is computed as beta_mean * max_plots, where beta_mean is the stratum's mean pairwise dissimilarity (\(\beta\)-diversity, measured with beta_dist), following Wiser & de Cáceres (2013). The result is then bounded to [min_plots, max_plots] and capped at the stratum size.
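
The adaptive rule amounts to a clamped linear function. A minimal sketch, assuming the product is rounded to the nearest integer (the rounding rule is not stated here) and using a hypothetical helper name:

```r
# Illustrative helper (hypothetical name), not the package's internal code.
adaptive_n_plots <- function(beta_mean, stratum_size,
                             min_plots = 10, max_plots = 100) {
  target <- round(beta_mean * max_plots)            # assumption: round to nearest
  target <- min(max(target, min_plots), max_plots)  # bound to [min_plots, max_plots]
  min(target, stratum_size)                         # never more plots than available
}

adaptive_n_plots(0.65, stratum_size = 400)  # 65
adaptive_n_plots(0.05, stratum_size = 400)  # 10 (raised to min_plots)
adaptive_n_plots(0.90, stratum_size = 40)   # 40 (capped at stratum size)
```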

Additionally, group-specific limits for the minimum and maximum number of plots per stratum can be supplied via group_vec and group_limits. Each sample is assigned to a higher-level group (e.g., country or region), and the minimum and maximum number of plots are defined per group. This allows, for example, higher limits to be set for larger countries or regions.
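
The two inputs might be prepared as in the sketch below. The group codes and limit values are invented; only the column names min_plots and max_plots, and group names in the first column, are prescribed. The lookup at the end merely shows how the tables relate and is not taken from the package.

```r
# Hypothetical group-specific limits; the first column holds the group names.
group_limits <- data.frame(
  group     = c("DE", "LU"),
  min_plots = c(10, 5),
  max_plots = c(100, 30)
)

# One group label per row of data_wide (here: 5 samples).
group_vec <- c("DE", "DE", "LU", "DE", "LU")

# Look up the limits that apply to each sample.
idx <- match(group_vec, group_limits[[1]])
per_sample <- data.frame(
  group     = group_vec,
  min_plots = group_limits$min_plots[idx],
  max_plots = group_limits$max_plots[idx]
)
per_sample
```

A check such as anyNA(idx) before the call would catch group labels in group_vec that are missing from group_limits.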

References

Lengyel, A., Chytrý, M., & Tichý, L. (2011). Heterogeneity-constrained random resampling of phytosociological databases. Journal of Vegetation Science, 22(1), 175–183. doi:10.1111/j.1654-1103.2010.01225.x

Tichý, L. (2002). JUICE, software for vegetation classification. Journal of Vegetation Science, 13(3), 451. doi:10.1658/1100-9233(2002)013[0451:JSFVC]2.0.CO;2

Wiser, S. K., & de Cáceres, M. (2013). Updating vegetation classifications: an example with New Zealand's woody vegetation. Journal of Vegetation Science, 24(1), 80–93. doi:10.1111/j.1654-1103.2012.01450.x