bootdht: Bootstrap uncertainty estimation for distance sampling models

Description

Performs a bootstrap for simple distance sampling models using the same data structures as dht. Note that only geographical stratification as supported in dht is allowed.

Usage

bootdht(
  model,
  flatfile,
  resample_strata = FALSE,
  resample_obs = FALSE,
  resample_transects = TRUE,
  nboot = 100,
  summary_fun = bootdht_Nhat_summarize,
  convert_units = 1,
  select_adjustments = FALSE,
  sample_fraction = 1,
  multipliers = NULL,
  progress_bar = "base",
  cores = 1,
  convert.units = NULL
)

Arguments

model: a model fitted by ds or a list of models
flatfile: Data provided in the flatfile format. See flatfile for details. Please note, it is a current limitation of bootdht that all Sample.Label identifiers must be unique across all strata, i.e., transect ids must not be re-used from one stratum to another. An easy way to achieve this is to paste together the stratum names and transect ids.
resample_strata: should resampling happen at the stratum (Region.Label) level? (Default FALSE)
resample_obs: should resampling happen at the observation (object) level? (Default FALSE)
resample_transects: should resampling happen at the transect (Sample.Label) level? (Default TRUE)
nboot: number of bootstrap replicates
summary_fun: function that is used to obtain summary statistics from the bootstrap, see Summary Functions below. By default bootdht_Nhat_summarize is used, which just extracts abundance estimates.
convert_units: conversion between units for abundance estimation, see "Units", below. (Defaults to 1, implying all of the units are "correct" already.) This takes precedence over any unit conversion stored in model.
select_adjustments: select the number of adjustments in each bootstrap replicate? When FALSE, the exact detection function specified in model is fitted to each replicate. Setting this option to TRUE can significantly increase the runtime for the bootstrap. Note that for this to work model must have been fitted with adjustment!=NULL.
sample_fraction: what proportion of the transects was covered (e.g., 0.5 for one-sided line transects).
multipliers: list of multipliers. See "Multipliers" below.
progress_bar: which progress bar should be used? Default "base" uses txtProgressBar, "none" suppresses output, "progress" uses the progress package, if installed.
cores: number of CPU cores to use to compute the estimates. See "Parallelization" below.
convert.units: deprecated, see same argument with underscore, above.
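
The flatfile argument above suggests pasting stratum names and transect ids together to make Sample.Label unique. A minimal sketch of that step follows, using a small made-up data frame; the values and the helper name ids_reused are hypothetical, and only the Region.Label and Sample.Label column names come from the flatfile format.

```r
# Hypothetical flatfile fragment: transect ids 1 and 2 are reused in both strata
flat <- data.frame(
  Region.Label = c("North", "North", "South", "South"),
  Sample.Label = c(1, 2, 1, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if any transect id appears in more than one stratum
ids_reused <- function(d) {
  combos <- unique(d[, c("Region.Label", "Sample.Label")])
  any(duplicated(combos$Sample.Label))
}

reused_before <- ids_reused(flat)

# Prefix each transect id with its stratum name so ids are unique across strata
flat$Sample.Label <- paste(flat$Region.Label, flat$Sample.Label, sep = "_")

reused_after <- ids_reused(flat)
```

Running a check like this before calling bootdht is one way to catch reused ids early.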

Summary Functions

The function summary_fun allows the user to specify what summary statistics should be recorded from each bootstrap. The function should take two arguments, ests and fit. The former is the output from dht2, giving tables of estimates. The latter is the fitted detection function object. The function is called once fitting and estimation have been performed and should return a data.frame. Those data.frames are then concatenated using rbind. One can make these functions return any information within those objects, for example abundance or density estimates or the AIC for each model. See Examples below.
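
As a sketch of the contract described above, the function below records both abundance and density for each replicate. The ests$individuals$N path matches the one used in Examples; the D table and the Label column are assumptions about the layout of the estimates object, and mock_ests is a hypothetical stand-in so the snippet runs without fitting a model.

```r
# Summary function: abundance and density from one bootstrap replicate
# (assumes ests$individuals holds N and D tables with Label and Estimate columns)
ND_summarize <- function(ests, fit) {
  data.frame(
    Label = ests$individuals$N$Label,
    Nhat  = ests$individuals$N$Estimate,
    Dhat  = ests$individuals$D$Estimate
  )
}

# Hypothetical stand-in for the estimates object produced in one replicate
mock_ests <- function(N, D) {
  list(individuals = list(
    N = data.frame(Label = "Total", Estimate = N),
    D = data.frame(Label = "Total", Estimate = D)
  ))
}

# One data.frame per replicate (made-up numbers; fit is unused here)
per_replicate <- list(
  ND_summarize(mock_ests(1200, 0.6), fit = NULL),
  ND_summarize(mock_ests(1350, 0.7), fit = NULL)
)

# The per-replicate data.frames are stacked row-wise with rbind
all_replicates <- do.call(rbind, per_replicate)
```

Because the results are combined with rbind, every call should return a data.frame with the same column names.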

Multipliers

It is often the case that we cannot measure distances to individuals or groups directly, but instead need to estimate distances to something they produce (e.g., for whales, their blows; for elephants their dung) -- this is referred to as indirect sampling. We may need to use estimates of production rate and decay rate for these estimates (in the case of dung or nests) or just production rates (in the case of songbird calls or whale blows). We refer to these conversions between "number of cues" and "number of animals" as "multipliers".

The multipliers argument is a list with 2 possible elements (creation and decay). Each element is either:

a data.frame, which must have at least a column named rate, which abundance estimates will be divided by (the term "multiplier" is a misnomer, but kept for compatibility with Distance for Windows). Additional columns named SE and df can be added to give the standard error and degrees of freedom for the rate, if known. You can use a multirow data.frame to have different rates for different geographical areas (for example). In this case the rows need to have a column (or columns) to merge with the data (for example Region.Label).
a function which will return a single estimate of the relevant multiplier. See make_activity_fn for a helper function for use with the activity package.
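
To make the structure concrete, here is a sketch of a multipliers list for a hypothetical dung survey. All rates, standard errors and stratum names are invented for illustration; only the element names (creation, decay) and the column names (rate, SE, Region.Label) come from the description above. How bootdht calls a function-valued element is not stated here, so the sketch accepts and ignores any arguments.

```r
# Single creation rate with its standard error (made-up values)
creation <- data.frame(rate = 25, SE = 4)

# Decay rate that differs by stratum; Region.Label is the column used to merge
decay <- data.frame(
  Region.Label = c("North", "South"),
  rate = c(160, 175),
  SE = c(14, 16)
)

multipliers <- list(creation = creation, decay = decay)

# Alternative form: an element may be a function returning a single estimate
# (a made-up constant here; a real one might resample field data)
creation_fn <- function(...) 25
multipliers_fn <- list(creation = creation_fn)

# Estimates are divided by the rate: 5000 cues at 25 cues per animal
cue_estimate <- 5000
animal_estimate <- cue_estimate / creation$rate
```

The last two lines show why "multiplier" is a misnomer: the rate acts as a divisor.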

Model selection

Model selection can be performed on a per-replicate basis within the bootstrap. This has three variations:

when select_adjustments is TRUE, adjustment terms are selected by AIC within each bootstrap replicate (provided that model had the order and adjustment options set to non-NULL).
if model is a list of fitted detection functions, each of these is fitted to each replicate and results are generated from the one with the lowest AIC.
when select_adjustments is TRUE and model is a list of fitted detection functions, each model is fitted to each replicate and the number of adjustments is selected via AIC. This last option can be extremely time consuming.

Parallelization

If cores>1 then the parallel/doParallel/foreach/doRNG packages will be used to run the computation over multiple cores of the computer. To use this component you need to install those packages using: install.packages(c("foreach", "doParallel", "doRNG")). It is advised that you set cores to at most one less than the number of cores on your machine. The doRNG package is required to make analyses reproducible (set.seed can be used to ensure the same answers).
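
One way to follow the advice on core counts is sketched below. It uses detectCores from the parallel package that ships with R; the variable names and the seed value are arbitrary choices, and the bootdht call is wrapped in if (FALSE) because it needs the Distance package and the objects from Examples.

```r
# Number of cores reported by the machine (can be NA on some platforms)
available <- parallel::detectCores()

# Leave one core free; fall back to a single core if detection fails
cores_to_use <- if (is.na(available)) 1L else max(1L, available - 1L)

# Set the seed before bootstrapping so that a repeated run gives the same answers
set.seed(1234)

if (FALSE) {
# not run: needs Distance, foreach, doParallel and doRNG,
# plus mod1 and minke as created in Examples
bootout <- bootdht(mod1, flatfile = minke, nboot = 10, cores = cores_to_use)
}
```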

It is also hard to debug any issues in summary_fun when running in parallel, so it is best to run a small number of bootstraps first to check that things work. On Windows systems summary_fun does not have access to the global environment when running in parallel, so all computations must be made using only its ests and fit arguments (i.e., you cannot use R objects from elsewhere in that function, even if they are available to you from the console).
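
The restriction on summary_fun can be illustrated with two versions of the same function. This is a sketch under stated assumptions: the study area value and the function names are hypothetical, and the mock ests object stands in for real bootstrap output.

```r
# Defined at the console, so it lives in the global environment
area_km2 <- 1000

# Fragile: depends on area_km2 being visible from inside the function,
# which may not hold when bootstraps run in parallel on Windows
fragile_summarize <- function(ests, fit) {
  data.frame(Dhat = ests$individuals$N$Estimate / area_km2)
}

# Safer: everything the function needs is defined inside it
self_contained_summarize <- function(ests, fit) {
  area_km2 <- 1000  # hard-coded copy of the (hypothetical) study area
  data.frame(Dhat = ests$individuals$N$Estimate / area_km2)
}

# Quick sequential check with a mock estimates object (made-up abundance)
mock <- list(individuals = list(N = data.frame(Estimate = 500)))
check <- self_contained_summarize(mock, fit = NULL)
```

Testing the function on a single object like this, before any parallel run, sidesteps much of the debugging difficulty.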

Another consequence of the global environment being unavailable inside parallel bootstraps is that any starting values in the model object passed in to bootdht must be hard coded (otherwise you get back 0 successful bootstraps). For a worked example showing this, see the camera trap distance sampling online example at https://examples.distancesampling.org/Distance-cameratraps/camera-distill.html.

Examples

if (FALSE) {
# load the package and the minke data, then fit a model
library(Distance)
data(minke)
mod1 <- ds(minke)

# summary function to save the abundance estimate
Nhat_summarize <- function(ests, fit) {
  return(data.frame(Nhat=ests$individuals$N$Estimate))
}

# perform 5 bootstraps
bootout <- bootdht(mod1, flatfile=minke, summary_fun=Nhat_summarize, nboot=5)

# obtain basic summary information
summary(bootout)
}