auk (version 0.4.0)

filter_repeat_visits: Filter observations to repeat visits for hierarchical modeling

Description

Hierarchical modeling of abundance and occurrence requires repeat visits to sites to estimate detectability. These visits should all be within a period of closure, i.e. when the population can be assumed to be closed. eBird data, and many other data sources, do not explicitly follow this protocol; however, subsets of the data can be extracted to produce data suitable for hierarchical modeling. This function extracts a subset of observation data that have a desired number of repeat visits within a period of closure.

Usage

filter_repeat_visits(x, min_obs = 2L, max_obs = 10L, n_days = 14L,
  annual_closure = FALSE, date_var = "observation_date",
  site_vars = c("locality_id", "observer_id"))

Arguments

x

data.frame; observation data, e.g. data from the eBird Basic Dataset (EBD) zero-filled with auk_zerofill(). This function will also work with an auk_zerofill object, in which case it will be converted to a data frame with collapse_zerofill(). Note that these data must be for a single species.

min_obs

integer; minimum number of observations required for each site.

max_obs

integer; maximum number of observations allowed for each site.

n_days

integer; number of days defining the temporal length of closure. Ignored if annual_closure = TRUE.

annual_closure

logical; whether the entire year should be treated as the period of closure. This can be useful, for example, if data are from one season (e.g. breeding) across multiple years.

date_var

character; column name of the variable in x containing the date. This column should either be in Date format or convertible to Date format with as.Date().

site_vars

character; names of one or more columns in x that define a site, typically the location and observer IDs.

Value

A data.frame filtered to only retain observations from sites with the allowed number of observations within the period of closure. The results will be sorted such that sites are together and in chronological order. The following variables are added to the data frame:

  • site: a unique identifier for each "site" corresponding to all the variables in site_vars and closure_id concatenated together with underscore separators.

  • closure_id: a unique ID for each closure period. If annual_closure = TRUE, this will be the year. Otherwise, it will be the number of blocks of n_days days since the earliest observation. Note that in this latter case, there may be gaps in the IDs.

  • n_observations: number of observations at each site after all filtering.
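The way closure_id and site relate to the dates can be illustrated with a small base R sketch. This is not the package's implementation: the toy data frame obs, the 0-based block numbering, and the helper column closure_year are assumptions made for illustration only.

```r
# Toy data (hypothetical); column names mirror the function's defaults
obs <- data.frame(
  locality_id = c("L1", "L1", "L1", "L2"),
  observer_id = c("obs1", "obs1", "obs1", "obs2"),
  observation_date = as.Date(c("2012-01-01", "2012-01-10",
                               "2012-02-20", "2012-01-05"))
)
n_days <- 14L

# Days elapsed since the earliest observation in the data
days_since_start <- as.integer(
  difftime(obs$observation_date, min(obs$observation_date), units = "days")
)

# Blocks of n_days days since the earliest observation
# (assumption: numbering starts at 0)
obs$closure_id <- days_since_start %/% n_days

# With annual closure, the year would play the role of closure_id instead
obs$closure_year <- format(obs$observation_date, "%Y")

# Site identifier: site variables and closure_id joined with underscores
obs$site <- paste(obs$locality_id, obs$observer_id, obs$closure_id, sep = "_")

obs[, c("observation_date", "closure_id", "site")]
```

In this toy data no observation falls in blocks 1 or 2, so those IDs never appear, which is the kind of gap the note above refers to.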

Details

In addition to specifying the minimum and maximum number of observations per site, users must specify the variables in the dataset that define a "site". This is typically a combination of IDs defining the geographic site and the unique observer (repeat visits are meant to be conducted by the same observer). Finally, the number of days defining the period of closure is required. A default value of 14 days is used; however, users should choose a suitable period for their species within which the population can reasonably be assumed to be closed.
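The filtering on the number of observations per site can be sketched in base R as follows. The function filter_sites_sketch() and the toy data are hypothetical, and the handling of sites with more than max_obs observations (here, keeping the earliest visits) is an assumption; the package may treat surplus visits differently.

```r
# Hypothetical helper, not part of auk: keep sites with an allowed number of visits
filter_sites_sketch <- function(obs, min_obs = 2L, max_obs = 10L) {
  # sort so that sites are together and in chronological order
  obs <- obs[order(obs$site, obs$observation_date), , drop = FALSE]
  # position of each observation within its site
  visit <- ave(seq_len(nrow(obs)), obs$site, FUN = seq_along)
  # assumption: visits beyond max_obs are dropped, keeping the earliest ones
  obs <- obs[visit <= max_obs, , drop = FALSE]
  # number of observations remaining at each site
  obs$n_observations <- ave(seq_len(nrow(obs)), obs$site, FUN = length)
  # drop sites with too few repeat visits
  obs <- obs[obs$n_observations >= min_obs, , drop = FALSE]
  rownames(obs) <- NULL
  obs
}

# Toy data: one site with three visits, one with a single visit, one with two
obs <- data.frame(
  site = c("L1_obs1_0", "L1_obs1_0", "L1_obs1_0",
           "L1_obs1_3", "L2_obs2_0", "L2_obs2_0"),
  observation_date = as.Date(c("2012-01-10", "2012-01-01", "2012-01-12",
                               "2012-02-20", "2012-01-05", "2012-01-06"))
)

res <- filter_sites_sketch(obs, min_obs = 2L, max_obs = 2L)
res
```

The single-visit site is removed because it cannot inform detectability, and the three-visit site is cut down to two visits by the max_obs limit.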

See Also

Other modeling: format_unmarked_occu

Examples

# NOT RUN {
# read and zero-fill the ebd data
f_ebd <- system.file("extdata/zerofill-ex_ebd.txt", package = "auk")
f_smpl <- system.file("extdata/zerofill-ex_sampling.txt", package = "auk")
# data must be for a single species
ebd_zf <- auk_zerofill(x = f_ebd, sampling_events = f_smpl,
                       species = "Collared Kingfisher",
                       collapse = TRUE)
filter_repeat_visits(ebd_zf, n_days = 30)
# }
