Rcapture-package: Loglinear Models for Capture-Recapture Experiments

Description

Estimation of abundance and of other demographic parameters for closed populations, open populations and the robust design in capture-recapture experiments using loglinear models.

This package focuses on closed populations. Since version 1.2-0, no new features have been added to open populations and robust design functions.

Details

Package:	Rcapture
Type:	Package
Version:	1.4-3
Date:	2019-12-16
License:	GPL-2

SUMMARY OF Rcapture CONTENTS

The Rcapture package contains nine capture-recapture data sets and the following functions:

Model fitting functions for
> closed populations:
- closedp functions: fit various loglinear models for abundance estimation;
- closedpCI functions: fit one customized loglinear model and calculate a confidence interval for the abundance estimate;
- closedpMS.t: fits various hierarchical loglinear models for model selection purposes;
- closedp.bc: performs bias corrections to the abundance estimates from customized loglinear models;
- closedp.Mtb: fits model Mtb, which cannot be fitted by any other function, for abundance estimation;
> open populations:
- openp: computes various demographic parameters using a loglinear model;
> robust design:
- robustd functions: compute various demographic parameters and capture probabilities per period using a loglinear model.
Descriptive statistics functions:
- descriptive: produces basic descriptive statistics for capture-recapture data;
- uifit: produces fit statistics concerning the \(u_i\), i.e. the numbers of first captures on each capture occasion, for closed population models.
Data manipulation functions:
- histpos functions: build a matrix of observable capture histories;
- periodhist: merges capture occasions.

DESCRIPTION OF DATA SET FORMATS

In capture-recapture experiments, the data collected consist of capture histories for captured units. A capture history is simply a series of capture indicators, one for each capture event in the experiment. The capture history of one unit is expressed as a length \(t\) vector \(w = (w_1, \ldots, w_t)\), where \(w_j = 1\) if the unit is captured at the \(j\)th occasion and 0 if not. For closed populations, capture events are named capture occasions, whereas they are named capture periods for open populations.

Capture-recapture data sets are given to Rcapture functions through the X argument. X must be a numeric matrix. Arguments dfreq and dtype indicate the format of the matrix. Each has two possible values, so four data set formats are possible with Rcapture.

FORMAT 1 - CAPTURE HISTORY PER UNIT

If dfreq=FALSE and dtype="hist" (the default), X has one row per unit captured in the experiment. Each row is an observed capture history. It must contain only zeros and ones; the number one indicates a capture. In this case, the number of columns in the table represents the number of capture occasions in the experiment (noted \(t\)). Here is an example of a data set of this type for \(t=2\):

1 1
1 1
1 0
1 0
1 0
1 0
0 1

FORMAT 2 - AGGREGATED CAPTURE HISTORIES

If dfreq=TRUE and dtype="hist", X contains one row per observed capture history followed by its frequency. In that case, X has \(t\)+1 columns. As for format 1, the first \(t\) columns of X, identifying the capture histories, must contain only zeros and ones. The number one indicates a capture. In this format, the example data set is represented by the following matrix:

1 1 2
1 0 4
0 1 1

If a possible capture history is not observed, it can appear in X with a frequency of zero, or it can simply be omitted.

FORMAT 3 - NUMBER OF CAPTURES PER UNIT

If dfreq=FALSE and dtype="nbcap", X is a vector with the number of captures for every captured unit. Therefore, this format does not contain complete capture histories. Instead, capture histories are summarized through the number of captures. In this format, the example data set looks like:

2
2
1
1
1
1
1

FORMAT 4 - AGGREGATED NUMBERS OF CAPTURES

If dfreq=TRUE and dtype="nbcap", X is a two-column matrix. The first column contains the observed numbers of captures, the second column contains their frequencies. In this format, the example data set is:

2 2
1 5
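
The relationship between the four formats can be sketched in base R. This is only an illustration built on the small \(t=2\) example above; the object names (X1, X2, X3, X4, key) are hypothetical and none of this code comes from Rcapture itself.

```r
# Format 1: one row per captured unit, t = 2 capture occasions
X1 <- matrix(c(1, 1,
               1, 1,
               1, 0,
               1, 0,
               1, 0,
               1, 0,
               0, 1), ncol = 2, byrow = TRUE)

# Format 2: one row per distinct capture history, frequency in the last column
key  <- apply(X1, 1, paste, collapse = "")   # e.g. "11", "10", "01"
freq <- table(key)
X2 <- cbind(X1[!duplicated(key), , drop = FALSE],
            freq = as.vector(freq[unique(key)]))

# Format 3: number of captures for every captured unit
X3 <- rowSums(X1)

# Format 4: observed numbers of captures and their frequencies
nb <- table(X3)
X4 <- cbind(nbcap = as.numeric(names(nb)), freq = as.vector(nb))
X4 <- X4[order(-X4[, "nbcap"]), , drop = FALSE]  # same row order as the example

X2
X3
X4
```

Going from formats 1 and 2 to formats 3 and 4 loses information: the complete capture histories cannot be rebuilt from the numbers of captures.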

DETAILS ABOUT FORMATS WITH NUMBERS OF CAPTURES

Only a few functions have the dtype argument. Functions without a dtype argument accept only a data matrix X of the form dtype="hist", so the first two formats listed above are the most common.

Formats with dtype="nbcap" are used for captures in continuous time (see below). They are also useful to reduce the size of the data set for experiments with a large number of capture occasions \(t\) (often with no units caught a large number of times). For these formats, the number of capture occasions \(t\) cannot be deduced from X as it can be with dtype="hist". There is no guarantee that the largest number of captures observed equals the total number of capture occasions. Therefore, if one gives a data matrix X with dtype="nbcap", one must also provide t, the number of capture occasions, as an additional argument.

For now, the data formats with dtype="nbcap" are not generalized to the robust design. So dtype is not an argument of the robustd.0 function.

CAPTURES IN CONTINUOUS TIME

In some capture-recapture experiments, there are no well-defined capture occasions: captures occur in continuous time. The data set ill comes from such an experiment. Bohning and Schon (2005) call this type of capture-recapture data repeated counting data. These data sets always have the format dtype="nbcap".

We can estimate abundance for data of this type using the option t=Inf with the functions closedp.0 and closedpCI.0. The function descriptive also accepts t=Inf, which modifies the y coordinate of the exploratory heterogeneity graph.
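
As a hedged sketch, a continuous-time analysis of the ill data set might look like the following. It assumes that the Rcapture package is installed and that ill is stored in the aggregated numbers-of-captures format (dfreq=TRUE, dtype="nbcap"); the calls are skipped when the package is not available.

```r
# Assumption: Rcapture is installed; otherwise nothing is run.
has_rcapture <- requireNamespace("Rcapture", quietly = TRUE)

if (has_rcapture) {
  library(Rcapture)
  # Assumption: ill holds the numbers of captures in its first column
  # and their frequencies in its second column.
  # t = Inf signals that captures occur in continuous time.
  print(descriptive(ill, dfreq = TRUE, dtype = "nbcap", t = Inf))
  print(closedp.0(ill, dfreq = TRUE, dtype = "nbcap", t = Inf))
}
```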

DISTINCTION BETWEEN .t and .0 FUNCTIONS

Capture-recapture models for closed populations aim at estimating the population size by modelling the probabilities of the different capture histories. The data available to fit these models consist of observed frequencies of capture histories. These frequencies are modeled in Rcapture using loglinear models for frequency tables. For functions with a name ending with .t (closedp.t, closedpCI.t, closedpMS.t and robustd.t), the response variable is the vector of observed frequencies of all observable capture histories. It has length \(2^t-1\).

However, for a model without temporal effect (assuming that the probability of capturing a unit does not vary between capture occasions), all the information needed to fit the model can be found in aggregated data. Functions with a name ending with .0 (closedp.0, closedpCI.0 and robustd.0) fit models using as response variable the number of units captured \(i\) times, for \(i=1,\ldots,t\). It is a vector of length \(t\), which is much shorter than \(2^t-1\) for a large \(t\). Because of an appropriate offset added to the model, the results from a .0 function match exactly the results from a .t function for a corresponding capture-recapture model. Because .0 functions deal with smaller design matrices, they run faster than .t functions, but they cannot fit models with a temporal effect.
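
The equivalence can be illustrated with base R only. The sketch below fits model M0 to made-up frequencies for \(t=3\), once on the \(2^t-1\) capture histories and once on the \(t\) numbers of captures. The offset used, the log of the number of capture histories with \(i\) captures, is an assumption about what "appropriate offset" means here, not a transcription of Rcapture's internal code. All object names are hypothetical.

```r
# Illustrative (made-up) frequencies for t = 3 capture occasions.
t <- 3
hist <- as.matrix(expand.grid(c1 = 0:1, c2 = 0:1, c3 = 0:1))
hist <- hist[rowSums(hist) > 0, ]       # drop the unobservable history 0 0 0
freq <- c(30, 25, 10, 28, 9, 8, 4)      # one frequency per observable history
nbcap <- rowSums(hist)                  # number of captures in each history

# ".t"-style fit: one response value per capture history (2^t - 1 values)
fit_t <- glm(freq ~ nbcap, family = poisson)

# ".0"-style fit: one response value per number of captures (t values),
# with offset log(choose(t, i)) = log number of histories with i captures
y <- as.vector(tapply(freq, nbcap, sum))
i <- 1:t
fit_0 <- glm(y ~ i, family = poisson, offset = log(choose(t, i)))

# Abundance estimate: captured units + estimated number of uncaptured units
N_t <- sum(freq) + exp(coef(fit_t)[["(Intercept)"]])
N_0 <- sum(y) + exp(coef(fit_0)[["(Intercept)"]])
c(N_t = N_t, N_0 = N_0)
```

Both fits use the same two parameters, so the two abundance estimates agree up to numerical tolerance, while the second design matrix has \(t\) rows instead of \(2^t-1\).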

FUNCTIONS USED FOR MODEL FITTING

Most of the Rcapture functions use the function glm of the stats package to fit a loglinear model. However, the function optim, also from the stats package, is used by four functions:

closedpCI.t, closedpCI.0 and closedpMS.t: when a normal heterogeneous model is requested,
closedp.Mtb: because model Mtb does not have a loglinear form.

ERRORS AND WARNINGS MANAGEMENT IN CLOSED POPULATION FUNCTIONS

If an error occurs while executing a function fitting only one closed population model (closedpCI functions, closedp.bc, closedp.Mtb), the execution is stopped and the error message is printed (usual behavior in R). However, if an error occurs while fitting a model in a call to a closed population function fitting more than one model (closedp functions, closedpMS.t), the execution of the call is not stopped. Instead, the row in the results table for the problematic model is filled with NA and the error message is stored in an output value. This value is called glm.err for closedp functions. It is however called fit.err for closedpMS.t because this function does not always use glm to fit the model (as mentioned above, optim is used for normal heterogeneous models).

Warning messages issued while fitting a closed population model, if any, are stored in an output value called 'glm.warn', 'fit.warn' or 'optim.warn' depending on the function. They are not printed in the console. To inform the user that a warning occurred, the last column of the results table, named infoFit, contains a numerical code giving information about errors or warnings encountered. Here is a description of the meaning of the numerical code:

0: no error or warning occurred while fitting the model;
-1: an error occurred while fitting the model;
1: a warning indicating that the model fit is questionable occurred (algorithm did not converge, non-positive sigma estimate for a normal heterogeneous model or large asymptotic bias);
2: the warning 'design matrix not of full rank' occurred, therefore some of the model's coefficients are not estimable;
3: a warning not of type 1 or 2 occurred (the glm warning 'fitted rates numerically 0 occurred' is often encountered with small frequencies; it does not always mean that the model fit is questionable).

The elements in the column infoFit can contain more than one digit since more than one warning can occur. For example, if infoFit takes the value 13 for a model, it means that at least one warning of type 1 and one warning of type 3 have occurred. If more than one warning of the same type is encountered, the digit representing the type of warning is not repeated in infoFit.
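
The coding scheme can be unpacked with a few lines of base R. The helper below, decode_infoFit, is hypothetical (it is not part of Rcapture) and only sketches how a value such as 13 splits into its individual warning types, assuming the codes are exactly those listed above.

```r
# Hypothetical helper: translate one infoFit value into its meaning(s).
decode_infoFit <- function(code) {
  meaning <- c("-1" = "error while fitting the model",
               "0"  = "no error or warning",
               "1"  = "model fit is questionable",
               "2"  = "design matrix not of full rank",
               "3"  = "other warning")
  if (is.na(code)) return(NA_character_)
  if (code == -1) return(unname(meaning["-1"]))
  # Each digit of the code stands for one type of warning
  digits <- strsplit(as.character(code), "")[[1]]
  unname(meaning[digits])
}

decode_infoFit(13)
```

For the value 13, the helper returns the descriptions of warning types 1 and 3.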

References

Baillargeon, S. and Rivest, L.P. (2007) Rcapture: Loglinear models for capture-recapture in R. Journal of Statistical Software, 19(5), http://www.jstatsoft.org/v19/i05.

Bohning, D. and Schon, D. (2005) Nonparametric Maximum Likelihood Estimation of Population Size Based on the Counting Distribution. Journal of the Royal Statistical Society: Series C (Applied Statistics), 54(4), 721-737.

Chao, A. (1987) Estimating the population size for capture-recapture data with unequal catchability. Biometrics, 43(4), 783-791.

Cormack, R. M. (1985) Example of the use of glim to analyze capture-recapture studies. In Lecture Notes in Statistics 29: Statistics in Ornithology, Morgan, B. and North, P. editors, New York: Springer-Verlag, 242-274.

Cormack, R. M. (1989) Loglinear models for capture-recapture. Biometrics, 45, 395-413.

Cormack, R. M. (1992) Interval estimation for mark-recapture studies of closed populations. Biometrics, 48, 567-576.

Cormack, R. M. (1993) Variances of mark-recapture estimates. Biometrics, 49, 1188-1193.

Cormack, R. M. and Jupp, P. E. (1991) Inference for Poisson and multinomial models for capture-recapture experiments. Biometrika, 78(4), 911-916.

Rivest, L.P. and Levesque, T. (2001) Improved loglinear model estimators of abundance in capture-recapture experiments. Canadian Journal of Statistics, 29, 555-572.

Rivest, L.P. and Daigle, G. (2004) Loglinear models for the robust design in mark-recapture experiments. Biometrics, 60, 100-107.

Rivest, L.P. and Baillargeon, S. (2007) Applications and extensions of Chao's moment estimator for the size of a closed population. Biometrics, 63(4), 999-1006.

Rivest, L.P. (2008) Why a time effect often has a limited impact on capture-recapture estimates in closed populations. Canadian Journal of Statistics, 36(1), 75-84.

Examples

# NOT RUN {
# Here is an example on the lesbian data set.

desc <- descriptive(lesbian, dfreq = TRUE)
desc
plot(desc)

# 1612 out of 2185 individuals (74%) appear on one list only.
# The exploratory heterogeneity graphs are not quite linear.
# Some heterogeneity in the units' capture probabilities
# seems to be present in the data set.

closedp(lesbian, dfreq = TRUE)

# According to the BIC, the best model is Mth Darroch.
# Let's see if adding interactions between capture
# histories to the model could improve the model's fit.

closedpMS.t(lesbian, dfreq = TRUE, h = "Darroch")

# According to the BIC, the best heterogeneous Darroch model
# contains the double interactions 12, 13, 14. 
# Here is the profile likelihood confidence interval for the
# abundance estimation from this model.

closedpCI.t(lesbian, dfreq = TRUE, mX = "[12,13,14]", h = "Darroch")


####################################################

# Example to illustrate warnings management in closed population functions.

# Here is a capture-recapture data set one could encounter.

crdata <- cbind(histpos.t(4), c(0,0,3,0,0,0,0,0,0,0,0,1,0,0,2))

# This data set contains 4 capture occasions but only 6 captured units.
# Fitting capture-recapture models on this data set is quite useless.
# The population size should be very close to the sample size.

# Such small frequencies in a capture-recapture data set should
# lead to warnings when fitting a loglinear model on it.

ex <- closedp.t(crdata, dfreq=TRUE)
ex

# Many models produce warnings of type 1 indicating that the model fit
# is questionable. The very large abundance estimates for some models
# are another indicator of questionable model fits.
# Details about the warnings are found in the glm.warn element of the output.

ex$glm.warn
# }
