Estimation of abundance and of other demographic parameters for closed populations, open populations and the robust design in capture-recapture experiments using loglinear models.

This package focuses on closed populations. Since version 1.2-0, no new features have been added to open populations and robust design functions.

Package: Rcapture

Type: Package

Version: 1.4-4

Date: 2022-05-03

License: GPL-2

SUMMARY OF Rcapture CONTENTS

The Rcapture package contains nine capture-recapture data sets and the following functions:

Model fitting functions for

> closed populations:

- `closedp` functions: fit various loglinear models for abundance estimation,
- `closedpCI` functions: fit one customized loglinear model and calculate a confidence interval for the abundance estimate,
- `closedpMS.t`: fits various hierarchical loglinear models for model selection purposes,
- `closedp.bc`: performs bias corrections to the abundance estimates from customized loglinear models,
- `closedp.Mtb`: fits model Mtb, which cannot be fitted by any other function, for abundance estimation;

> open populations:

- `openp`: computes various demographic parameters using a loglinear model;

> robust design:

- `robustd` functions: compute various demographic parameters and capture probabilities per period using a loglinear model.

Descriptive statistics functions:

- `descriptive`: produces basic descriptive statistics for capture-recapture data;
- `uifit`: produces fit statistics concerning the \(u_i\), i.e. the numbers of first captures on each capture occasion, for closed population models.

Data manipulation functions:

- `histpos` functions: build a matrix of observable capture histories;
- `periodhist`: merges capture occasions.

DESCRIPTION OF DATA SET FORMATS

In capture-recapture experiments, the data collected consist of capture histories for captured units. A capture history is simply a series of capture indicators, one for each capture event in the experiment. The capture history of one unit is expressed as a length \(t\) vector \(w = (w_1, \ldots, w_t)\), where \(w_j = 1\) if the unit is captured at the \(j\)th occasion and 0 if not. For closed populations, capture events are named capture occasions, whereas they are named capture periods for open populations.

Capture-recapture data sets are given to Rcapture functions through the `X` argument. `X` must be a numeric matrix. Arguments `dfreq` and `dtype` indicate the format of the matrix. Each has two possible values, meaning that four data set formats are possible with Rcapture.

FORMAT 1 - CAPTURE HISTORY PER UNIT
If `dfreq=FALSE` and `dtype="hist"` (the default), `X` has one row per unit captured in the experiment. Each row is an observed capture history. It must contain only zeros and ones; the number one indicates a capture. In this case, the number of columns in the matrix is the number of capture occasions in the experiment (denoted \(t\)). Here is an example of a data set of this type for \(t=2\):
1 1
1 1
1 0
1 0
1 0
1 0
0 1
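
As a minimal sketch, the example above can be entered in R as a numeric matrix. The object name `X1` is only illustrative and is not part of Rcapture.

```r
# Format 1: one row per captured unit, one column per capture occasion.
X1 <- matrix(c(1, 1,
               1, 1,
               1, 0,
               1, 0,
               1, 0,
               1, 0,
               0, 1),
             ncol = 2, byrow = TRUE)

nrow(X1)  # number of captured units: 7
ncol(X1)  # number of capture occasions t: 2

# A capture history matrix must contain only zeros and ones.
stopifnot(all(X1 %in% c(0, 1)))
```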

FORMAT 2 - AGGREGATED CAPTURE HISTORIES
If `dfreq=TRUE` and `dtype="hist"`, `X` contains one row per observed capture history followed by its frequency. In that case, `X` has \(t+1\) columns. As for format 1, the first \(t\) columns of `X`, identifying the capture histories, must contain only zeros and ones. The number one indicates a capture. In this format, the example data set is represented by the following matrix:
1 1 2
1 0 4
0 1 1
If a possible capture history is not observed, it can appear in `X` with a frequency of zero, or it can simply be omitted.
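
One possible way to go from format 1 to format 2 in base R is sketched below. The names `X1`, `hist_label` and `X2` are illustrative, and the rows come out sorted by history label (01, 10, 11) rather than in the order shown above.

```r
# Format 1 version of the example data set (one row per unit).
X1 <- matrix(c(1,1, 1,1, 1,0, 1,0, 1,0, 1,0, 0,1), ncol = 2, byrow = TRUE)

# Label each unit by its capture history, e.g. "10", and count the labels.
hist_label <- apply(X1, 1, paste, collapse = "")
freq <- table(hist_label)

# Rebuild the 0/1 history columns from the labels and append the frequencies.
histories <- do.call(rbind, lapply(strsplit(names(freq), ""), as.numeric))
X2 <- cbind(histories, as.vector(freq))
X2
```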

FORMAT 3 - NUMBER OF CAPTURES PER UNIT
If `dfreq=FALSE` and `dtype="nbcap"`, `X` is a vector with the number of captures for every captured unit. Therefore, this format does not contain complete capture histories. Instead, capture histories are summarized through the number of captures. In this format, the example data set looks like:
2 2 1 1 1 1 1

FORMAT 4 - AGGREGATED NUMBERS OF CAPTURES
If `dfreq=TRUE` and `dtype="nbcap"`, `X` is a two-column matrix. The first column contains the observed numbers of captures, and the second column contains their frequencies.
In this format, the example data set is:
2 2
1 5
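
The two `nbcap` formats can be derived from capture histories with base R, as in the following sketch. The names `X1`, `nbcap` and `X4` are illustrative only.

```r
# Format 1 version of the example data set (one row per unit).
X1 <- matrix(c(1,1, 1,1, 1,0, 1,0, 1,0, 1,0, 0,1), ncol = 2, byrow = TRUE)

# Format 3: number of captures for every captured unit.
nbcap <- rowSums(X1)
nbcap  # 2 2 1 1 1 1 1

# Format 4: observed numbers of captures with their frequencies.
tab <- table(nbcap)
X4 <- cbind(as.numeric(names(tab)), as.vector(tab))
# Sort by decreasing number of captures to match the layout shown above.
X4 <- X4[order(X4[, 1], decreasing = TRUE), , drop = FALSE]
X4
```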

DETAILS ABOUT FORMATS WITH NUMBERS OF CAPTURES

Only a few functions have the `dtype` argument. Functions without a `dtype` argument accept only a data matrix `X` of the form `dtype="hist"`. So the first two formats listed above are the most common.

Formats with `dtype="nbcap"` are used for captures in continuous time (see below). They are also useful to reduce the size of the data set for experiments with a large number of capture occasions \(t\) (often with no units caught a large number of times). For these formats, the number of capture occasions \(t\) cannot be deduced from `X` as it can be with `dtype="hist"`. There is no guarantee that the largest number of captures observed equals the total number of capture occasions. Therefore, if one gives a data matrix `X` with `dtype="nbcap"`, one must also provide `t`, the number of capture occasions, as an additional argument.
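
The sketch below illustrates why `t` has to be supplied. The data set `nbcap_freq` is hypothetical (10 capture occasions, no unit caught more than 3 times). The call to `descriptive` assumes the `dfreq`, `dtype` and `t` arguments behave as described above, and it is run only if Rcapture is installed.

```r
# Hypothetical data in format 4: numbers of captures and their frequencies.
nbcap_freq <- cbind(nbcap = c(1, 2, 3), freq = c(40, 12, 3))

# The largest observed number of captures is 3, not the 10 occasions of the
# (hypothetical) experiment, so t cannot be recovered from the data alone.
max(nbcap_freq[, "nbcap"])

if (requireNamespace("Rcapture", quietly = TRUE)) {
  library(Rcapture)
  # t is passed explicitly because dtype = "nbcap".
  desc <- descriptive(nbcap_freq, dfreq = TRUE, dtype = "nbcap", t = 10)
  print(desc)
}
```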

For now, the data formats with `dtype="nbcap"` are not generalized to the robust design. So `dtype` is not an argument of the `robustd.0` function.

CAPTURES IN CONTINUOUS TIME

In some capture-recapture experiments, there are no well-defined capture occasions. Captures occur in continuous time. The data set `ill` comes from such an experiment. Bohning and Schon (2005) call this type of capture-recapture data *repeated counting data*. These data sets always have the format `dtype="nbcap"`.

We can estimate abundance for data of this type using the option `t=Inf` with the functions `closedp.0` and `closedpCI.0`. The function `descriptive` also accepts `t=Inf`. It modifies the y coordinate of the exploratory heterogeneity graph.
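
A hedged usage sketch for the `ill` data set follows. It assumes `ill` is stored in the aggregated `nbcap` format (hence `dfreq = TRUE`) and runs only if Rcapture is installed. The object names `desc_ill` and `fit_ill` are illustrative.

```r
if (requireNamespace("Rcapture", quietly = TRUE)) {
  library(Rcapture)

  # Descriptive statistics for repeated counting data: t is set to Inf.
  desc_ill <- descriptive(ill, dfreq = TRUE, dtype = "nbcap", t = Inf)
  print(desc_ill)

  # Abundance estimation with models that have no temporal effect.
  fit_ill <- closedp.0(ill, dfreq = TRUE, dtype = "nbcap", t = Inf)
  print(fit_ill)
}
```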

DISTINCTION BETWEEN `.t` AND `.0` FUNCTIONS

Capture-recapture models for closed populations aim at estimating the population size by modelling the probabilities of the different capture histories. The data available to fit these models consist of observed frequencies of capture histories. These frequencies are modeled in Rcapture using loglinear models for frequency tables. For functions with a name ending with `.t` (`closedp.t`, `closedpCI.t`, `closedpMS.t` and `robustd.t`), the response variable is the vector of observed frequencies for every observable capture history. It has length \(2^t-1\).

However, for a model without temporal effect (assuming that the probability of capturing a unit does not vary between capture occasions), all the information needed to fit the model can be found in aggregated data. Functions with a name ending with `.0` (`closedp.0`, `closedpCI.0` and `robustd.0`) fit models using as response variable the number of units captured \(i\) times, for \(i=1,\ldots,t\). It is a vector of length \(t\), which is much shorter than \(2^t-1\) for a large \(t\). Because of an appropriate offset added to the model, the results from a `.0` function match exactly the results from a `.t` function for a corresponding capture-recapture model. Because `.0` functions deal with smaller design matrices, they run faster than `.t` functions, but they cannot fit models with a temporal effect.
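
To make the difference in response length concrete, here is a base R sketch that builds both kinds of response vector from the small example data set. It only illustrates the two aggregations described above and is not Rcapture's internal code. All object names are illustrative.

```r
# Example data set from the format descriptions (t = 2, one row per unit).
X1 <- matrix(c(1,1, 1,1, 1,0, 1,0, 1,0, 1,0, 0,1), ncol = 2, byrow = TRUE)
t_occ <- ncol(X1)

# ".t"-style response: one frequency per observable capture history.
resp_t <- table(apply(X1, 1, paste, collapse = ""))
length(resp_t) == 2^t_occ - 1  # TRUE here: histories 01, 10 and 11 are all observed

# ".0"-style response: number of units captured i times, for i = 1, ..., t.
resp_0 <- tabulate(rowSums(X1), nbins = t_occ)
resp_0  # 5 units caught once, 2 units caught twice

# How the two response lengths grow with the number of capture occasions.
sizes <- data.frame(t = c(2, 5, 10, 20))
sizes$length_t <- 2^sizes$t - 1
sizes$length_0 <- sizes$t
sizes
```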

FUNCTIONS USED FOR MODEL FITTING

Most of the Rcapture functions use the function `glm` of the stats package to fit a loglinear model. However, the function `optim`, also from the stats package, is used by the following functions:

- `closedpCI.t`, `closedpCI.0` and `closedpMS.t`: when a normal heterogeneous model is requested,
- `closedp.Mtb`: because model Mtb does not have a loglinear form.
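
The following base R sketch shows the general idea of a Poisson loglinear fit with `glm` on the two-occasion example data set. It is not Rcapture's own code. The names `dat`, `c1`, `c2` and `N_hat` are illustrative. In this kind of model, the exponential of the intercept estimates the number of units that were never captured.

```r
# Format 2 version of the example data set: histories 11, 10 and 01.
dat <- data.frame(c1 = c(1, 1, 0),   # captured on occasion 1?
                  c2 = c(1, 0, 1),   # captured on occasion 2?
                  freq = c(2, 4, 1)) # observed frequencies

# Loglinear model with one main effect per capture occasion.
fit <- glm(freq ~ c1 + c2, family = poisson, data = dat)

# Abundance estimate: captured units plus the estimated number of units
# with the unobservable history 00, which is exp(intercept).
n <- sum(dat$freq)
N_hat <- n + exp(coef(fit)[["(Intercept)"]])
N_hat  # approximately 9
```

With two occasions this model is saturated, so the estimate coincides with the Lincoln-Petersen estimator: 6 units caught on occasion 1 times 3 caught on occasion 2, divided by the 2 units caught twice.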

ERRORS AND WARNINGS MANAGEMENT IN CLOSED POPULATION FUNCTIONS

If an error occurs while executing a function fitting only one closed population model (`closedpCI` functions, `closedp.bc`, `closedp.Mtb`), the execution is stopped and the error message is printed (usual behavior in R). However, if an error occurs while fitting a model in a call to a closed population function fitting more than one model (`closedp` functions, `closedpMS.t`), the execution of the call is not stopped. Instead, the row in the results table for the problematic model is filled with `NA` and the error message is stored in an output value. This value is called `glm.err` for `closedp` functions. It is, however, called `fit.err` for `closedpMS.t` because this function does not always use `glm` to fit the model (as mentioned above, `optim` is used for normal heterogeneous models).

Warning messages produced while fitting a closed population model, if any, are stored in an output value called `glm.warn`, `fit.warn` or `optim.warn` depending on the function. They are not printed in the console. To inform the user that a warning occurred, the last column of the results table, named `infoFit`, contains a numerical code giving information about errors or warnings encountered. Here is a description of the meaning of the numerical code:

- 0: no error or warning occurred while fitting the model;
- -1: an error occurred while fitting the model;
- 1: a warning indicating that the model fit is questionable occurred (algorithm did not converge, non-positive sigma estimate for a normal heterogeneous model or large asymptotic bias);
- 2: the warning 'design matrix not of full rank' occurred, therefore some of the model's coefficients are not estimable;
- 3: a warning not of type 1 or 2 occurred (the `glm` warning 'fitted rates numerically 0 occurred' is often encountered with small frequencies; it does not always mean that the model fit is questionable).

The elements in the column `infoFit` can contain more than one digit since more than one warning can occur. For example, if `infoFit` takes the value `13` for a model, it means that at least one warning of type 1 and one warning of type 3 have occurred. If more than one warning of the same type is encountered, the number representing the type of warning is not repeated in `infoFit`.
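
As an illustration of how such a code can be read, the helper below splits an `infoFit` value into its digits and maps each one to a short label. `decode_infoFit` is a hypothetical function written for this example, not part of Rcapture, and the labels are abbreviations of the descriptions above.

```r
# Hypothetical helper: translate one infoFit code into short labels.
decode_infoFit <- function(code) {
  if (code == -1) return("error")
  if (code == 0) return("no error or warning")
  # Each digit of the code is a warning type (1, 2 or 3).
  digits <- strsplit(as.character(code), "")[[1]]
  meanings <- c("1" = "questionable fit",
                "2" = "design matrix not of full rank",
                "3" = "other warning")
  unname(meanings[digits])
}

decode_infoFit(13)  # "questionable fit" "other warning"
decode_infoFit(0)   # "no error or warning"
```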

Baillargeon, S. and Rivest, L.P. (2007) Rcapture: Loglinear models for capture-recapture in R. *Journal of Statistical Software*, **19**(5), 10.18637/jss.v019.i05.

Bohning, D. and Schon, D. (2005) Nonparametric Maximum Likelihood Estimation of Population Size Based on the Counting Distribution. *Journal of the Royal Statistical Society: Series C (Applied Statistics)*, **54**(4), 721--737.

Chao, A. (1987) Estimating the population size for capture-recapture data with unequal catchability. *Biometrics*, **43**(4), 783--791.

Cormack, R. M. (1985) Example of the use of glim to analyze capture-recapture studies. In *Lecture Notes in Statistics 29: Statistics in Ornithology*, Morgan, B. and North, P. editors, New York: Springer-Verlag, 242--274.

Cormack, R. M. (1989) Loglinear models for capture-recapture. *Biometrics*, **45**, 395--413.

Cormack, R. M. (1992) Interval estimation for mark-recapture studies of closed populations. *Biometrics*, **48**, 567--576.

Cormack, R. M. (1993) Variances of mark-recapture estimates. *Biometrics*, **49**, 1188--1193.

Cormack, R. M. and Jupp, P. E. (1991) Inference for Poisson and multinomial models for capture-recapture experiments. *Biometrika*, **78**(4), 911--916.

Rivest, L.P. and Levesque, T. (2001) Improved loglinear model estimators of abundance in capture-recapture experiments. *Canadian Journal of Statistics*, **29**, 555--572.

Rivest, L.P. and Daigle, G. (2004) Loglinear models for the robust design in mark-recapture experiments. *Biometrics*, **60**, 100--107.

Rivest, L.P. and Baillargeon, S. (2007) Applications and extensions of Chao's moment estimator for the size of a closed population. *Biometrics*, **63**(4), 999--1006.

Rivest, L.P. (2008) Why a time effect often has a limited impact on capture-recapture estimates in closed populations. *Canadian Journal of Statistics*, **36**(1), 75--84.

```
library(Rcapture)
# Here is an example on the lesbian data set.
desc <- descriptive(lesbian, dfreq = TRUE)
desc
plot(desc)
# 1612 out of 2185 individuals (74%) appear on one list only.
# The exploratory heterogeneity graphs are not quite linear.
# Some heterogeneity in the units' capture probabilities
# seems to be present in the data set.
closedp(lesbian, dfreq = TRUE)
# According to the BIC, the best model is Mth Darroch.
# Let's see if adding interactions between capture
# histories to the model could improve the model's fit.
closedpMS.t(lesbian, dfreq = TRUE, h = "Darroch")
# According to the BIC, the best heterogeneous Darroch model
# contains the double interactions 12, 13, 14.
# Here is the profile likelihood confidence interval for the
# abundance estimation from this model.
closedpCI.t(lesbian, dfreq = TRUE, mX = "[12,13,14]", h = "Darroch")
####################################################
# Example to illustrate warnings management in closed population functions.
# Here is a capture-recapture data set one could encounter.
crdata <- cbind(histpos.t(4), c(0,0,3,0,0,0,0,0,0,0,0,1,0,0,2))
# This data set contains 4 capture occasions but only 6 captured units.
# Fitting capture-recapture models on this data set is quite useless.
# The population size should be very close to the sample size.
# Such small frequencies in a capture-recapture data set should
# lead to warnings when fitting a loglinear model on it.
ex <- closedp.t(crdata, dfreq=TRUE)
ex
# Many models produce warnings of type 1 indicating that the model fit
# is questionable. The very large abundance estimates for some models
# are another indicator of questionable model fits.
# Details about the warnings are found in the glm.warn element of the output.
ex$glm.warn
```
