closedpMS: Hierarchical Loglinear Model Selection in Closed Population Capture-Recapture Experiments

Description

The closedpMS.t function fits every possible hierarchical loglinear models for a given closed population capture-recapture data set, under the constraints set by the given maxorder and forced arguments. Parameters for heterogeneity in capture probabilites among units can be added to the models.

The getAllModels function lists every possible hierarchical loglinear models for a certain number of capture occasions t, under the constraints set by the given maxorder and forced arguments.

Usage

closedpMS.t(X, dfreq = FALSE, h = NULL, h.control = list(), 
            maxorder = t - 1, forced = 1:t, stopiflong = TRUE, …)
			
# S3 method for closedpMS
print(x, …)
# S3 method for closedpMS
plot(x, main="Models comparison based on BIC", omitOutliers = TRUE, …)
getAllModels(t, maxorder = t - 1, forced = 1:t, stopiflong = TRUE)

Arguments

The matrix of the observed capture histories (see Rcapture-package for a description of the accepted formats).

dfreq

A logical. By default FALSE, which means that X has one row per unit. If TRUE, it indicates that the matrix X contains frequencies in its last column.

A character string ("LB", "Chao", "Poisson", "Darroch", "Gamma" or "Normal") or a numerical R function specifying the form of the column(s) for heterogeneity in the design matrix. "LB" and "Chao" represents Chao's lower bound model (the default), "Poisson" represents the function $f(k)=theta^k-1$, "Darroch" represents the function $f(k)=k^2/2$, and "Gamma" represents the function $f(k)=-\log(theta + k) + \log(theta)$ where $k$ is the number of captures and $theta$ is a parameter specified with the h.control element theta. "Normal" represents heterogeneity modeled with a normal mixing distribution, as presented in Rivest (2011). If an R function is given, it must be the implementation of any convex mathematical function $f(k)$ (it must have only one argument).

h.control

A list of elements to control the heterogeneous part of the model, if any. For a Poisson or Gamma heterogeneous model:

theta: The value of the parameter $theta$ in $f(k)=theta^k-1$ for the Poisson model (the default value is 2) and in $f(k)=-\log(theta + k) + \log(theta)$ for the Gamma model (the default value is 3.5).

For a Chao's lower bound heterogeneous model:

neg: If this option is set to TRUE (the default), negative eta parameters in Chao's lower bound models are set to zero.

For a Normal heterogeneous model:

initsig: Initial value for the sigma parameter of the normal mixing distribution. The default is 0.2.
method: The method to be used by optim. The default is "BFGS".

maxorder

A numeric specifying the higher order accepted for the terms in the models. It can take a value between 1 and t - 1 (the default).

forced

A vector of the terms forced in the model (by default every first order terms). NULL if no terms are forced in the model.

stopiflong

A logical indicating whether the function execution should be stopped when the number of possibles models is larger than 10 000, that is when the function might be long to run (defaut TRUE).

…

Further arguments to be passed to glm, optim or print.default.

An object, produced by the closedpMS.t function, to print or to plot.

main

A main title for the plot.

omitOutliers

A logical. If TRUE (the default), models with an outlier abundance or BIC value are removed from the plot. A value is considered an outlier if it is smaller than the first quartile minus 1.5 times the interquartile range, or larger than the third quartile plus 1.5 times the interquartile range.

A numeric specifying the number of capture occasions in the experiments. It is deduced from the data set in the closedpMS.t function. It must be an integer between 1 and 9 inclusively.

Value

getAllModels returns a caracter vector with the models names.

closedpMS.t returns a list with the following elements:

The number of captured units

The total number of capture occasions in the data matrix X. When captures occur in continuous time (input argument t=Inf), this output value is the larger number of captures observed.

results

A table containing, for every fitted model:

abundance: the estimated population size,
stderr: the standard error of the estimated population size,
deviance: the model's deviance,
df: the number of degrees of freedom,
AIC: the Akaike's information criterion,
BIC: the bayesian information criterion,
infoFit: a numerical code giving information about error or warnings encountered when fitting the model (see Rcapture-package for details).

bias

A vector, the asymptotic bias of the estimated population size for every fitted model.

fit.err

A list of character string vectors. If an error occurs while fitting a model (with glm or optim), a copy of the error message is stored in fit.err$mname where mname is the name of the hierarchical loglinear model. A NULL list element means that no error occured for the considered model.

fit.warn

A list of character string vectors. If warnings are generated while fitting a model (with glm or optim), a copy of these warnings are stored in fit.warn$mname where mname is a the name of the hierarchical model. A NULL list element means that no warnings were generated for the considered model.

neg.eta

For Chao's lower bound models only: the position of the eta parameters set to zero in the loglinear parameter vector, if any.

Details

HIERARCHICAL LOGLINEAR MODEL NAME SYNTAX

First, a model's term is written using numbers between 1 and 9 to represent the capture occasions it includes (ex.: 134 represents the three-way interaction c1:c3:c4). This syntax limits the maximal number of capture occation to 9. This is not a problem since from 6 capture occasions upwards, the number of hierarchical models becomes very large and difficult to manage.

A hierarchical model name is a list of the model's terms at the top of the hierarchies in the model. These terms are separated by commas, without spaces. They are surronded by brackets. For example, "[123,34,5]" is the name of the model ~ 1 + c1 + c2 + c3 + c4 + c5 + c1:c2 + c1:c3 + c2:c3 + c3:c4 + c1:c2:c3.

References

Baillargeon, S. and Rivest, L.P. (2007) Rcapture: Loglinear models for capture-recapture in R. Journal of Statistical Software, 19(5), http://www.jstatsoft.org/v19/i05.

Chao, A. (1987) Estimating the population size for capture-recapture data with unequal catchability. Biometrics, 43(4), 783--791.

Cormack, R. M. (1992) Interval estimation for mark-recapture studies of closed populations. Biometrics, 48, 567--576.

Rivest, L.P. (2011) A lower bound model for multiple record systems estimation with heterogeneous catchability. The International Journal of Biostatistics, 7(1), Article 23.

Rivest, L.P. and Baillargeon, S. (2007) Applications and extensions of Chao's moment estimator for the size of a closed population. Biometrics, 63(4), 999--1006.

Examples

Run this code

# NOT RUN {
# The lesbian data set contains 4 capture occasions.
# By default, closedpMB.t fits the 113 following models:

getAllModels(4)

closedpMS.t(lesbian, dfreq = TRUE)

# We could reduce the number of models by omitting
# those with triple interactions.

closedpMS.t(lesbian, dfreq = TRUE, maxorder = 2)

# Models with heterogeneity fits better.

Darr <- closedpMS.t(lesbian, dfreq = TRUE, h = "Darroch")
Darr

# The plot method allows the visualization of the results
# from models fitted by closedpMS.t().

plot(Darr)

# According to the BIC, the best heterogeneous Darroch model
# for this data set contains the double interactions 12, 13, 14.
# Here is the profile likelihood confidence interval for the
# abundance estimation from this model.

closedpCI.t(lesbian, dfreq = TRUE, mX = "[12,13,14]", h = "Darroch")
# }

Run the code above in your browser using DataLab