F.cjs.estim: F.cjs.estim - Cormack-Jolly-Seber estimation

Description

Estimates Cormack-Jolly-Seber (CJS) capture-recapture models with individual, time, and individual-time varying covariates using the "regression" parametrization of Amstrup et al (2005, Ch 9). For live recaptures only. Losses on capture allowed. Uses a logistic link function to relate probability of capture and survival to external covariates.

Usage

F.cjs.estim(capture, survival, histories, cap.init, sur.init, group,  nhat.v.meth = 1, 
c.hat = -1, df = NA, intervals=rep(1,ncol(histories)-1), conf=0.95, 
link="logit", control=mra.control())

Arguments

capture

Formula specifying the capture probability model. Must be a formula object with no response. I.e., "~" followed by the names of 2-D arrays of covariates to fit in the capture model. For example: 'capture = ~ age + sex', where age and sex are matrices of size NAN X NS containing the age and sex covariate values. NAN = number of animals = number of rows in histories matrix (see below). NS = number of samples = number of columns in histories matrix (see below). Number of matrices specified in the capture model is assumed to be NX. Time varying and individual varying vectors are fitted using ivar() and tvar() (see Details). Factors are allowed within ivar() and tvar().

survival

Formula specifying the survival probability model. Must be a formula object with no response. I.e., "~" followed by the names of 2-D arrays of covariates to fit in the survival model. For example: 'survival = ~ year + ageclass' where year and ageclass are matrices of size NAN X NS containing year and ageclass covariate values. Number of matrices specified in the survival model is assumed to be NY. Time varying and individual varying vectors are fitted using ivar() and tvar() (see Details). Factors are allowed within ivar() and tvar().

histories

A NAN X NS = (number of animals) X (number of capture occasions) matrix containing capture histories. Capture histories are comprised of 0's, 1',s and 2's. 0 in cell (i,j) means animal i was not captured on occasion j, 1 in cell (i,j) means animal i was captured on occasion j and released live back into the population, 2 in cell (i,j) means animal i was captured on occasion j and was not released back into the population (e.g., it died). Animals with '2' as the last non-zero entry of their history are considered 'censored'. Their lack of capture information is removed from the likelihood after the occasion with the 2. Rows of all zeros (i.e., no captures) are allowed in the history matrix, but do not affect coefficient or population size estimates. A warning is thrown if rows of all zeros exist. Capture and survival probabilities are computed for animals with all zero histories. In this way, it is possible to have the routine compute capture or survival estimates for combinations of covariates that do not exist in the data by associating the covariate combinations with histories that have all zero entries.

cap.init

(optional) Vector of initial values for coefficients in the capture model. One element per covariate in capture. The default value usually works.

sur.init

(optional) Vector or initial values for coefficients in the survival model. One element per covariate in survival. The default value usually works.

group

(optional) A vector of length NAN giving the (non-changing) group membership of every captured animal (e.g., sex). Group is used only for computing TEST 2 and TEST 3. TEST 2 and TEST 3 are computed separately for each group. E.g., if group=sex, TEST 2 and TEST 3 are computed for each sex. TEST 2 and TEST3 are used only to estimate C-hat. See c.hat for pooling rules for these test components to estimate C-hat.

nhat.v.meth

Integer specifying method for computing variance estimates of population size estimates. nhat.v.meth = 1 uses the variance estimator of Taylor et al. 2002, Ursus, p. 188 which is the so-called Huggins variance estimator, and incorporates covariances. nhat.v.meth = 2 uses the variance estimator of Amstrup et al. 2005 (p. 244, Eqn. 9.10), which is the same variance estimator as nhat.v.meth = 1 with more 2nd order approximation terms included. Method 2 should provide better variances than method 1, especially if the coefficient of variation of capture probabilities are >1.0, but method 2 has not been studied as much as method 1. nhat.v.meth = 3 uses the variance estimator of McDonald and Amstrup, 1999, JABES, which is a 1st order approximation that does not incorporate covariances. Method 3 is much faster than methods 1 and 2 and could be easily calculated by hand, but should only be used when there is little capture heterogeneity.

c.hat

External (override) estimate of variance inflation factor (c.hat) to use during estimation. If input value of c.hat is <= 0, MRA computes an estimate of variance inflation based on TEST 2 and TEST 3 applied to groups (if called for, see group above) using Manly, McDonald, and McDonald, 1993, rules for pooling. I.e., all cells in each TEST 2 or TEST 3 Chi-square component table must be >= 5 before that component contributes to the estimate of C-hat. This rules is slightly different than program MARK's pooling rules, so MRA's and MARK's estimates of c.hat will generally be different. If the input c.hat > 0, MRA does not estimate C.hat, and uses the supplied value.

External (override) model degrees of freedom to use during estimation. If df == NA, the number of parameters is estimated from the rank of the matrix of 2nd derivatives or Hessian, depending on cov.meth parameter. If df <= 0, the number of parameters will be set to NX+NY = the number of estimated coefficients. Otherwise, if df > 0, the supplied value is used. Only AIC, QAIC, AICc, and QAICc are dependent on this value (in their penalty terms).

intervals

Time intervals. This is a vector of length ncol(histories)-1 (i.e., number of capture occasions minus 1) specifying relative time intervals between occasions. For example, if capture occasions occurred in 1999, 2000, 2005, and 2007 intervals would be set to c(1,5,2). Estimates of survival are adjusted for time intervals between occasions assuming an exponential lifetime model, i.e., probability of surviving from occasion j to occasion j+1 is Phi(j)^(jth interval length), and it is the Phi(j)'s that are related to covariates through the survival model. In other words, all survival estimates are for an interval of length 1. If an interval of 1 is one year, then all survival estimates will be annual survival, with probability of surviving 2 years equal to annual survival squared, probability of surviving 3 years equal to annual survival cubed, etc.

conf

Confidence level for the confidence intervals placed around estimates of population size. Default 95% confidence.

link

The link function to be used. The link function converts linear predictors in the range (-infinity, infinity) to probabilities in the range (0,1). Valid values for the link function are "logit" (default), "sine", and "hazard". (see Examples for a plot of the link functions)

The "logit" link is $\eta = log( \frac{p}{1 - p} )$ with inverse $p = \frac{1}{1 + exp(-\eta)}$.
The "sine" link is $\eta = \frac{8asin( 2p - 1 )}{\pi}$, which ranges from -4 to 4. The inverse "sine" link is $p = \frac{1 + sin( \eta\pi/8 )}{2}$ for values of $\eta$ between -4 and 4. For values of $\eta$ < -4, $p$ = 0. For values of $\eta$ > 4, $p$ = 1. Scaling of the sine link was chosen to yield coefficients roughly the same magnitude as the logit link.
The "hazard" link is $\eta = log( -log( 1 - p ))$, with inverse $1 - exp( -exp( \eta ))$. The value of $p$ from the inverse hazard link approaches 0 as $\eta$ decreases. For values of $\eta$ > 3, $p$ = 1 for all intents and purposes.

control

A list containing named control parameters for the minimization and estimation process. Control parameters include number of iterations, covariance estimation method, etc. Although the default values work in the vast majority of cases, changes to these variables can effect speed and performance for ill-behaved models. See mra.control() for a description of the individual control parameters.

Value

An object (list) of class c("cjs","cr") with many components. Use print.cr to print it nicely. Use names(fit), where the call was fit <- F.cr.estim(...), to see names of all returned components. To see values of individual components, issue commands like fit$s.hat, fit$se.s.hat, fit$n.hat, etc.

Components of the returned object are as follows:

histories

The input capture history matrix.

aux

Auxiliary information about the fit, mostly stored input values. This is a list containing the following components:

call = original call
nan = number of animals
ns = number of samples = number of capture occasions
nx = number of coefficients in capture model
ny = number of coefficients in survival model
cov.name = names of all coefficients
ic.name = name of capture history matrix.
mra.version = version number of MRA package used to estimate the model
R.version = R version used for during estimation
run.date = date the model was estimated.

loglik

Maximized CJS likelihood value for the model

deviance

Model deviance = -2*loglik. This is relative deviance, see help for F.sat.lik.

aic

AIC for the model = deviance + 2*(df). df is either the estimated number of independent parameters (by default), or NX+NY, or a specified value, depending on the input value of DF parameter.

qaic

QAIC (quasi-AIC) = (deviance / vif) + 2(df)

aicc

AIC with small sample correction = AIC + (2*df*(df+1)) / (nan - df - 1)

qaicc

QAIC with small sample correction = QAIC + (2*df*(df+1))/(nan - df - 1)

vif

Variance inflation factor used = estimate of c.hat = chisq.vif / chisq.df

chisq.vif

Composite Chi-square statistic from Test 2 and Test 3 used to compute vif, based on pooling rules.

vif.df

Degrees of freedom for composite chi-square statistic from Test 2 and Test 3, based on pooling rules.

parameters

Vector of all coefficient estimates, NX capture probability coefficients first, then NY survival coefficients. This vector is length NX+NY regardless of estimated DF.

se.param

Standard error estimates for all coefficients. Length NX+NY.

capcoef

Vector of coefficients in the capture model. Length NX.

se.capcoef

Vector of standard errors for coefficients in capture model. Length NX.

surcoef

Vector of coefficients in the survival model. Length NY.

se.surcoef

Vector of standard errors for coefficients in survival model. Length NY.

covariance

Variance-covariance matrix for the estimated model coefficients. Size (NX+NY) X (NX+NY).

p.hat

Matrix of estimated capture probabilities computed from the model. One for each animal each occasion. Cell (i,j) is estimated capture probability for animal i during capture occasion j. Size NAN X NS. First column corresponding to first capture probability is NA because cannot estimate P1 in a CJS model.

se.p.hat

Matrix of standard errors for estimated capture probabilities. One for each animal each occasion. Size NAN X NS. First column is NA.

s.hat

Matrix of estimated survival probabilities computed from the model. One for each animal each occasion. Size NAN X NS. Cell (i,j) is estimated probability animal i survives from occasion j to j+1. There are only NS-1 intervals between occasions. Last column corresponding to survival between occasion NS and NS+1 is NA.

se.s.hat

Matrix of standard errors for estimated survival probabilities. Size NAN X NS. Last column is NA.

The number of parameters assumed in the model. This value was used in the penalty term of AIC, AICc, QAIC, and QAICc. This value is either the number of independent parameters estimated from the rank of the variance-covariance matrix (by default), or NX+NY, or a specified value, depending on the input value of DF parameter. See F.update.df to update this value after the model is fitted.

df.estimated

The number of parameters estimated from the rank of the variance-covariance matrix. This is stored so that df can be updated using F.update.df.

control

A list containing the input maximization and estimation control parameters.

message

A vector of strings interpreting various codes about the estimation. The messages interpret, in this order, the codes for (1) maximization algorithm used, (2) exit code from the maximization algorithm (interprets exit.code), and (3) covariance matrix code (interprets cov.code).

exit.code

Exit code from the maximization routine. Interpretation for exit.code is in message. Exit codes are as follows:

exit.code = 0: FAILURE: Initial Hessian not positive definite.
exit.code = 1: SUCCESS: Convergence criterion met.
exit.code = 2: FAILURE: G'dX > 0, rounding error.
exit.code = 3: FAILURE: Likelihood evaluated too many times.
exit.code =-1: FAILURE: Unknown optimization algorithm."

cov.code

A code indicating the method used to compute the covariance matrix.

fn.evals

The number of times the likelihood was evaluated prior to exit from the minimization routine. If exit.code = 3, fn.evals equals the maximum set in mra.control. This, in combination with the exit codes and execution time, can help detect non-convergence or bad behavior.

ex.time

Execution time for the maximization routine, in minutes. This is returned for 2 reasons. First, this is useful for benchmarking. Second, in conjunction with exit.code, cov.code, and fn.evals, this could be used to detect ill-behaved or marginally unstable problems, if you know what you are doing. Assuming maxfn is set high in mra.control() (e.g., 1000), if exit.code = 1 but the model takes a long time to execute relative to similarly sized problems, it could indicate unstable or marginally ill-behaved models.

n.hat

Vector of Horvitz-Thompson estimates of population size. The Horvitz-Thompson estimator of size is, $$\hat{N}_{ij} = \sum_{i=1}^{NAN} \frac{h_{ij}}{\hat{p}_{ij}}$$ Length of n.hat = NS. No estimate for first occasion.

se.n.hat

Estimated standard errors for n.hat estimates. Computed using method specified in nhat.v.meth.

n.hat.lower

Lower limit of n.hat.conf percent on n.hat. Length NS.

n.hat.upper

Upper limit of n.hat.conf percent on n.hat. Length NS.

n.hat.conf

Confidence level of intervals on n.hat

nhat.v.meth

Code for method used to compute variance of n.hat

num.caught

Vector of observed number of animals captured each occasion. Length NS.

fitted

Matrix of fitted values for the capture histories. Size NAN X NS. Cell (i,j) is expected value of capture indicator in cell (i,j) of histories matrix.

residuals

Matrix of Pearson residuals defined as, $$r_{ij} = \frac{(h_{ij} - \Psi_{ij})^2}{\Psi_{ij}}$$, where $\Psi_{ij}$ is the expected (or fitted) value for cell (i,j) and $h_{ij}$ is the capture indicator for animal i at occasion j. This matrix has size NAN X NS. See parts pertaining to the "overall test" in documentation for F.cjs.gof for a description of $\Psi_{ij}$.

resid.type

String describing the type of residuals computed. Currently, only Pearson residuals are returned.

Details

This is the work-horse routine for estimating CJS models. It compiles all the covariate matrices, then calls a Fortran routine to maximize the CJS likelihood and perform goodness-of-fit tests. Horvitz-Thompson-type population size estimates are also computed by default.

If control=mra.control(trace=1), a log file, named mra.log, is written to the current directory. This file contains additional details, such as individual Test 2 and Test 3 components, in a semi-friendly format. This file is overwritten each run. See help(mra.control) for more details.

Model Specification: Both the capture and survival model can be specified as any combination of 2-d matrices (time and individual varying covariates), 1-d time varying vectors, 1-d individual varying vectors, 1-d time varying factors, and 1-d individual varying factors.

Specification of time or individual varying effects uses the tvar (for 'time varying') and ivar (for 'individual varying') functions. These functions expand covariate vectors along the appropriate dimension to be 2-d matrices suitable for fitting in the model. ivar expands an individual varying vector to all occasions. tvar expands a time varying covariate to all individuals. To do the expansion, both tvar and ivar need to know the size of the 'other' dimension. Thus, tvar(x,100) specifies a 2-d matrix with size 100 by length(x). ivar(x,100) specifies a 2-d matrix with size length(x) by 100.

For convenience, the 'other' dimension of time or individual varying covariates can be specified as an attribute of the vector. Assuming x is a NS vector and the 'nan' attribute of x has been set as attr(x,"nan") <- NAN, tvar(x,NAN) and tvar(x) are equivalent. Same, but vise-versa, for individual varying covariates (i.e., assign the number of occasions using attr(x,"ns")<-NS). This saves some typing in model specification.

Factors are allowed in ivar and tvar. When a factor is specified, the contr.treatment coding is used. By default, an intercept is assumed and the first level of all factors are dropped from the model (i.e., first levels are the reference levels, the default R action). However, there are applications where more than one level will need to be dropped, and the user has control over this via the drop.levels argument to ivar and tvar. For example, tvar(x,drop.levels=c(1,2)) drops the first 2 levels of factor x. tvar(x,drop.levels=length(levels(x))) does the SAS thing and drops the last level of factor x. If drop.levels is outside the range [1,length(levels(x))] (e.g., negative or 0), no levels of the factor are dropped. If no intercept is fitted in the model, this results in the so-called cell means coding for factors.

Example model specifications: Assume 'age' is a NAN x NS 2-d matrix of ages, 'effort' is a size NS 1-d vector of efforts, and 'sex' is a size NAN 1-d factor of sex designations ('M' and 'F').

capture= ~ 1 : constant effect over all individuals and time (intercept only model)
capture= ~ age : Intercept plus age
capture= ~ age + tvar(effort,NAN) : Intercept plus age plus effort
capture= ~ age + tvar(effort,NAN) + ivar(sex,NS) : Intercept plus age plus effort plus sex. Females (1st level) are the reference.
capture= ~ -1 + ivar(sex,NS,0) : sex as a factor, cell means coding
capture= ~ tvar(as.factor(1:ncol(histories)),nrow(histories),c(1,2)) : time varying effects

Values in 2-d Matrix Covariates: Even though covariate matrices are required to be NAN x NS (same size as capture histories), there are not that many parameters. The first capture probability cannot be estimated in CJS models, and the NS-th survival parameter does not exist. When a covariate matrix appears in the capture model, only values in columns 2:ncol(histories) are used. When a covariate matrix appears in the survival model, only values in columns 1:(ncol(histories)-1) are used. See examples for demonstration.

References

Taylor, M. K., J. Laake, H. D. Cluff, M. Ramsay, and F. Messier. 2002. Managing the risk from hunting for the Viscount Melville Sound polar bear population. Ursus 13:185-202.

Manly, B. F. J., L. L. McDonald, and T. L. McDonald. 1999. The robustness of mark-recapture methods: a case study for the northern spotted owl. Journal of Agricultural, Biological, and Environmental Statistics 4:78-101.

Huggins, R. M. 1989. On the statistical analysis of capture experiments. Biometrika 76:133-140.

Amstrup, S. C., T. L. McDonald, and B. F. J. Manly (editors). 2005. Handbook of Capture-Recapture Analysis. Princeton University Press.

Peterson. 1986. Statistics and Probability Letters. p.227.

McDonald, T. L., and S. C. Amstrup. 2001. Estimation of population size using open capture-recapture models. Journal of Agricultural, Biological, and Environmental Statistics 6:206-220.

Examples

Run this code

# NOT RUN {


## Fit CJS model to dipper data, time-varying capture and survivals.
## Method 1 : using factors
data(dipper.histories)
ct <- as.factor( paste("T",1:ncol(dipper.histories), sep=""))
attr(ct,"nan")<-nrow(dipper.histories)
dipper.cjs <- F.cjs.estim( ~tvar(ct,drop=c(1,2)), ~tvar(ct,drop=c(1,6,7)), dipper.histories )

## Method 2 : same thing using 2-d matrices
xy <- F.cjs.covars( nrow(dipper.histories), ncol(dipper.histories) )
# The following extracts 2-D matrices of 0s and 1s
for(j in 1:ncol(dipper.histories)){ assign(paste("x",j,sep=""), xy$x[,,j]) } 
dipper.cjs <- F.cjs.estim( ~x3+x4+x5+x6+x7, ~x2+x3+x4+x5, dipper.histories )

## Values in the 1st column of capture covariates do not matter
x3.a <- x3
x3.a[,1] <- 999
dipper.cjs2 <- F.cjs.estim( ~x3.a+x4+x5+x6+x7, ~x2+x3+x4+x5, dipper.histories )
# compare dipper.cjs2 to dipper.cjs

## Values in the last column of survival covariates do not matter
x3.a <- x3
x3.a[,ncol(dipper.histories)] <- 999
dipper.cjs2 <- F.cjs.estim( ~x3+x4+x5+x6+x7, ~x2+x3.a+x4+x5, dipper.histories )
# compare dipper.cjs2 to dipper.cjs


## A plot to compare the link functions
sine.link <- function(eta){ ifelse( eta < -4, 0, ifelse( eta > 4, 1, .5*(1+sin(eta*pi/8)))) }
eta <- seq(-5,5, length=40)
p1 <- 1 / (1 + exp(-eta))
p2 <- sine.link(eta)
p3 <- 1.0 - exp( -exp( eta ))
plot(eta, p1, type="l" )
lines(eta, p2, col="red" )
lines(eta, p3, col="blue" )
legend( "topleft", legend=c("logit", "sine", "hazard"), col=c("black", "red", "blue"), lty=1)




# }

Run the code above in your browser using DataLab