process.data: Process encounter history dataframe for MARK analysis

Description

Prior to analyzing the data, this function initializes several variables (e.g., number of capture occasions, time intervals) that are often specific to the capture-recapture model being fitted to the data. It is also used to 1) define groups in the data that represent different levels of one or more factor covariates (e.g., sex), 2) define time intervals between capture occasions (if not 1), and 3) create an age structure for the data, if any.

Usage

process.data(data, begin.time = 1, model = "CJS",
    mixtures = 1, groups = NULL, allgroups = FALSE,
    age.var = NULL, initial.ages = c(0), age.unit = 1,
    time.intervals = NULL, nocc = NULL,
    strata.labels = NULL, counts = NULL, reverse = FALSE)

Arguments

data

A data frame with at least one field named ch, which is the capture (encounter) history stored as a character string. data can also have a field freq, which is the number of animals with that capture history.

begin.time

Time of first capture occasion or vector of times if different for each group

model

Type of analysis model. See mark for a list of possible values for model

mixtures

Number of mixtures in closed capture models with heterogeneity

groups

Vector of factor variable names (in double quotes) in data that will be used to create groups in the data. A group is created for each unique combination of the levels of the factor variables in the list.

allgroups

Logical variable; if TRUE, all groups are created from factors defined in groups even if there are no observations in the group

age.var

An index in vector groups for a variable (if any) for age

initial.ages

A vector of initial ages that contains a value for each level of the age variable groups[age.var]

age.unit

Increment of age for each increment of time as defined by time.intervals

time.intervals

Vector of lengths of time between capture occasions

nocc

Number of occasions for Nest type models; either nocc or time.intervals must be specified

strata.labels

Vector of single-character values used in the capture history (ch) for ORDMS models; it can contain one more value than appears in ch, to represent an unobservable state

counts

Named list of numeric vectors (one group) or matrices (>1 group) containing counts for mark-resight models

reverse

If TRUE, reverses the timing of transition (Psi) and survival (S) in Multistratum models

Value

processed.data (a list with the following elements)

data

Original raw dataframe with group factor variable added if groups were defined

model

Type of analysis model (e.g., "CJS", "Burnham", "Barker")

freq

A dataframe of frequencies (same number of rows as data; the number of columns is the number of groups in the data). The column names are the group labels representing the unique groups that have one or more capture histories.

nocc

Number of capture occasions

time.intervals

Length of time intervals between capture occasions

begin.time

Time of first capture occasion

age.unit

Increment of age for each increment of time

initial.ages

An initial age for each group in the data. Note that this is not the original argument but a vector with the initial age for each group. In the first example below, proc.example.data$initial.ages is a vector with 16 elements as follows: 0 1 1 2 0 1 1 2 0 1 1 2 0 1 1 2

nstrata

Number of strata in Multistrata models

strata.labels

Vector of alphabetic characters used to identify strata in Multistrata models

group.covariates

Factor covariates used to define groups
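
The elements above can be read directly from the returned list. The following is a minimal sketch rather than part of the package documentation. It assumes the RMark package is installed (the code is skipped otherwise) and reuses the object names dipper.cjs and dipper.popan mentioned in Details.

```r
# Sketch: runs only when the RMark package is installed.
if (requireNamespace("RMark", quietly = TRUE)) {
  library(RMark)
  data(dipper)

  # One processed data set per analysis model
  dipper.cjs <- process.data(dipper)                    # default model = "CJS"
  dipper.popan <- process.data(dipper, model = "POPAN")

  # Inspect a few of the elements described under Value
  str(dipper.cjs[c("model", "nocc", "time.intervals", "begin.time")])
  print(dipper.popan$model)
}
```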

Details

For examples of data, see dipper, edwards.eberhardt, and example.data. The structure of the encounter history and the analysis depend to some extent on the analysis model. Thus, it is necessary to process a dataframe with the encounter history (ch) and a chosen model to define the relevant values. For example, the number of capture occasions (nocc) is computed automatically from the length of the encounter history (ch) in data; however, this depends on the type of analysis model. For models such as "CJS", "Pradel" and others, it is simply the length of ch, whereas for "Burnham" and "Barker" models the encounter history contains both capture and resight/recovery values, so nocc is one-half the length of ch. Likewise, the number of time.intervals depends on the model. For models such as "CJS", "Pradel" and others, the number of time.intervals is nocc-1, whereas for capture and recovery (resight) models the number of time.intervals is nocc. The default time interval is unit time (1) and, if this is adequate, the function will assign the appropriate length.

A processed data frame can only be analyzed using the model that was specified. The model value is used by the functions make.design.data, add.design.data, and make.mark.model to define the model structure as it relates to the data. Thus, if the data are going to be analysed with different underlying models, create different processed data sets with the model name as an extension. For example, dipper.cjs=process.data(dipper) and dipper.popan=process.data(dipper,model="POPAN").

This function will report inconsistencies in the lengths of the capture history values and invalid entries in the capture history. For example, with the "CJS" model the capture history should only contain 0 and 1, whereas for "Barker" it can contain 0, 1 and 2. For "Multistrata" models, the code will automatically identify the number of strata and the strata labels based on the unique alphabetic codes used in the capture histories.
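
The rules above for nocc and the number of time intervals can be sketched in base R. The helper sketch_nocc is hypothetical and is not how RMark implements process.data. It covers only the two cases named in the text ("CJS"/"Pradel"-style histories and "Burnham"/"Barker"-style histories), and the capture histories are made up for illustration.

```r
# Hypothetical helper, not part of RMark: mimics the rule described in the text.
sketch_nocc <- function(ch, model = "CJS") {
  len <- unique(nchar(ch))
  if (length(len) != 1) stop("capture histories differ in length")

  # "Burnham" and "Barker" histories hold a capture value and a
  # resight/recovery value for every occasion
  paired <- model %in% c("Burnham", "Barker")
  if (paired && len %% 2 != 0) stop("paired histories must have even length")

  nocc <- if (paired) len / 2 else len
  n_intervals <- if (paired) nocc else nocc - 1

  # Default: unit time intervals of the appropriate length
  list(nocc = nocc, time.intervals = rep(1, n_intervals))
}

# Made-up capture histories
cjs <- sketch_nocc(c("1010110", "0110001"), model = "CJS")
burnham <- sketch_nocc(c("10100100", "11000000"), model = "Burnham")
```

Here cjs describes 7 occasions with 6 intervals, and burnham describes 4 occasions with 4 intervals.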
The argument begin.time specifies the time for the first capture occasion. This is used in creating the levels of the time factor variable in the design data and for labelling parameters. If begin.time varies by group, enter a vector of times with one for each group. Note that the time values for survivals are based on the beginning of the survival interval, and capture probabilities are labeled based on the time of the capture occasion. Likewise, age labels for survival are the ages at the beginning times of the intervals, and for capture probabilities they are the ages at the time of capture/recapture.

groups is a vector of variable names that are contained in data. Each must be a factor variable. A group is created for each unique combination of the levels of the factor variables. In the first example given below, groups=c("sex","age","region") creates groups defined by the levels of sex, age and region. There could be 2 (sexes) * 3 (ages) * 4 (regions) = 24 groups, but there are only 16 in the data because there are only 2 age groups for each sex: age groups 1 and 2 for M, and age groups 2 and 3 for F. This was done to demonstrate that the code will only use groups that have 1 or more capture histories unless allgroups=TRUE.

The argument age.var=2 specifies that the second grouping variable in groups represents an age variable. It could have been named something other than age. If a variable in groups is named age, then it is not necessary to specify age.var. In the example, initial.ages=c(0,1,2) specifies that the ages at first capture for the age levels are 0, 1 and 2, while the age classes were designated as 1, 2, 3. The actual ages for the age classes do not have to be sequential or ordered, but ordering will cause less confusion. Thus levels 1, 2, 3 could represent initial ages of 0, 4, 6 or 6, 0, 4. The argument age.unit is the amount an animal ages for each unit of time; the default is 1. The default for initial.ages is 0 for each group, in which case age represents time since marking (first capture) rather than the actual age of the animal.
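
The group arithmetic above (24 possible groups, 16 observed) and the mapping from age levels to initial ages can be illustrated in base R. This is only a sketch: the toy data frame is an assumption that mirrors the description of the example and is not example.data itself, and the lookup by factor level shows the idea rather than the code RMark actually runs.

```r
# Toy data (assumed for illustration): males occur in age classes 1 and 2,
# females in age classes 2 and 3, each in 4 regions.
toy <- rbind(
  expand.grid(sex = "M", age = 1:2, region = 1:4, stringsAsFactors = FALSE),
  expand.grid(sex = "F", age = 2:3, region = 1:4, stringsAsFactors = FALSE)
)
toy$sex <- factor(toy$sex)
toy$age <- factor(toy$age, levels = 1:3)
toy$region <- factor(toy$region)

group_vars <- c("sex", "age", "region")

# Groups that actually occur versus all combinations of factor levels
observed <- unique(toy[, group_vars])
n_observed <- nrow(observed)
n_possible <- prod(sapply(toy[group_vars], nlevels))

# One initial age per level of the age factor: levels 1, 2, 3 -> ages 0, 1, 2
initial_ages <- c(0, 1, 2)
group_initial_age <- initial_ages[as.integer(observed$age)]

print(c(observed = n_observed, possible = n_possible))
print(table(group_initial_age))
```

The result has one initial age per observed group, which corresponds to the shape of the initial.ages element described under Value.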

Examples


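library(RMark)

# Groups from sex, age and region; the second grouping variable (age) is the age variable
data(example.data)
proc.example.data <- process.data(data = example.data, begin.time = 1980,
    groups = c("sex", "age", "region"),
    age.var = 2, initial.ages = c(0, 1, 2))

# Defaults: model = "CJS", no groups, unit time intervals
data(dipper)
dipper.process <- process.data(dipper)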
