modelsearch: Develop Best-fit Vital Rate Estimation Models For Matrix Development

Description

modelsearch() returns both a table of vital rate estimating models and a best-fit model for each major vital rate estimated. The final output can be used as input in other functions within this package.

Usage

modelsearch(
  data,
  historical = TRUE,
  approach = "mixed",
  suite = "size",
  bestfit = "AICc&k",
  vitalrates = c("surv", "size", "fec"),
  surv = c("alive3", "alive2", "alive1"),
  obs = c("obsstatus3", "obsstatus2", "obsstatus1"),
  size = c("sizea3", "sizea2", "sizea1"),
  repst = c("repstatus3", "repstatus2", "repstatus1"),
  fec = c("feca3", "feca2", "feca1"),
  stage = c("stage3", "stage2", "stage1"),
  indiv = "individ",
  patch = NA,
  year = "year2",
  sizedist = "gaussian",
  fecdist = "gaussian",
  size.zero = FALSE,
  fec.zero = FALSE,
  patch.as.random = TRUE,
  year.as.random = TRUE,
  juvestimate = NA,
  juvsize = FALSE,
  fectime = 2,
  censor = NA,
  age = NA,
  indcova = NA,
  indcovb = NA,
  indcovc = NA,
  show.model.tables = TRUE,
  global.only = FALSE,
  quiet = FALSE
)

Arguments

data

The vertical dataset to be used for analysis. This dataset should be of class hfvdata, but can also be a data frame formatted similarly to the output format provided by functions verticalize3() or historicalize3(), as long as all needed variables are properly designated.

historical

A logical variable denoting whether to assess the effects of state in time t-1 in addition to state in time t. Defaults to TRUE.

approach

The statistical approach to be taken for model building. The default is lme4, which uses the mixed model approach utilized in package lme4. Other options include glm, which uses lm, glm, and related functions in base R.

suite

This describes the global model for each vital rate estimation and has the following possible values: full, includes main effects and all two-way interactions of size and reproductive status; main, includes main effects only of size and reproductive status; size, includes only size (also interactions between size in historical model); rep, includes only reproductive status (also interactions between status in historical model); cons, all vital rates estimated only as y-intercepts. If approach = "glm" and year.as.random = FALSE, then year is also included as a fixed effect, and, in the case of full, included in two-way interactions. Defaults to size.

bestfit

A variable indicating the model selection criterion for the choice of best-fit model. The default is AICc&k, which chooses the best-fit model as the model with the lowest AICc or, if not the same model, then the model that has the lowest degrees of freedom among models with \(\Delta AICc <= 2.0\). Alternatively, AICc may be chosen, in which case the best-fit model is simply the model with the lowest AICc value.

vitalrates

A vector describing which vital rates will be estimated via linear modeling, with the following options: surv, survival probability; obs, observation probability; size, overall size; repst, probability of reproducing; and fec, amount of reproduction (overall fecundity). Defaults to c("surv", "size", "fec").

surv

A vector indicating the variable names coding for status as alive or dead in times t+1, t, and t-1, respectively. Defaults to c("alive3", "alive2", "alive1").

obs

A vector indicating the variable names coding for observation status in times t+1, t, and t-1, respectively. Defaults to c("obsstatus3", "obsstatus2", "obsstatus1").

size

A vector indicating the variable names coding for size in times t+1, t, and t-1, respectively. Defaults to c("sizea3", "sizea2", "sizea1").

repst

A vector indicating the variable names coding for reproductive status in times t+1, t, and t-1, respectively. Defaults to c("repstatus3", "repstatus2", "repstatus1").

fec

A vector indicating the variable names coding for fecundity in times t+1, t, and t-1, respectively. Defaults to c("feca3", "feca2", "feca1").

stage

A vector indicating the variables coding for stage in times t+1, t, and t-1. Defaults to c("stage3", "stage2", "stage1").

indiv

A variable indicating the variable name coding individual identity. Defaults to individ.

patch

A variable indicating the variable name coding for patch, where patches are defined as permanent subgroups within the study population. Defaults to NA.

year

A variable indicating the variable coding for observation time in time t. Defaults to year2.

sizedist

The probability distribution used to model size. Options include gaussian for the Normal distribution (default), poisson for the Poisson distribution, and negbin for the negative binomial distribution.

fecdist

The probability distribution used to model fecundity. Options include gaussian for the Normal distribution (default), poisson for the Poisson distribution, and negbin for the negative binomial distribution.

size.zero

A logical variable indicating whether size distribution should be zero-inflated. Only applies to Poisson and negative binomial distributions. Defaults to FALSE.

fec.zero

A logical variable indicating whether fecundity distribution should be zero-inflated. Only applies to Poisson and negative binomial distributions. Defaults to FALSE.

patch.as.random

If set to TRUE and approach = "lme4", then patch is included as a random factor. If set to FALSE and approach = "glm", then patch is included as a fixed factor. All other combinations of logical value and approach lead to patch not being included in modeling. Defaults to TRUE.

year.as.random

If set to TRUE and approach = "lme4", then year is included as a random factor. If set to FALSE, then year is included as a fixed factor. All other combinations of logical value and approach lead to year not being included in modeling. Defaults to TRUE.

juvestimate

An optional variable denoting the stage name of the juvenile stage in the vertical dataset. If not NA, and stage is also given (see below), then vital rates listed in vitalrates other than fec will also be estimated from the juvenile stage to all adult stages. Defaults to NA, in which case juvenile vital rates are not estimated.

juvsize

A logical variable denoting whether size should be used as a term in models involving transition from the juvenile stage. Defaults to FALSE, and is only used if juvestimate does not equal NA.

fectime

A variable indicating which year of fecundity to use as the response term in fecundity models. Options include 2, which refers to time t, and 3, which refers to time t+1. Defaults to 2.

censor

A vector denoting the names of censoring variables in the dataset, in order from time t+1, followed by time t, and lastly followed by time t-1. Defaults to NA.

age

Designates the name of the variable corresponding to age in the vertical dataset. Defaults to NA, in which case age is not included in linear models. Should only be used if building age x stage matrices.

indcova

Vector designating the names in times t+1, t, and t-1 of an individual covariate. Defaults to NA.

indcovb

Vector designating the names in times t+1, t, and t-1 of an individual covariate. Defaults to NA.

indcovc

Vector designating the names in times t+1, t, and t-1 of an individual covariate. Defaults to NA.

show.model.tables

If set to TRUE, then includes full modeling tables in the output. Defaults to TRUE.

global.only

If set to TRUE, then only global models will be built and evaluated. Defaults to FALSE.

quiet

If set to TRUE, then model building and selection will proceed without warnings and diagnostic messages being issued. Note that this will not affect warnings and messages generated as models themselves are tested. Defaults to FALSE.

Value

This function yields an object of class lefkoMod, which is a list in which the first 9 elements are the best-fit models for survival, observation status, size, reproductive status, fecundity, juvenile survival, juvenile observation, juvenile size, and juvenile transition to reproduction, respectively, followed by 9 elements corresponding to the model tables for each of these vital rates, in order, followed by a single character element denoting the criterion used for model selection, and ending on a quality control vector:

survival_model

Best-fit model of the binomial probability of survival from time t to time t+1. Defaults to 1.

observation_model

Best-fit model of the binomial probability of observation in time t+1 given survival to that time. Defaults to 1.

size_model

Best-fit model of size in time t+1 given survival to and observation in that time. Defaults to 1.

repstatus_model

Best-fit model of the binomial probability of reproduction in time t+1, given survival to and observation in that time. Defaults to 1.

fecundity_model

Best-fit model of fecundity in time t+1 given survival to, and observation and reproduction in that time. Defaults to 1.

juv_survival_model

Best-fit model of the binomial probability of survival from time t to time t+1 of an immature individual. Defaults to 1.

juv_observation_model

Best-fit model of the binomial probability of observation in time t+1 given survival to that time of an immature individual. Defaults to 1.

juv_size_model

Best-fit model of size in time t+1 given survival to and observation in that time of an immature individual. Defaults to 1.

juv_reproduction_model

Best-fit model of the binomial probability of reproduction in time t+1, given survival to and observation in that time of an individual that was immature in time t. This model is technically not a model of reproduction probability for individuals that are immature, rather reproduction probability here is given for individuals that are mature in time t+1 but immature in time t. Defaults to 1.

survival_table

Full dredge model table of survival probability.

observation_table

Full dredge model table of observationprobability.

size_table

Full dredge model table of size.

repstatus_table

Full dredge model table of reproduction probability.

fecundity_table

Full dredge model table of fecundity.

juv_survival_table

Full dredge model table of immature survival probability.

juv_observation_table

Full dredge model table of immature observation probability.

juv_size_table

Full dredge model table of immature size.

juv_reproduction_table

Full dredge model table of immature reproduction probability.

criterion

Vharacter variable denoting the criterion used to determine the best-fit model.

Data frame with three variables: 1) Name of vital rate, 2) number of individuals used to model that vital rate, and 3) number of individual transitions used to model that vital rate.

The mechanics governing model building are fairly robust to errors and exceptions. The function attempts to build global models, and simplifies models automatically should model building fail. Model building proceeds through the functions lm() (GLM with Gaussian response), glm() (GLM with Poisson or binomial response), glm.nb() (GLM with negative binomial response), zeroinfl() (zero-inflated Poisson or negative binomial response), lmer() (mixed model with Gaussian response), glmer() (mixed model with binomial or Poisson response), glmmTMB() (mixed model with negative binomial, zero-inflated negative binomial, or zero-inflated Poisson response). See documentation related to these functions for further information.

Exhaustive model building and selection proceeds via the dredge() function in package MuMIn. This function is verbose, so that any errors and warnings developed during model building, model analysis, and model selection can be found and dealt with. Interpretations of errors during global model analysis may be found in documentation in for the functions and packages mentioned. Package MuMIn is used for model dredging (see dredge), and errors and warnings during dredging can be interpreted using the documentation for that package. Errors occurring during dredging lead to the adoption of the global model as the best-fit, and the user should view all logged errors and warnings to determine the best way to proceed. The quiet = TRUE option can be used to silence dredge warnings, but users should note that automated model selection can be viewed as a black box, and so great care should be taken to ensure that the models run make biological sense, and that model quality is prioritized.

Exhaustive model selection through dredging works best with larger datasets and fewer tested parameters. Setting suite = "full" may initiate a dredge that takes a dramatically long time, particularly if the model is historical, individual covariates are used, or a zero-inflated distribution is assumed. In such cases, the number of models built and tested will run at least in the millions. Small datasets will also increase the error associated with these tests, leading to adoption of simpler models overall. We do not yet offer a parallelization option for function modelsearch(), but plan to offer one in the future to speed this process up for particularly large global models.

Care must be taken to build models that test the impacts of state in time t-1 for historical models, and that do not test these impacts for ahistorical models. Ahistorical matrix modeling particularly will yield biased transition estimates if historical terms from models are ignored. This can be dealt with at the start of modeling by setting historical = FALSE for the ahistorical case, and historical = TRUE for the historical case.

Examples

Run this code

# NOT RUN {
data(lathyrus)

sizevector <- c(0, 4.6, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9)
stagevector <- c("Sd", "Sdl", "Dorm", "Sz1nr", "Sz2nr", "Sz3nr", "Sz4nr", "Sz5nr",
                 "Sz6nr", "Sz7nr", "Sz8nr", "Sz9nr", "Sz1r", "Sz2r", "Sz3r", "Sz4r",
                 "Sz5r", "Sz6r", "Sz7r", "Sz8r", "Sz9r")
repvector <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1)
obsvector <- c(0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
matvector <- c(0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
immvector <- c(1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
propvector <- c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
indataset <- c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
binvec <- c(0, 4.6, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
            0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5)

lathframeln <- sf_create(sizes = sizevector, stagenames = stagevector, repstatus = repvector,
                         obsstatus = obsvector, matstatus = matvector, immstatus = immvector,
                         indataset = indataset, binhalfwidth = binvec, propstatus = propvector)

lathvertln <- verticalize3(lathyrus, noyears = 4, firstyear = 1988, patchidcol = "SUBPLOT",
                           individcol = "GENET", blocksize = 9, juvcol = "Seedling1988",
                           sizeacol = "lnVol88", repstracol = "FCODE88",
                           fecacol = "Intactseed88", deadacol = "Intactseed88",
                           nonobsacol = "Dormant1988", stageassign = lathframeln,
                           stagesize = "sizea", censorcol = "Missing1988",
                           censorkeep = NA, NAas0 = TRUE, censor = TRUE)

lathvertln$feca2 <- round(lathvertln$feca2)
lathvertln$feca1 <- round(lathvertln$feca1)
lathvertln$feca3 <- round(lathvertln$feca3)

lathmodelsln2 <- modelsearch(lathvertln, historical = FALSE, approach = "lme4", suite = "main",
                             vitalrates = c("surv", "obs", "size", "repst", "fec"), 
                             juvestimate = "Sdl", bestfit = "AICc&k", sizedist = "gaussian", 
                             fecdist = "poisson", indiv = "individ", patch = "patchid", 
                             year = "year2", year.as.random = TRUE, patch.as.random = TRUE,
                             show.model.tables = TRUE)

lathmodelsln2
# }
# NOT RUN {
# }

Run the code above in your browser using DataLab