MIfitHMM: Fit HMMs to multiple imputation data

Description

Fit a (multivariate) hidden Markov model to multiple imputation data. Multiple imputation is a method for accommodating missing data, temporal-irregularity, or location measurement error in hidden Markov models, where pooled parameter estimates reflect uncertainty attributable to observation error.

Usage

MIfitHMM(miData, ...)
# S3 method for default
MIfitHMM(
  miData,
  nSims,
  ncores = 1,
  poolEstimates = TRUE,
  alpha = 0.95,
  na.rm = FALSE,
  nbStates,
  dist,
  Par0,
  beta0 = NULL,
  delta0 = NULL,
  estAngleMean = NULL,
  circularAngleMean = NULL,
  formula = ~1,
  formulaDelta = NULL,
  stationary = FALSE,
  mixtures = 1,
  formulaPi = NULL,
  nlmPar = NULL,
  fit = TRUE,
  useInitial = FALSE,
  DM = NULL,
  userBounds = NULL,
  workBounds = NULL,
  betaCons = NULL,
  betaRef = NULL,
  deltaCons = NULL,
  mvnCoords = NULL,
  stateNames = NULL,
  knownStates = NULL,
  fixPar = NULL,
  retryFits = 0,
  retrySD = NULL,
  optMethod = "nlm",
  control = list(),
  prior = NULL,
  modelName = NULL,
  covNames = NULL,
  spatialCovs = NULL,
  centers = NULL,
  centroids = NULL,
  angleCovs = NULL,
  altCoordNames = NULL,
  method = "IS",
  parIS = 1000,
  dfSim = Inf,
  grid.eps = 1,
  crit = 2.5,
  scaleSim = 1,
  quad.ask = FALSE,
  force.quad = TRUE,
  fullPost = TRUE,
  dfPostIS = Inf,
  scalePostIS = 1,
  thetaSamp = NULL,
  ...
)
# S3 method for hierarchical
MIfitHMM(
  miData,
  nSims,
  ncores = 1,
  poolEstimates = TRUE,
  alpha = 0.95,
  na.rm = FALSE,
  hierStates,
  hierDist,
  Par0,
  hierBeta = NULL,
  hierDelta = NULL,
  estAngleMean = NULL,
  circularAngleMean = NULL,
  hierFormula = NULL,
  hierFormulaDelta = NULL,
  mixtures = 1,
  formulaPi = NULL,
  nlmPar = NULL,
  fit = TRUE,
  useInitial = FALSE,
  DM = NULL,
  userBounds = NULL,
  workBounds = NULL,
  betaCons = NULL,
  deltaCons = NULL,
  mvnCoords = NULL,
  knownStates = NULL,
  fixPar = NULL,
  retryFits = 0,
  retrySD = NULL,
  optMethod = "nlm",
  control = list(),
  prior = NULL,
  modelName = NULL,
  covNames = NULL,
  spatialCovs = NULL,
  centers = NULL,
  centroids = NULL,
  angleCovs = NULL,
  altCoordNames = NULL,
  method = "IS",
  parIS = 1000,
  dfSim = Inf,
  grid.eps = 1,
  crit = 2.5,
  scaleSim = 1,
  quad.ask = FALSE,
  force.quad = TRUE,
  fullPost = TRUE,
  dfPostIS = Inf,
  scalePostIS = 1,
  thetaSamp = NULL,
  ...
)

Value

If nSims>1, poolEstimates=TRUE, and fit=TRUE, a miHMM object, i.e., a list consisting of:

miSum: miSum object returned by MIpool.
HMMfits: List of length nSims comprised of momentuHMM objects.

If poolEstimates=FALSE and fit=TRUE, a list of length nSims consisting of momentuHMM objects is returned.

However, if fit=FALSE and miData is a crwData

object, then MIfitHMM returns a crwSim object, i.e., a list containing miData (a list of momentuHMMData objects) and crwSimulator (a list of crwSimulator objects),and most other arguments related to fitHMM are ignored.

Arguments

miData: A crwData object, a crwHierData object, a crwSim object, a crwHierSim object, a list of momentuHMMData objects, or a list of momentuHierHMMData objects.
...: further arguments passed to or from other methods
nSims: Number of imputations in which to fit the HMM using fitHMM. If miData is a list of momentuHMMData objects, nSims cannot exceed the length of miData.
ncores: Number of cores to use for parallel processing. Default: 1 (no parallel processing).
poolEstimates: Logical indicating whether or not to calculate pooled parameter estimates across the nSims imputations using MIpool. Default: TRUE.
alpha: Significance level for calculating confidence intervals of pooled estimates when poolEstimates=TRUE (see MIpool). Default: 0.95.
na.rm: Logical indicating whether or not to exclude model fits with NA parameter estimates or standard errors from pooling when poolEstimates=TRUE (see MIpool). Default: FALSE.
nbStates: Number of states of the HMM. See fitHMM.
dist: A named list indicating the probability distributions of the data streams. See fitHMM.
Par0: A named list containing vectors of initial state-dependent probability distribution parameters for each data stream specified in dist. See fitHMM. Par0 may also be a list of length nSims, where each element is a named list containing vectors of initial state-dependent probability distribution parameters for each imputation. Note that if useInitial=TRUE then Par0 is ignored after the first imputation.
beta0: Initial matrix of regression coefficients for the transition probabilities. See fitHMM. beta0 may also be a list of length nSims, where each element is an initial matrix of regression coefficients for the transition probabilities for each imputation.
delta0: Initial values for the initial distribution of the HMM. See fitHMM. delta0 may also be a list of length nSims, where each element is the initial values for the initial distribution of the HMM for each imputation.
estAngleMean: An optional named list indicating whether or not to estimate the angle mean for data streams with angular distributions ('vm' and 'wrpcauchy'). See fitHMM.
circularAngleMean: An optional named list indicating whether to use circular-linear or circular-circular regression on the mean of circular distributions ('vm' and 'wrpcauchy') for turning angles. See fitHMM.
formula: Regression formula for the transition probability covariates. See fitHMM.
formulaDelta: Regression formula for the initial distribution. See fitHMM.
stationary: FALSE if there are time-varying covariates in formula or any covariates in formulaDelta. If TRUE, the initial distribution is considered equal to the stationary distribution. See fitHMM.
mixtures: Number of mixtures for the state transition probabilities (i.e. discrete random effects *sensu* DeRuiter et al. 2017). Default: mixtures=1.
formulaPi: Regression formula for the mixture distribution probabilities. See fitHMM.
nlmPar: List of parameters to pass to the optimization function nlm (which should be either print.level, gradtol, stepmax, steptol, iterlim, or hessian -- see nlm's documentation for more detail). For print.level, the default value of 0 means that no printing occurs, a value of 1 means that the first and last iterations of the optimization are detailed, and a value of 2 means that each iteration of the optimization is detailed. Ignored unless optMethod="nlm".
fit: TRUE if the HMM should be fitted to the data, FALSE otherwise. See fitHMM. If fit=FALSE and miData is a crwData object, then MIfitHMM returns a list containing a momentuHMMData object (if nSims=1) or, if nSims>1, a crwSim object.
useInitial: Logical indicating whether or not to use parameter estimates for the first model fit as initial values for all subsequent model fits. If ncores>1 then the first model is fit on a single core and then used as the initial values for all subsequent model fits on each core (in this case, the progress of the initial model fit can be followed using the print.level option in nlmPar). Default: FALSE. Ignored if nSims<2.
DM: An optional named list indicating the design matrices to be used for the probability distribution parameters of each data stream. See fitHMM.
userBounds: An optional named list of 2-column matrices specifying bounds on the natural (i.e, real) scale of the probability distribution parameters for each data stream. See fitHMM.
workBounds: An optional named list of 2-column matrices specifying bounds on the working scale of the probability distribution, transition probability, and initial distribution parameters. See fitHMM.
betaCons: Matrix of the same dimension as beta0 composed of integers identifying any equality constraints among the t.p.m. parameters. See fitHMM.
betaRef: Numeric vector of length nbStates indicating the reference elements for the t.p.m. multinomial logit link. See fitHMM.
deltaCons: Matrix of the same dimension as delta0 composed of integers identifying any equality constraints among the initial distribution working scale parameters. Ignored unless a formula is provided in formulaDelta. See fitHMM.
mvnCoords: Character string indicating the name of location data that are to be modeled using a multivariate normal distribution. For example, if mu="mvnorm2" was included in dist and (mu.x, mu.y) are location data, then mvnCoords="mu" needs to be specified in order for these data to be properly treated as locations in functions such as plot.momentuHMM, plot.miSum, plot.miHMM, plotSpatialCov, and MIpool.
stateNames: Optional character vector of length nbStates indicating state names.
knownStates: Vector of values of the state process which are known prior to fitting the model (if any). See fitHMM. If miData is a list of momentuHMMData objects, then knownStates can alternatively be a list of vectors containing the known values for the state process for each element of miData.
fixPar: An optional list of vectors indicating parameters which are assumed known prior to fitting the model. See fitHMM.
retryFits: Non-negative integer indicating the number of times to attempt to iteratively fit the model using random perturbations of the current parameter estimates as the initial values for likelihood optimization. See fitHMM.
retrySD: An optional list of scalars or vectors indicating the standard deviation to use for normal perturbations of each working scale parameter when retryFits>0. See fitHMM.
optMethod: The optimization method to be used. Can be ``nlm'' (the default; see nlm), ``Nelder-Mead'' (see optim), or ``SANN'' (see optim).
control: A list of control parameters to be passed to optim (ignored unless optMethod="Nelder-Mead" or optMethod="SANN").
prior: A function that returns the log-density of the working scale parameter prior distribution(s). See fitHMM.
modelName: An optional character string providing a name for the fitted model. If provided, modelName will be returned in print.momentuHMM, AIC.momentuHMM, AICweights, and other functions.
covNames: Names of any covariates in miData$crwPredict (if miData is a crwData object; otherwise covNames is ignored). See prepData. If miData is a crwData object, any covariate in miData$crwPredict that is used in formula, formulaDelta, formulaPi, or DM must be included in covNames.
spatialCovs: List of raster layer(s) for any spatial covariates. See prepData.
centers: 2-column matrix providing the x-coordinates (column 1) and y-coordinates (column 2) for any activity centers (e.g., potential centers of attraction or repulsion) from which distance and angle covariates will be calculated based on realizations of the position process. See prepData. Ignored unless miData is a crwData object.
centroids: List where each element is a data frame containing the x-coordinates ('x'), y-coordinates ('y'), and times (with user-specified name, e.g., `time') for centroids (i.e., dynamic activity centers where the coordinates can change over time) from which distance and angle covariates will be calculated based on the location data. See prepData. Ignored unless miData is a crwData object.
angleCovs: Character vector indicating the names of any circular-circular regression angular covariates in miData$crwPredict that need conversion from standard direction (in radians relative to the x-axis) to turning angle (relative to previous movement direction) See prepData. Ignored unless miData is a crwData or crwHierData object.
altCoordNames: Character string indicating an alternative name for the returned location data. See prepData. Ignored unless miData is a crwData or crwHierData object.
method: Method for obtaining weights for movement parameter samples. See crwSimulator. Ignored unless miData is a crwData object.
parIS: Size of the parameter importance sample. See crwSimulator. Ignored unless miData is a crwData object.
dfSim: Degrees of freedom for the t approximation to the parameter posterior. See 'df' argument in crwSimulator. Ignored unless miData is a crwData object.
grid.eps: Grid size for method="quadrature". See crwSimulator. Ignored unless miData is a crwData object.
crit: Criterion for deciding "significance" of quadrature points (difference in log-likelihood). See crwSimulator. Ignored unless miData is a crwData object.
scaleSim: Scale multiplier for the covariance matrix of the t approximation. See 'scale' argument in crwSimulator. Ignored unless miData is a crwData object.
quad.ask: Logical, for method='quadrature'. Whether or not the sampler should ask if quadrature sampling should take place. It is used to stop the sampling if the number of likelihood evaluations would be extreme. Default: FALSE. Ignored if ncores>1.
force.quad: A logical indicating whether or not to force the execution of the quadrature method for large parameter vectors. See crwSimulator. Default: TRUE. Ignored unless miData is a crwData object and method=``quadrature''.
fullPost: Logical indicating whether to draw parameter values as well to simulate full posterior. See crwPostIS. Ignored unless miData is a crwData object.
dfPostIS: Degrees of freedom for multivariate t distribution approximation to parameter posterior. See 'df' argument in crwPostIS. Ignored unless miData is a crwData object.
scalePostIS: Extra scaling factor for t distribution approximation. See 'scale' argument in crwPostIS. Ignored unless miData is a crwData object.
thetaSamp: If multiple parameter samples are available in crwSimulator objects, setting thetaSamp=n will use the nth sample. Defaults to the last. See crwSimulator and crwPostIS. Ignored unless miData is a crwData object.
hierStates: A hierarchical model structure Node for the states. See fitHMM.
hierDist: A hierarchical data structure Node for the data streams. See fitHMM.
hierBeta: A hierarchical data structure Node for the matrix of initial values for the regression coefficients of the transition probabilities at each level of the hierarchy ('beta'). See fitHMM.
hierDelta: A hierarchical data structure Node for the matrix of initial values for the regression coefficients of the initial distribution at each level of the hierarchy ('delta'). See fitHMM.
hierFormula: A hierarchical formula structure for the transition probability covariates for each level of the hierarchy. See fitHMM.
hierFormulaDelta: A hierarchical formula structure for the initial distribution covariates for each level of the hierarchy ('formulaDelta'). Default: NULL (no covariate effects and fixPar$delta is specified on the working scale). See fitHMM.

Details

miData can either be a crwData or crwHierData object (as returned by crawlWrap), a crwSim or crwHierSim object (as returned by MIfitHMM when fit=FALSE), or a list of momentuHMMData or momentuHierHMMData objects (e.g., each element of the list as returned by prepData).

If miData is a crwData (or crwHierData) object, MIfitHMM uses a combination of crwSimulator, crwPostIS, prepData, and fitHMM to draw nSims realizations of the position process and fit the specified HMM to each imputation of the data. The vast majority of MIfitHMM arguments are identical to the corresponding arguments from these functions.

If miData is a crwData or crwHierData object, nSims determines both the number of realizations of the position process to draw (using crwSimulator and crwPostIS) as well as the number of HMM fits.

If miData is a crwSim (or crwHierSim) object or a list of momentuHMMData (or momentuHierHMMData) object(s), the specified HMM will simply be fitted to each of the momentuHMMData (or momentuHierHMMData) objects and all arguments related to crwSimulator, crwPostIS, or prepData are ignored.

References

Hooten M.B., Johnson D.S., McClintock B.T., Morales J.M. 2017. Animal Movement: Statistical Models for Telemetry Data. CRC Press, Boca Raton.

McClintock B.T. 2017. Incorporating telemetry error into hidden Markov movement models using multiple imputation. Journal of Agricultural, Biological, and Environmental Statistics.

Examples

Run this code

# \dontshow{
set.seed(1,kind="Mersenne-Twister",normal.kind="Inversion")
# }
# Don't run because it takes too long on a single core
if (FALSE) {
# extract simulated obsData from example data
obsData <- miExample$obsData

# error ellipse model
err.model <- list(x= ~ ln.sd.x - 1, y =  ~ ln.sd.y - 1, rho =  ~ error.corr)

# create crwData object by fitting crwMLE models to obsData and predict locations 
# at default intervals for both individuals
crwOut <- crawlWrap(obsData=obsData,
         theta=c(4,0),fixPar=c(1,1,NA,NA),
         err.model=err.model)

# HMM specifications
nbStates <- 2
stepDist <- "gamma"
angleDist <- "vm"
mu0 <- c(20,70)
sigma0 <- c(10,30)
kappa0 <- c(1,1)
stepPar0 <- c(mu0,sigma0)
anglePar0 <- c(-pi/2,pi/2,kappa0)
formula <- ~cov1+cos(cov2)
nbCovs <- 2
beta0 <- matrix(c(rep(-1.5,nbStates*(nbStates-1)),rep(0,nbStates*(nbStates-1)*nbCovs)),
                nrow=nbCovs+1,byrow=TRUE)

# first fit HMM to best predicted position process
bestData<-prepData(crwOut,covNames=c("cov1","cov2"))
bestFit<-fitHMM(bestData,
                nbStates=nbStates,dist=list(step=stepDist,angle=angleDist),
                Par0=list(step=stepPar0,angle=anglePar0),beta0=beta0,
                formula=formula,estAngleMean=list(angle=TRUE))
            
print(bestFit)

# extract estimates from 'bestFit'
bPar0 <- getPar(bestFit)

# Fit nSims=5 imputations of the position process
miFits<-MIfitHMM(miData=crwOut,nSims=5,
                  nbStates=nbStates,dist=list(step=stepDist,angle=angleDist),
                  Par0=bPar0$Par,beta0=bPar0$beta,delta0=bPar0$delta,
                  formula=formula,estAngleMean=list(angle=TRUE),
                  covNames=c("cov1","cov2"))

# print pooled estimates
print(miFits)
}

Run the code above in your browser using DataLab