simulate_nhmm: Simulate Non-homogeneous Hidden Markov Models

Description

Simulate sequences of observed and hidden states given the parameters of a non-homogeneous hidden Markov model.

Usage

simulate_nhmm(
  n_states,
  emission_formula,
  initial_formula = ~1,
  transition_formula = ~1,
  data,
  id,
  time,
  coefs = NULL,
  init_sd = 2 * is.null(coefs),
  check_rank = NULL
)
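
The default init_sd = 2 * is.null(coefs) relies on R coercing a logical value to a number in arithmetic, so it evaluates to 2 when no coefficients are supplied and to 0 otherwise. The sketch below only illustrates that expression; the helper name default_init_sd is hypothetical and not part of the package.

```r
# Hypothetical helper mirroring the default expression from the signature.
default_init_sd <- function(coefs = NULL) {
  # is.null() returns TRUE/FALSE, which arithmetic coerces to 1/0.
  2 * is.null(coefs)
}

default_init_sd()                     # coefs not supplied -> 2
default_init_sd(coefs = list(a = 1))  # coefs supplied     -> 0
```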

Value

A list with the model used in simulation as well as the simulated hidden state sequences.
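
A minimal sketch of a call and a look at the returned list, assuming the function is provided by the seqHMM package and that id and time are given as column names. The toy data, the variable names (d, x, y, sim) and the chosen formulas are illustrative assumptions, not taken from the package documentation.

```r
set.seed(1)

# Toy data (assumed layout): 20 sequences of length 10 in long format.
n_id <- 20
n_time <- 10
d <- data.frame(
  id = rep(seq_len(n_id), each = n_time),
  time = rep(seq_len(n_time), times = n_id),
  x = rnorm(n_id * n_time)
)
# Placeholder response: only its levels and the sequence lengths are used.
d$y <- factor(
  sample(c("a", "b", "c"), nrow(d), replace = TRUE),
  levels = c("a", "b", "c")
)

# Run the simulation only when seqHMM is installed.
if (requireNamespace("seqHMM", quietly = TRUE)) {
  sim <- seqHMM::simulate_nhmm(
    n_states = 2,
    emission_formula = y ~ x,
    initial_formula = ~1,
    transition_formula = ~x,
    data = d,
    id = "id",
    time = "time"
  )
  # Inspect the top-level structure of the returned list.
  str(sim, max.level = 1)
}
```

Because coefs is left at NULL here, the model parameters are drawn at random, so results change with the seed.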

Arguments

n_states: An integer > 1 defining the number of hidden states.
emission_formula: An object of class formula() for the state emission probabilities, or a list of such formulas in case of multiple response variables. The left-hand side of each formula defines the response(s). For multiple responses sharing the same formula, you can use the form c(y1, y2) ~ x, where y1 and y2 are the response variables.
initial_formula: An object of class formula() for the initial state probabilities. The left-hand side of the formula should be empty.
transition_formula: An object of class formula() for the state transition probabilities. The left-hand side of the formula should be empty.
data: A data frame containing the variables used in the model formulas. Note that this should also include the response variable(s), which are used to define the number of observed symbols (using levels()) and the length of the sequences. The actual values of the response variables do not matter, as they are replaced by the simulated values. The exception is the first time point in the FAN-HMM case: if the emission_formula contains lagged responses, the response variable values at the first time point are used to define the emissions at the second time point, and the simulations are done from the second time point onward. This matches the case prior_obs = "fixed" in estimate_nhmm(). Note that, unlike in the estimate_* functions, unused factor levels are not automatically dropped from data.
id: Name of the id variable in data identifying different sequences.
time: Name of the time index variable in data.
coefs: Same as argument inits in estimate_nhmm(). If NULL (default), the model parameters are generated randomly. To simulate new sequences based on an estimated model fit, use coefs = fit$etas and init_sd = 0.
init_sd: Standard deviation of the normal distribution used to generate random coefficients. Default is 2 when coefs is NULL and 0 otherwise.
check_rank: If TRUE, the rank of each design matrix is checked for identifiability issues. Default is NULL, in which case the checks are performed only if the number of sequences is 1000 or less, as the QR decomposition quickly becomes computationally demanding. If the check is not performed, a warning is given, which can be avoided by explicitly setting check_rank = FALSE.
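
Two of the points above can be checked in base R before simulating: unused factor levels in a response (which levels() still counts) and rank deficiency of a design matrix. This is a sketch under simple assumptions, using made-up vectors y, x1 and x2; it is not the package's internal check, only an illustration of what a QR-based rank test detects.

```r
# Response with a declared but unobserved level "c".
y <- factor(c("a", "b", "a", "b"), levels = c("a", "b", "c"))
nlevels(y)              # 3: the unused level is still counted
nlevels(droplevels(y))  # 2: after dropping unused levels

# Design matrix with a redundant column (x2 is exactly 2 * x1).
x1 <- c(1, 2, 3, 4)
X <- model.matrix(~ x1 + x2, data = data.frame(x1 = x1, x2 = 2 * x1))
rank_X <- qr(X)$rank
rank_X < ncol(X)        # TRUE: the matrix is rank deficient
```

If a response carries unused levels that should not become observed symbols, calling droplevels() on data before the simulation is one way to remove them.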