Simulate sequences of observed and hidden states given the parameters of a non-homogeneous hidden Markov model.
simulate_nhmm(
  n_states,
  emission_formula,
  initial_formula = ~1,
  transition_formula = ~1,
  data,
  id,
  time,
  coefs = NULL,
  init_sd = 2 * is.null(coefs),
  check_rank = NULL
)
A list with the model used in simulation as well as the simulated hidden state sequences.
An integer > 1 defining the number of hidden states.
An object of class formula() for the state emission probabilities, or a list
of such formulas in case of multiple response variables. The left-hand side
of each formula defines the response(s). For multiple responses sharing the
same formula, you can use the form c(y1, y2) ~ x, where y1 and y2 are the
response variables.
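The two ways of specifying emission formulas can be sketched in base R as follows. The variable names y1, y2, x, and z are placeholders chosen for illustration only, and the snippet merely inspects the formula objects; it does not call any package function.

```r
# One formula shared by two responses (the c(y1, y2) ~ x form)
shared <- c(y1, y2) ~ x

# A list of formulas, which also allows response-specific right-hand sides
separate <- list(y1 ~ x, y2 ~ x + z)

# The left-hand side names the response variables
shared_responses <- all.vars(shared[[2]])
separate_responses <- vapply(
  separate,
  function(f) all.vars(f[[2]]),
  character(1)
)

shared_responses    # "y1" "y2"
separate_responses  # "y1" "y2"
```

The list form is presumably the one to use when the responses need different covariates.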
An object of class formula() for the initial state probabilities. The
left-hand side of the formula should be empty.
An object of class formula() for the state transition probabilities. The
left-hand side of the formula should be empty.
A data frame containing the variables used in the model formulas. Note that
this should also include the response variable(s), which are used to define
the number of observed symbols (using levels()) and the length of sequences.
The actual values of the response variables do not matter, though, as they
are replaced by the simulated values. The exception is the first time point
in the FAN-HMM case: if the emission_formula contains lagged responses, the
response variable values at the first time point are used to define the
emissions at the second time point, and the simulations are done from the
second time point onward. This matches the case prior_obs = "fixed" in
estimate_nhmm(). Note that, unlike in the estimate_* functions, unused factor
levels are not automatically dropped from data.
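A minimal base R sketch of a suitable input data frame is given below. The column names (id, time, x, y), the sizes, and the symbol labels are hypothetical. The response y holds placeholder values only; what carries information is its set of factor levels and the number of rows per sequence.

```r
set.seed(1)
n_id <- 5     # number of sequences (hypothetical)
n_time <- 10  # length of each sequence (hypothetical)

d <- data.frame(
  id = rep(seq_len(n_id), each = n_time),
  time = rep(seq_len(n_time), times = n_id),
  x = rnorm(n_id * n_time)
)

# Placeholder response: every value is "a", but three symbols are declared
d$y <- factor(rep("a", nrow(d)), levels = c("a", "b", "c"))

nlevels(d$y)              # 3: all declared levels count as observed symbols
nlevels(droplevels(d$y))  # 1: dropping unused levels would change the count
```

Because unused levels are not dropped from data, the declared levels should be exactly the symbols you want the simulation to use. If the emission formula contains lagged responses, the values at the first time point should be meaningful rather than placeholders.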
Name of the id variable in data identifying different sequences.
Name of the time index variable in data.
Same as argument inits in estimate_nhmm(). If NULL (default), the model
parameters are generated randomly. If you want to simulate new sequences
based on an estimated model fit, you can use coefs = fit$etas and
init_sd = 0.
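The idiom of simulating from an estimated model could be wrapped as in the sketch below. The helper simulate_from_fit() is hypothetical and not part of the package; it assumes that simulate_nhmm() is exported by the seqHMM package, that fit is an object returned by estimate_nhmm(), and that n_states and the formulas passed here match those used when fit was estimated.

```r
# Hypothetical convenience wrapper; not part of the package
simulate_from_fit <- function(fit, n_states, emission_formula, data, id, time,
                              initial_formula = ~1,
                              transition_formula = ~1) {
  seqHMM::simulate_nhmm(
    n_states = n_states,
    emission_formula = emission_formula,
    initial_formula = initial_formula,
    transition_formula = transition_formula,
    data = data,
    id = id,
    time = time,
    coefs = fit$etas,  # reuse the estimated coefficients
    init_sd = 0        # explicit; 0 is already the default when coefs is given
  )
}

# Usage (assumes a fitted model `fit` and a data frame `d` already exist):
# sim <- simulate_from_fit(fit, n_states = 2, emission_formula = y ~ x,
#                          data = d, id = "id", time = "time")
```

Defining the wrapper does not require the package to be installed; the call to seqHMM::simulate_nhmm() is only resolved when the wrapper is invoked.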
Standard deviation of the normal distribution used to generate random
coefficients. Default is 2 when coefs is NULL and 0 otherwise.
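The default expression 2 * is.null(coefs) from the usage can be checked in base R, as sketched below. The helper default_init_sd() is hypothetical and only mirrors that expression. The final draw is an illustration that assumes zero-mean normal coefficients and an arbitrary count of six; how the package actually generates coefficients is not shown here.

```r
# Mirrors the default `init_sd = 2 * is.null(coefs)` from the usage
default_init_sd <- function(coefs = NULL) 2 * is.null(coefs)

sd_random <- default_init_sd()            # coefs is NULL, so 2
sd_fixed <- default_init_sd(list(0.5))    # coefs supplied, so 0

# Illustration only: six coefficients drawn with the default standard deviation
set.seed(42)
random_coefs <- rnorm(6, mean = 0, sd = sd_random)
```

Larger values of init_sd spread the random coefficients further from zero, which tends to produce more extreme state and emission probabilities.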
If TRUE, the ranks of the design matrices are checked for identifiability
issues. Default is NULL, in which case checks are performed only if the
number of sequences is 1000 or less, as the QR decomposition quickly becomes
computationally demanding. If the check is not performed, a warning is given,
which can be circumvented by explicitly using check_rank = FALSE.
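To illustrate what a QR-based rank check detects, the base R sketch below builds a design matrix with a perfectly collinear covariate. It is a generic demonstration using model.matrix() and qr(), not necessarily the check the package performs internally; the variable names are hypothetical.

```r
set.seed(1)
x1 <- rnorm(20)
d <- data.frame(x1 = x1, x2 = 2 * x1)  # x2 is an exact multiple of x1

# Design matrix with intercept, x1, and x2
X <- model.matrix(~ x1 + x2, data = d)

# Numerical rank from the QR decomposition
rank_X <- qr(X)$rank

rank_X < ncol(X)  # TRUE: the design matrix is rank-deficient
```

With a rank-deficient design matrix, the coefficients of x1 and x2 cannot be identified separately, which is the kind of issue a rank check is meant to flag.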