Function estimate_mnhmm estimates a mixture version of the non-homogeneous hidden Markov model (MNHMM), in which the initial, transition, emission, and mixture probabilities can depend on covariates. See estimate_nhmm() for further details.
estimate_mnhmm(
n_states,
n_clusters,
emission_formula,
initial_formula = ~1,
transition_formula = ~1,
cluster_formula = ~1,
data,
time,
id,
lambda = 0,
prior_obs = "fixed",
state_names = NULL,
cluster_names = NULL,
inits = "random",
init_sd = 2,
restarts = 0L,
method = "EM-DNM",
bound = Inf,
control_restart = list(),
control_mstep = list(),
check_rank = NULL,
...
)
An object of class mnhmm.
An integer > 1 defining the number of hidden states.
A positive integer defining the number of clusters (mixtures).
An object of class formula() for the state emission probabilities, or a list of such formulas in the case of multiple response variables. The left-hand side of each formula defines the response. For multiple responses sharing the same formula, you can use the form c(y1, y2) ~ x, where y1 and y2 are the response variables.
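The two ways of specifying multiple responses described above can be compared with a small base R sketch. The variable names y1, y2, and x are hypothetical, and the code only inspects the formula objects; how estimate_mnhmm() parses them internally is not shown here.

```r
# Two ways to give the same right-hand side to two responses
# (y1, y2, and x are hypothetical variable names).
f_list <- list(y1 ~ x, y2 ~ x)
f_joint <- c(y1, y2) ~ x

# Responses sit on the left-hand side, covariates on the right-hand side.
responses_list <- vapply(f_list, function(f) all.vars(f[[2]]), character(1))
responses_joint <- all.vars(f_joint[[2]])
covariates <- all.vars(f_joint[[3]])

print(responses_list)
print(responses_joint)
print(covariates)
```

Both forms name the same responses and the same covariate, so the shorter c(y1, y2) ~ x form is convenient when all responses share one formula.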
An object of class formula() for the initial state probabilities. The left-hand side of the formula should be empty.
An object of class formula() for the state transition probabilities. The left-hand side of the formula should be empty.
An object of class formula() for the mixture probabilities.
A data frame containing the variables used in the model formulas.
Name of the time index variable in data.
Name of the id variable in data identifying different sequences.
Penalization factor lambda for the penalized log-likelihood, where the penalization is 0.5 * lambda * sum(eta^2). Note that with method = "L-BFGS", both the objective function (log-likelihood) and the penalization term are scaled by the number of non-missing observations. Default is 0, but small values such as 1e-4 can help to ensure numerical stability of L-BFGS by avoiding extreme probabilities. See also argument bound for hard constraints.
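As a small illustration of the penalty formula above, the following base R sketch evaluates 0.5 * lambda * sum(eta^2) for a hypothetical coefficient vector. The values of eta and the helper name penalty are made up for this example and are not part of the package.

```r
# Hypothetical working parameters (regression coefficients).
eta <- c(0.5, -1.2, 3.0)

# Penalty term as stated in the argument description.
penalty <- function(eta, lambda) 0.5 * lambda * sum(eta^2)

pen_none <- penalty(eta, lambda = 0)      # default: no penalization
pen_small <- penalty(eta, lambda = 1e-4)  # small stabilizing penalty

print(pen_none)
print(pen_small)
```

Because the penalty grows with the squared size of the coefficients, even a small lambda discourages the very large values of eta that correspond to probabilities close to 0 or 1.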
Either "fixed" or a list of vectors giving the prior distributions for the responses at time "zero". See details.
A vector of optional labels for the hidden states. If this is NULL (the default), numbered states are used.
A vector of optional labels for the clusters. If this is NULL (the default), numbered clusters are used.
If inits = "random" (default), random initial values are used. Otherwise inits should be a list of initial values. If coefficients are given using list components eta_pi, eta_A, eta_B, and eta_omega, these are used as is. Alternatively, initial values can be given in terms of the initial state, transition, emission, and mixture probabilities using list components initial_probs, emission_probs, transition_probs, and cluster_probs. These can also be mixed, i.e., you can give only initial_probs and eta_A.
Standard deviation of the normal distribution used to generate random initial values. Default is 2. If you want to fix the initial values of the regression coefficients to zero, use init_sd = 0.
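The effect of init_sd can be sketched in base R as follows. This assumes that initial coefficients are drawn from a zero-mean normal distribution, which is consistent with init_sd = 0 fixing them to zero; the helper draw_inits and the number of coefficients are hypothetical and do not reflect the package internals.

```r
set.seed(1)
n_coef <- 6  # hypothetical number of regression coefficients

# Assumed generator: zero-mean normal draws with standard deviation init_sd.
draw_inits <- function(n, init_sd) rnorm(n, mean = 0, sd = init_sd)

inits_default <- draw_inits(n_coef, init_sd = 2)  # spread-out starting values
inits_zero <- draw_inits(n_coef, init_sd = 0)     # all starting values are zero

print(inits_default)
print(inits_zero)
```

A larger init_sd spreads the starting points more widely, which can matter when combined with restarts, whereas init_sd = 0 makes every run start from the same point.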
Number of times to run optimization using random starting values (in addition to the final run). Default is 0.
Optimization method used. Option "EM" uses the EM algorithm with L-BFGS in the M-step. Option "DNM" uses direct maximization of the log-likelihood, by default using L-BFGS. Option "EM-DNM" (the default) first runs a maximum of 10 iterations of EM and then switches to L-BFGS (but other algorithms of NLopt can be used).
Positive value defining the hard lower and upper bounds for the working parameters \(\eta\), which are used to avoid extreme probabilities and the corresponding numerical issues, especially in the M-step of the EM algorithm. Default is Inf, i.e., no bounds. Note that the bounds are not enforced for the M-step in the intercept-only case with lambda = 0.
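To see why bounding the working parameters avoids extreme probabilities, consider the following base R sketch. It assumes a softmax-type mapping from \(\eta\) to probabilities and mimics the box constraint by clipping; both the mapping and the clipping are simplifications for illustration, since an optimizer would enforce the bounds during the search rather than afterwards. The values of eta and bound are hypothetical.

```r
# Assumed mapping from working parameters to probabilities (softmax).
softmax <- function(eta) {
  z <- exp(eta - max(eta))  # subtract the maximum for numerical stability
  z / sum(z)
}

bound <- 5                # hypothetical finite bound
eta <- c(0, 40, -40)      # hypothetical unbounded working parameters

# Clipping to [-bound, bound] mimics the effect of the hard constraints.
eta_bounded <- pmin(pmax(eta, -bound), bound)

p_raw <- softmax(eta)
p_bounded <- softmax(eta_bounded)

print(p_raw)
print(p_bounded)
```

With unbounded values some probabilities become numerically indistinguishable from zero, while the bounded version keeps all of them away from the boundary.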
Controls for restart steps, see details.
Controls for M-step of EM algorithm, see details.
If TRUE, the rank of the design matrices is checked for identifiability issues. Default is NULL, in which case checks are performed only if the number of sequences is 1000 or less, as the QR decomposition quickly becomes computationally demanding. If the check is not performed, a warning is given, which can be circumvented by explicitly using check_rank = FALSE.
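The kind of rank check described above can be sketched with base R. The data frame and covariate names below are hypothetical, and the sketch only shows the general QR-based idea on a single design matrix, not the exact checks the package performs.

```r
# Hypothetical covariates; x2 is an exact multiple of x1 (collinear).
d <- data.frame(
  x1 = c(1, 2, 3, 4, 5),
  x2 = c(2, 4, 6, 8, 10),
  x3 = c(1, 0, 1, 0, 1)
)

X_bad <- model.matrix(~ x1 + x2 + x3, data = d)  # includes the collinear pair
X_ok <- model.matrix(~ x1 + x3, data = d)        # collinear column dropped

# A design matrix is rank deficient if its rank is below its column count.
rank_bad <- qr(X_bad)$rank
rank_ok <- qr(X_ok)$rank

print(c(rank = rank_bad, columns = ncol(X_bad)))
print(c(rank = rank_ok, columns = ncol(X_ok)))
```

A rank lower than the number of columns signals that some coefficients cannot be identified from the data, which is the situation the check is meant to flag.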
Additional arguments to nloptr::nloptr() and the EM algorithm. See details.
estimate_nhmm()
for further details.
data("mvad", package = "TraMineR")
d <- reshape(mvad, direction = "long", varying = list(15:86),
  v.names = "activity")
if (FALSE) {
  set.seed(1)
  fit <- estimate_mnhmm(n_states = 3, n_clusters = 2,
    data = d, time = "time", id = "id",
    cluster_formula = ~ male + catholic + gcse5eq + Grammar +
      funemp + fmpr + livboth + Belfast +
      N.Eastern + Southern + S.Eastern + Western,
    emission_formula = activity ~ male + catholic + gcse5eq,
    initial_formula = ~ 1,
    transition_formula = ~ male + gcse5eq
  )
}