
Function fit_model
estimates the parameters of mixture hidden
Markov models and its restricted variants using maximimum likelihood.
Initial values for estimation are taken from the corresponding components
of the model with preservation of original zero probabilities.
fit_model(
model,
em_step = TRUE,
global_step = FALSE,
local_step = FALSE,
control_em = list(),
control_global = list(),
control_local = list(),
lb,
ub,
threads = 1,
log_space = FALSE,
constraints = NULL,
fixed_inits = NULL,
fixed_emissions = NULL,
fixed_transitions = NULL,
...
)
Log-likelihood of the estimated model.
Results after the EM step: log-likelihood (logLik
), number of iterations
(iterations
), relative change in log-likelihoods between the last two iterations (change
), and
the log-likelihoods of the n_optimum
best models after the EM step (best_opt_restart
).
Results after the global step.
Results after the local step.
The matched function call.
An object of class hmm
or mhmm
.
Logical. Whether or not to use the EM algorithm at the start
of the parameter estimation. The default is TRUE
.
Logical. Whether or not to use global optimization via
nloptr
(possibly after the EM step). The default is FALSE
.
Logical. Whether or not to use local optimization via
nloptr
(possibly after the EM and/or global steps). The default is FALSE
.
Optional list of control parameters for the EM algorithm. Possible arguments are
The maximum number of iterations, the default is 1000.
Note that iteration counter starts with -1 so with maxeval=1
you get already two iterations.
This is for backward compatibility reasons.
The level of printing. Possible values are 0 (prints nothing), 1 (prints information at the start and the end of the algorithm), 2 (prints at every iteration), and for mixture models 3 (print also during optimization of coefficients).
Relative tolerance for convergence defined as
A list containing options for possible EM restarts with the following components:
Number of restarts of the EM algorithm using random initial values. The default is 0, i.e. no restarts.
Logical. Should the original transition probabilities be varied? The default is TRUE
.
Logical. Should the original emission probabilities be varied? The default is TRUE
.
Standard deviation for rnorm
used in randomization. The default is 0.25.
Maximum number of iterations, the default is control_em$maxeval
Level of printing in restarted EM steps. The default is control_em$print_level
.
Relative tolerance for convergence at restarted EM steps. The default is control_em$reltol
.
If the relative change of the final model of the restart phase is larger than the tolerance
for the original EM phase, the final model is re-estimated with the original reltol
and maxeval
at the end of the EM step.
Save the log-likelihood values of the n_optimum
best
models (from all estimated models including the the first EM run.).
The default is min(times + 1, 25)
.
If TRUE
. Use the initial values of the input model as starting
points for the permutations. Otherwise permute the results of the first EM run.
Optional list of additional arguments for
nloptr
argument opts
. The default values are
"NLOPT_GD_MLSL_LDS"
list(algorithm = "NLOPT_LD_LBFGS", ftol_rel = 1e-6, xtol_rel = 1e-4)
10000
(maximum number of iterations in global optimization algorithm.)
60
(maximum time for global optimization. Set to 0 for unlimited time.)
Optional list of additional arguments for
nloptr
argument opts
. The default values are
"NLOPT_LD_LBFGS"
1e-10
1e-8
10000
(maximum number of iterations)
Lower and upper bounds for parameters in Softmax parameterization.
The default interval is
Number of threads to use in parallel computing. The default is 1.
Make computations using log-space instead of scaling for greater
numerical stability at a cost of decreased computational performance. The default is FALSE
.
Integer vector defining equality constraints for emission distributions. Not supported for EM algorithm. See details.
Can be used to fix some of the probabilities to their initial values.
Should have same structure as model$initial_probs
, where each element
is either TRUE (fixed) or FALSE (to be estimated). Note that zero probabilities are always fixed to 0.
Not supported for EM algorithm. See details.
Can be used to fix some of the probabilities to their initial values.
Should have same structure as model$emission_probs
, where each element
is either TRUE (fixed) or FALSE (to be estimated). Note that zero probabilities are always fixed to 0.
Not supported for EM algorithm. See details.
Can be used to fix some of the probabilities to their initial values.
Should have same structure as model$transition_probs
, where each element
is either TRUE (fixed) or FALSE (to be estimated). Note that zero probabilities are always fixed to 0.
Not supported for EM algorithm. See details.
Additional arguments to nloptr
.
The fitting function provides three estimation steps: 1) EM algorithm, 2) global optimization, and 3) local optimization. The user can call for one method or any combination of these steps, but should note that they are preformed in the above-mentioned order. The results from a former step are used as starting values in a latter, except for some of global optimization algorithms (such as MLSL and StoGO) which only use initial values for setting up the boundaries for the optimization.
It is possible to rerun the EM algorithm automatically using random starting
values based on the first run of EM. Number of restarts is defined by
the restart
argument in control_em
. As the EM algorithm is
relatively fast, this method might be preferred option compared to the proper
global optimization strategy of step 2.
The default global optimization method (triggered via global_step = TRUE
) is
the multilevel single-linkage method (MLSL) with the LDS modification (NLOPT_GD_MLSL_LDS
as
algorithm
in control_global
), with L-BFGS as the local optimizer.
The MLSL method draws random starting points and performs a local optimization
from each. The LDS modification uses low-discrepancy sequences instead of
pseudo-random numbers as starting points and should improve the convergence rate.
In order to reduce the computation time spent on non-global optima, the
convergence tolerance of the local optimizer is set relatively large. At step 3,
a local optimization (L-BFGS by default) is run with a lower tolerance to find the
optimum with high precision.
There are some theoretical guarantees that the MLSL method used as the default optimizer in step 2 shoud find all local optima in a finite number of local optimizations. Of course, it might not always succeed in a reasonable time. The EM algorithm can help in finding good boundaries for the search, especially with good starting values, but in some cases it can mislead. A good strategy is to try a couple of different fitting options with different combinations of the methods: e.g. all steps, only global and local steps, and a few evaluations of EM followed by global and local optimization.
By default, the estimation time is limited to 60 seconds in global optimization step, so it is advisable to change the default settings for the proper global optimization.
Any algorithm available in the nloptr
function can be used for the global and
local steps.
Equality constraints for emission distributions can be defined using the argument
constraints
. This should be a vector with length equal to the number of states,
with numbers starting from 1 and increasing for each unique row of the emission probability matrix.
For example in case of five states with emissions of first and third states being equal,
constraints = c(1, 2, 1, 3, 4)
. Similarly, some of the model parameters can be fixed to their
initial values by using arguments fixed_inits
, fixed_emissions
,
and fixed_transitions
, where the structure of the arguments should be
same as the corresponding model components, so that TRUE value means that
the parameter should be fixed and FALSE otherwise (it is still treated as fixed if it
is zero though). For both types of constrains, only numerical optimisation
(local or global) is available, and currently the gradients are computed numerically
(if needed) in these cases.
In a case where the is no transitions from one state to anywhere (even to itself), the state is defined as absorbing in a way that probability of staying in this state is fixed to 1. See also `build_mm` function.
Helske S. and Helske J. (2019). Mixture Hidden Markov Models for Sequence Data: The seqHMM Package in R, Journal of Statistical Software, 88(3), 1-32. doi:10.18637/jss.v088.i03
build_hmm
, build_mhmm
,
build_mm
, build_mmm
, and build_lcm
for constructing different types of models; summary.mhmm
for a summary of a MHMM; separate_mhmm
for reorganizing a MHMM into
a list of separate hidden Markov models; plot.hmm
and plot.mhmm
for plotting model objects; and ssplot
and mssplot
for plotting
stacked sequence plots of hmm
and mhmm
objects.
# Hidden Markov model for mvad data
data("mvad", package = "TraMineR")
mvad_alphabet <-
c("employment", "FE", "HE", "joblessness", "school", "training")
mvad_labels <- c(
"employment", "further education", "higher education",
"joblessness", "school", "training"
)
mvad_scodes <- c("EM", "FE", "HE", "JL", "SC", "TR")
mvad_seq <- seqdef(mvad, 17:86,
alphabet = mvad_alphabet,
states = mvad_scodes, labels = mvad_labels, xtstep = 6
)
attr(mvad_seq, "cpal") <- colorpalette[[6]]
# Starting values for the emission matrix
emiss <- matrix(
c(
0.05, 0.05, 0.05, 0.05, 0.75, 0.05, # SC
0.05, 0.75, 0.05, 0.05, 0.05, 0.05, # FE
0.05, 0.05, 0.05, 0.4, 0.05, 0.4, # JL, TR
0.05, 0.05, 0.75, 0.05, 0.05, 0.05, # HE
0.75, 0.05, 0.05, 0.05, 0.05, 0.05
), # EM
nrow = 5, ncol = 6, byrow = TRUE
)
# Starting values for the transition matrix
trans <- matrix(0.025, 5, 5)
diag(trans) <- 0.9
# Starting values for initial state probabilities
initial_probs <- c(0.2, 0.2, 0.2, 0.2, 0.2)
# Building a hidden Markov model
init_hmm_mvad <- build_hmm(
observations = mvad_seq,
transition_probs = trans, emission_probs = emiss,
initial_probs = initial_probs
)
if (FALSE) {
set.seed(21)
fit_hmm_mvad <- fit_model(init_hmm_mvad, control_em = list(restart = list(times = 50)))
hmm_mvad <- fit_hmm_mvad$model
}
# save time, load the previously estimated model
data("hmm_mvad")
# Markov model
# Note: build_mm estimates model parameters from observations,
# no need for estimating with fit_model unless there are missing observations
mm_mvad <- build_mm(observations = mvad_seq)
# Comparing likelihoods, MM fits better
logLik(hmm_mvad)
logLik(mm_mvad)
if (FALSE) {
require("igraph") # for layout_in_circle
plot(mm_mvad,
layout = layout_in_circle, legend.prop = 0.3,
edge.curved = 0.3, edge.label = NA,
vertex.label.pos = c(0, 0, pi, pi, pi, 0)
)
##############################################################
#' # Three-state three-channel hidden Markov model
# See ?hmm_biofam for five-state version
data("biofam3c")
# Building sequence objects
marr_seq <- seqdef(biofam3c$married,
start = 15,
alphabet = c("single", "married", "divorced")
)
child_seq <- seqdef(biofam3c$children,
start = 15,
alphabet = c("childless", "children")
)
left_seq <- seqdef(biofam3c$left,
start = 15,
alphabet = c("with parents", "left home")
)
# Define colors
attr(marr_seq, "cpal") <- c("violetred2", "darkgoldenrod2", "darkmagenta")
attr(child_seq, "cpal") <- c("darkseagreen1", "coral3")
attr(left_seq, "cpal") <- c("lightblue", "red3")
# Starting values for emission matrices
emiss_marr <- matrix(NA, nrow = 3, ncol = 3)
emiss_marr[1, ] <- seqstatf(marr_seq[, 1:5])[, 2] + 1
emiss_marr[2, ] <- seqstatf(marr_seq[, 6:10])[, 2] + 1
emiss_marr[3, ] <- seqstatf(marr_seq[, 11:16])[, 2] + 1
emiss_marr <- emiss_marr / rowSums(emiss_marr)
emiss_child <- matrix(NA, nrow = 3, ncol = 2)
emiss_child[1, ] <- seqstatf(child_seq[, 1:5])[, 2] + 1
emiss_child[2, ] <- seqstatf(child_seq[, 6:10])[, 2] + 1
emiss_child[3, ] <- seqstatf(child_seq[, 11:16])[, 2] + 1
emiss_child <- emiss_child / rowSums(emiss_child)
emiss_left <- matrix(NA, nrow = 3, ncol = 2)
emiss_left[1, ] <- seqstatf(left_seq[, 1:5])[, 2] + 1
emiss_left[2, ] <- seqstatf(left_seq[, 6:10])[, 2] + 1
emiss_left[3, ] <- seqstatf(left_seq[, 11:16])[, 2] + 1
emiss_left <- emiss_left / rowSums(emiss_left)
# Starting values for transition matrix
trans <- matrix(c(
0.9, 0.07, 0.03,
0, 0.9, 0.1,
0, 0, 1
), nrow = 3, ncol = 3, byrow = TRUE)
# Starting values for initial state probabilities
inits <- c(0.9, 0.09, 0.01)
# Building hidden Markov model with initial parameter values
init_hmm_bf <- build_hmm(
observations = list(marr_seq, child_seq, left_seq),
transition_probs = trans,
emission_probs = list(emiss_marr, emiss_child, emiss_left),
initial_probs = inits
)
# Fitting the model with different optimization schemes
# Only EM with default values
hmm_1 <- fit_model(init_hmm_bf)
hmm_1$logLik # -24179.1
# Only L-BFGS
hmm_2 <- fit_model(init_hmm_bf, em_step = FALSE, local_step = TRUE)
hmm_2$logLik # -22267.75
# Global optimization via MLSL_LDS with L-BFGS as local optimizer and final polisher
# This can be slow, use parallel computing by adjusting threads argument
# (here threads = 1 for portability issues)
hmm_3 <- fit_model(
init_hmm_bf,
em_step = FALSE, global_step = TRUE, local_step = TRUE,
control_global = list(maxeval = 5000, maxtime = 0), threads = 1
)
hmm_3$logLik # -21675.42
# EM with restarts, much faster than MLSL
set.seed(123)
hmm_4 <- fit_model(init_hmm_bf, control_em = list(restart = list(times = 5)))
hmm_4$logLik # -21675.4
# Global optimization via StoGO with L-BFGS as final polisher
# This can be slow, use parallel computing by adjusting threads argument
# (here threads = 1 for portability issues)
set.seed(123)
hmm_5 <- fit_model(
init_hmm_bf,
em_step = FALSE, global_step = TRUE, local_step = TRUE,
lb = -50, ub = 50, control_global = list(
algorithm = "NLOPT_GD_STOGO",
maxeval = 2500, maxtime = 0
), threads = 1
)
hmm_5$logLik # -21675.4
##############################################################
# Mixture HMM
data("biofam3c")
## Building sequence objects
marr_seq <- seqdef(biofam3c$married,
start = 15,
alphabet = c("single", "married", "divorced")
)
child_seq <- seqdef(biofam3c$children,
start = 15,
alphabet = c("childless", "children")
)
left_seq <- seqdef(biofam3c$left,
start = 15,
alphabet = c("with parents", "left home")
)
## Choosing colors
attr(marr_seq, "cpal") <- c("#AB82FF", "#E6AB02", "#E7298A")
attr(child_seq, "cpal") <- c("#66C2A5", "#FC8D62")
attr(left_seq, "cpal") <- c("#A6CEE3", "#E31A1C")
## Starting values for emission probabilities
# Cluster 1
B1_marr <- matrix(
c(
0.8, 0.1, 0.1, # High probability for single
0.8, 0.1, 0.1,
0.3, 0.6, 0.1, # High probability for married
0.3, 0.3, 0.4
), # High probability for divorced
nrow = 4, ncol = 3, byrow = TRUE
)
B1_child <- matrix(
c(
0.9, 0.1, # High probability for childless
0.9, 0.1,
0.9, 0.1,
0.9, 0.1
),
nrow = 4, ncol = 2, byrow = TRUE
)
B1_left <- matrix(
c(
0.9, 0.1, # High probability for living with parents
0.1, 0.9, # High probability for having left home
0.1, 0.9,
0.1, 0.9
),
nrow = 4, ncol = 2, byrow = TRUE
)
# Cluster 2
B2_marr <- matrix(
c(
0.8, 0.1, 0.1, # High probability for single
0.8, 0.1, 0.1,
0.1, 0.8, 0.1, # High probability for married
0.7, 0.2, 0.1
),
nrow = 4, ncol = 3, byrow = TRUE
)
B2_child <- matrix(
c(
0.9, 0.1, # High probability for childless
0.9, 0.1,
0.9, 0.1,
0.1, 0.9
),
nrow = 4, ncol = 2, byrow = TRUE
)
B2_left <- matrix(
c(
0.9, 0.1, # High probability for living with parents
0.1, 0.9,
0.1, 0.9,
0.1, 0.9
),
nrow = 4, ncol = 2, byrow = TRUE
)
# Cluster 3
B3_marr <- matrix(
c(
0.8, 0.1, 0.1, # High probability for single
0.8, 0.1, 0.1,
0.8, 0.1, 0.1,
0.1, 0.8, 0.1, # High probability for married
0.3, 0.4, 0.3,
0.1, 0.1, 0.8
), # High probability for divorced
nrow = 6, ncol = 3, byrow = TRUE
)
B3_child <- matrix(
c(
0.9, 0.1, # High probability for childless
0.9, 0.1,
0.5, 0.5,
0.5, 0.5,
0.5, 0.5,
0.1, 0.9
),
nrow = 6, ncol = 2, byrow = TRUE
)
B3_left <- matrix(
c(
0.9, 0.1, # High probability for living with parents
0.1, 0.9,
0.5, 0.5,
0.5, 0.5,
0.1, 0.9,
0.1, 0.9
),
nrow = 6, ncol = 2, byrow = TRUE
)
# Starting values for transition matrices
A1 <- matrix(
c(
0.80, 0.16, 0.03, 0.01,
0, 0.90, 0.07, 0.03,
0, 0, 0.90, 0.10,
0, 0, 0, 1
),
nrow = 4, ncol = 4, byrow = TRUE
)
A2 <- matrix(
c(
0.80, 0.10, 0.05, 0.03, 0.01, 0.01,
0, 0.70, 0.10, 0.10, 0.05, 0.05,
0, 0, 0.85, 0.01, 0.10, 0.04,
0, 0, 0, 0.90, 0.05, 0.05,
0, 0, 0, 0, 0.90, 0.10,
0, 0, 0, 0, 0, 1
),
nrow = 6, ncol = 6, byrow = TRUE
)
# Starting values for initial state probabilities
initial_probs1 <- c(0.9, 0.07, 0.02, 0.01)
initial_probs2 <- c(0.9, 0.04, 0.03, 0.01, 0.01, 0.01)
# Birth cohort
biofam3c$covariates$cohort <- cut(biofam3c$covariates$birthyr, c(1908, 1935, 1945, 1957))
biofam3c$covariates$cohort <- factor(
biofam3c$covariates$cohort,
labels = c("1909-1935", "1936-1945", "1946-1957")
)
# Build mixture HMM
init_mhmm_bf <- build_mhmm(
observations = list(marr_seq, child_seq, left_seq),
initial_probs = list(initial_probs1, initial_probs1, initial_probs2),
transition_probs = list(A1, A1, A2),
emission_probs = list(
list(B1_marr, B1_child, B1_left),
list(B2_marr, B2_child, B2_left),
list(B3_marr, B3_child, B3_left)
),
formula = ~ sex + cohort, data = biofam3c$covariates,
channel_names = c("Marriage", "Parenthood", "Residence")
)
# Fitting the model with different settings
# Only EM with default values
mhmm_1 <- fit_model(init_mhmm_bf)
mhmm_1$logLik # -12713.08
# Only L-BFGS
mhmm_2 <- fit_model(init_mhmm_bf, em_step = FALSE, local_step = TRUE)
mhmm_2$logLik # -12966.51
# Use EM with multiple restarts
set.seed(123)
mhmm_3 <- fit_model(init_mhmm_bf, control_em = list(restart = list(times = 5, transition = FALSE)))
mhmm_3$logLik # -12713.08
}
# Left-to-right HMM with equality constraint:
set.seed(1)
# Transition matrix
# Either stay or move to next state
A <- diag(c(0.9, 0.95, 0.95, 1))
A[1, 2] <- 0.1
A[2, 3] <- 0.05
A[3, 4] <- 0.05
# Emission matrix, rows 1 and 3 equal
B <- rbind(
c(0.4, 0.2, 0.3, 0.1),
c(0.1, 0.5, 0.1, 0.3),
c(0.4, 0.2, 0.3, 0.1),
c(0, 0.2, 0.4, 0.4)
)
# Start from first state
init <- c(1, 0, 0, 0)
# Simulate sequences
sim <- simulate_hmm(
n_sequences = 100,
sequence_length = 20, init, A, B
)
# initial model, use true values as inits for faster estimation here
model <- build_hmm(sim$observations, init = init, trans = A, emiss = B)
# estimate the model subject to constraints:
# First and third row of emission matrix are equal (see details)
fit <- fit_model(model,
constraints = c(1, 2, 1, 3),
em_step = FALSE, local_step = TRUE
)
fit$model
## Fix some emissions:
fixB <- matrix(FALSE, 4, 4)
fixB[2, 1] <- fixB[1, 3] <- TRUE # these are fixed to their initial values
fit <- fit_model(model,
fixed_emissions = fixB,
em_step = FALSE, local_step = TRUE
)
fit$model$emission_probs
Run the code above in your browser using DataLab