findmleHMMnostarting: Multiple Initialization Maximum Likelihood Estimation for Hidden Markov Models

Description

Fits a Hidden Markov Model (HMM) by repeatedly initializing observation and transition parameters and selecting the fit with the highest log-likelihood. This approach helps avoid convergence to poor local optima. For the generalized extreme value (GEV) distribution, starting values are generated from repeated maximum likelihood fits on random data subsets.

Usage

findmleHMMnostarting(J, x, obsdist, no.initials = 50, EM = FALSE,
                     verbose = TRUE, seed = NULL, ...)

Value

A list corresponding to the best fit across all initializations, containing:

estimate: List of estimated HMM parameters, including state-dependent observation parameters and transition probabilities.
loglik: The maximized log-likelihood value.
AIC: The Akaike Information Criterion for the fitted model.
BIC: The Bayesian Information Criterion for the fitted model.
hessian: The Hessian matrix at the maximum likelihood estimates. Returned only when EM = FALSE.
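
As a rough illustration of how loglik relates to AIC and BIC, the sketch below applies the standard definitions. The helper info_criteria, the log-likelihood value, and the parameter count k are assumptions made for illustration; the package may count parameters differently (for example, by including the initial distribution).

```r
# Hypothetical helper (not part of the package): standard information
# criteria computed from a maximized log-likelihood.
# k = number of free parameters, n = number of observations.
info_criteria <- function(loglik, k, n) {
  c(AIC = -2 * loglik + 2 * k,
    BIC = -2 * loglik + k * log(n))
}

# Assumed parameter count for a J-state normal HMM:
# J * (J - 1) free transition probabilities plus J means and J sds.
J <- 3
k <- J * (J - 1) + 2 * J

# Illustrative log-likelihood value only, not a real fitting result.
ic <- info_criteria(loglik = -350.2, k = k, n = 200)
ic
```

Lower values of either criterion indicate a better trade-off between fit and complexity; BIC penalizes extra parameters more heavily once n exceeds about 7.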

Arguments

J: Integer. The number of hidden states in the HMM. Must be strictly greater than 1.
x: Numeric vector. The observed data sequence.
obsdist: Character string. The observation distribution. Supported distributions are: "norm", "pois", "weibull", "zip", "nbinom", "zinb", "exp", "gamma", "lnorm", "gev", "ZInormal", "ZIgamma".
no.initials: Integer. The number of random initializations to attempt. Defaults to 50.
EM: Logical. If TRUE, uses an EM-based semi-Markov approximation for estimation. If FALSE, maximizes the likelihood directly using nlm. Defaults to FALSE.
verbose: Logical. If TRUE, progress messages are printed to the console. Default is TRUE.
seed: Integer or NULL. Random seed for reproducibility. Default is NULL.
...: Further arguments passed to findmleHMM when EM = TRUE.

Author

Aimee Cody

Details

This function automates multiple trials of findmleHMM with randomized starting values, returning the fit that achieves the highest log-likelihood.

For most observation distributions, starting values are generated via clusterHMM.
For the GEV distribution, starting values are drawn from repeated fits of evd::fgev on random data segments. Up to 20,000 attempts are made, and a warning is issued if fewer than 1,000 valid estimates are obtained.

During each iteration:

Observation parameters are perturbed slightly to encourage exploration.
A transition matrix Pi is drawn from a random uniform distribution with added self-transition bias.
The HMM is estimated via findmleHMM.
If the resulting log-likelihood exceeds the current best, the model is updated.
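
The transition-matrix step can be sketched as follows. This is a minimal illustration, not the package's code: the helper name random_Pi, the size of the self-transition bias, and the row normalization are all assumptions.

```r
# Hypothetical helper: draw a random J x J transition matrix whose
# diagonal (self-transition) entries are inflated before normalization.
random_Pi <- function(J, self_bias = 1) {
  Pi <- matrix(runif(J * J), nrow = J)  # uniform entries on (0, 1)
  diag(Pi) <- diag(Pi) + self_bias      # favour staying in the same state
  Pi / rowSums(Pi)                      # rescale so each row sums to 1
}

set.seed(1)
Pi0 <- random_Pi(J = 3)
rowSums(Pi0)  # every row is a probability distribution
```

With self_bias = 1, each diagonal entry exceeds every off-diagonal entry in its row before normalization, so the starting chain tends to remain in its current state. This is often a sensible starting point for persistent regimes.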

At the end of all iterations, the best-fitting model is returned. When verbose = TRUE, iteration numbers and error messages are displayed during the fitting process.
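
The overall keep-the-best pattern described above can be sketched generically. The names multi_start, toy_loglik, and toy_fit are hypothetical, and a one-parameter toy objective with two local maxima stands in for the HMM likelihood; in the real function the fitting step is a call to findmleHMM.

```r
# Hypothetical multi-start wrapper: try several random starting values,
# skip fits that fail, and keep the fit with the highest log-likelihood.
multi_start <- function(fit_fun, start_fun, no.initials = 50, verbose = TRUE) {
  best <- NULL
  for (i in seq_len(no.initials)) {
    if (verbose) message("Initialization ", i)
    fit <- tryCatch(
      fit_fun(start_fun()),
      error = function(e) {
        if (verbose) message("  failed: ", conditionMessage(e))
        NULL  # a failed fit is discarded
      }
    )
    if (!is.null(fit) && is.finite(fit$loglik) &&
        (is.null(best) || fit$loglik > best$loglik)) {
      best <- fit  # new best fit so far
    }
  }
  best
}

# Toy stand-in objective with local maxima near p = -2 and p = 2;
# the one near -2 is the global maximum.
toy_loglik <- function(p) -(p^2 - 4)^2 - 0.5 * p
toy_fit <- function(start) {
  opt <- nlm(function(p) -toy_loglik(p), start)  # nlm minimizes
  list(estimate = opt$estimate, loglik = -opt$minimum)
}

set.seed(42)
best <- multi_start(toy_fit, function() runif(1, -3, 3),
                    no.initials = 20, verbose = FALSE)
best$estimate
```

A single run started on the positive side would typically stop at the inferior maximum near 2; repeating the fit from many random starts makes it far more likely that at least one run reaches the better solution.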

Examples

set.seed(123)
J <- 3
Pi <- matrix(c(0.7, 0.2, 0.1,
               0.1, 0.8, 0.1,
               0.2, 0.3, 0.5), nrow = 3, byrow = TRUE)
obspar <- list(mean = c(-2, 0, 3),
               sd   = c(0.5, 1, 1.5))
x <- generateHMM(n = 200, J = J, Pi = Pi, obsdist = "norm", obspar = obspar)$x

# \donttest{
fit <- findmleHMMnostarting(J = J, x = x, obsdist = "norm",
                            no.initials = 30)

fit$loglik
fit$estimate

fit_silent <- findmleHMMnostarting(J = J, x = x, obsdist = "norm",
                                   no.initials = 30, verbose = FALSE)
# }