hmm: Fit a hidden Markov model to discrete data.

Description

Uses the EM algorithm to perform a maximum likelihood fit of a hidden Markov model to discrete data where the observations come from one of a number of finite discrete distributions, depending on the (hidden) state of the Markov chain. These distributions are specified (non-parametrically) by a matrix $R = [\rho_{ij}]$ where $\rho_{ij} = P(Y = y_i | S = j)$, $Y$ being the observable random variable and $S$ being the hidden state.
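To make the definition of R concrete: with m possible values and K hidden states, Rho is an m x K matrix whose j-th column is the distribution of Y given S = j. The values and probabilities below are hypothetical. Indexing the rows of Rho by the observed values gives, for each time point, the probability of that observation under each hidden state.

```r
# Hypothetical example: 3 possible values, K = 2 hidden states.
yval <- c("dry", "light", "heavy")
Rho <- matrix(c(0.7, 0.2, 0.1,   # column 1: P(Y = y_i | S = 1)
                0.1, 0.3, 0.6),  # column 2: P(Y = y_i | S = 2)
              nrow = 3, dimnames = list(yval, c("S1", "S2")))

# Each column is a probability distribution over yval.
stopifnot(isTRUE(all.equal(unname(colSums(Rho)), c(1, 1))))

# Row t of emis holds P(Y_t = y_t | S_t = j) for j = 1, ..., K.
y <- c("dry", "heavy", "dry", "light")
emis <- Rho[match(y, yval), , drop = FALSE]
print(emis)
```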

Usage

hmm(y, yval=NULL, par0=NULL, K=NULL, rand.start=NULL, mixture=FALSE,
    tolerance=1e-4, verbose=FALSE, itmax=200, crit='PCLL', data.name=NULL)

Arguments

y

A vector or matrix of discrete data; missing values are allowed. If y is a matrix, each column is interpreted as an independent replicate of the observation sequence.

yval

A vector of possible values for the data; it defaults to the sorted unique values of y. Any value of y that does not match a value of yval is treated as a missing value.
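The rule for unmatched values can be expressed with base R's match(), which returns NA for entries not found in the table. This is a sketch of the stated behaviour, assuming nothing about how hmm() implements it internally; the data are hypothetical.

```r
# Hypothetical data: the value "snow" is not in yval.
y <- c("dry", "light", "snow", NA, "heavy")
yval <- c("dry", "light", "heavy")

# match() gives NA for entries of y not found in yval, so the genuine NA
# and the unmatched "snow" both end up missing.
idx <- match(y, yval)
print(idx)

# The stated default for yval: sorted unique values (sort() drops NA).
y_num <- c(3, 1, NA, 3, 2)
default_yval <- sort(unique(as.vector(y_num)))
print(default_yval)
```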

par0

An optional list of starting values for the parameters of the model, with components tpm (transition probability matrix) and Rho. The matrix Rho specifies the probability that the observations take on each value in yval, given the state of the hidden Markov chain, so each column of Rho sums to 1.

K

The number of states in the hidden Markov chain. If par0 is not specified, K must be; if par0 is specified, K is ignored.
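A par0 list for a two-state chain might look as follows. The helper check_par0() is hypothetical (not part of the package); it checks the properties implied above, assuming the usual convention that tpm[i, j] is the probability of moving from state i to state j, so that rows of tpm and columns of Rho each sum to 1. It also shows why K is redundant once par0 is given: K is simply the dimension of tpm.

```r
# Hypothetical helper, not part of the package.
check_par0 <- function(par0, tol = 1e-8) {
  tpm <- par0$tpm
  Rho <- par0$Rho
  stopifnot(is.matrix(tpm), is.matrix(Rho))
  K <- nrow(tpm)
  stopifnot(ncol(tpm) == K, ncol(Rho) == K)
  stopifnot(all(tpm >= 0), all(Rho >= 0))
  stopifnot(all(abs(rowSums(tpm) - 1) < tol))  # each row of tpm is a distribution
  stopifnot(all(abs(colSums(Rho) - 1) < tol))  # each column of Rho is a distribution
  invisible(K)  # K is implied by par0
}

par0 <- list(
  tpm = matrix(c(0.9, 0.1,
                 0.2, 0.8), nrow = 2, byrow = TRUE),
  Rho = matrix(c(0.7, 0.2, 0.1,
                 0.1, 0.3, 0.6), nrow = 3)
)
K <- check_par0(par0)
print(K)
```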

rand.start

A list of two logical scalars, which must be named tpm and Rho. If tpm is TRUE, the function init.all() chooses the entries of the starting value of tpm at random; likewise for Rho.
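One simple way to draw a random starting value is to normalise uniform draws so that each row of tpm, or each column of Rho, sums to 1. The helpers rand_tpm() and rand_Rho() are hypothetical; init.all() may well use a different scheme.

```r
# Hypothetical helpers; init.all() may choose random entries differently.
rand_tpm <- function(K) {
  m <- matrix(runif(K * K), K, K)
  m / rowSums(m)                  # normalise each row to sum to 1
}
rand_Rho <- function(m, K) {
  r <- matrix(runif(m * K), m, K)
  sweep(r, 2, colSums(r), "/")    # normalise each column to sum to 1
}

set.seed(1)
tpm0 <- rand_tpm(2)
Rho0 <- rand_Rho(3, 2)
print(tpm0)
print(Rho0)
```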

mixture

A logical scalar; if TRUE then a mixture model (all rows of the transition probability matrix are identical) is fitted rather than a general hidden Markov model.
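When every row of the transition probability matrix equals the same vector p, the next state does not depend on the current one, so the hidden states are independent draws from p: an ordinary finite mixture. A small illustration with hypothetical mixing proportions, which also shows that p is then the stationary distribution of the chain:

```r
# Hypothetical mixing proportions.
p <- c(0.3, 0.7)
K <- length(p)
tpm_mix <- matrix(p, nrow = K, ncol = K, byrow = TRUE)  # every row equals p
print(tpm_mix)

# p %*% tpm_mix = sum_i p_i * p = p, so p is stationary.
print(as.vector(p %*% tpm_mix))
```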

tolerance

If the value of the quantity used as the stopping criterion (see crit) is less than tolerance, the EM algorithm is considered to have converged.

verbose

A logical scalar determining whether to print out details of the progress of the EM algorithm.

itmax

If the convergence criterion has not been met by the time itmax EM steps have been performed, a warning message is printed out and the function stops. A value is returned by the function anyway, with the logical component "converged" set to FALSE.
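The interplay of tolerance, itmax and verbose can be sketched as a generic EM driver. Here run_em(), em_step and crit_fn are hypothetical placeholders rather than package functions; the toy step simply halves its argument, so the sketch runs without any HMM machinery.

```r
# Hypothetical driver; em_step and crit_fn are placeholders.
run_em <- function(par, em_step, crit_fn, tolerance = 1e-4, itmax = 200,
                   verbose = FALSE) {
  converged <- FALSE
  nstep <- 0
  while (nstep < itmax) {
    new_par <- em_step(par)
    nstep <- nstep + 1
    delta <- crit_fn(par, new_par)  # value of the stopping criterion
    if (verbose) cat(sprintf("step %d: criterion = %g\n", nstep, delta))
    par <- new_par
    if (delta < tolerance) {
      converged <- TRUE
      break
    }
  }
  if (!converged) warning("EM did not converge within ", itmax, " steps")
  list(par = par, converged = converged, nstep = nstep)
}

# Toy usage: the change after step k is 2^-k, which first drops below 1e-4 at k = 14.
res <- run_em(1, function(x) x / 2, function(a, b) abs(a - b))
print(res$nstep)
```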

crit

The name of the stopping criterion, which must be one of "PCLL" (percent change in log-likelihood; the default), "L2" (L-2 norm, i.e. square root of the sum of squares of the changes in the coefficients), or "Linf" (L-infinity norm, i.e. maximum absolute value of the changes in the coefficients).
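The three criteria can be written out as follows. This is a sketch under assumptions: the helper stop_crit() is hypothetical, the "coefficients" are taken to be all entries of tpm and Rho, and PCLL is taken to be 100 * |change in log-likelihood| / |previous log-likelihood|; the package's exact formulas may differ in detail.

```r
# Hypothetical helper: old and new are lists with components tpm, Rho, log.like.
stop_crit <- function(old, new, crit = c("PCLL", "L2", "Linf")) {
  crit <- match.arg(crit)
  if (crit == "PCLL") {
    return(100 * abs(new$log.like - old$log.like) / abs(old$log.like))
  }
  d <- c(new$tpm - old$tpm, new$Rho - old$Rho)  # change in all coefficients
  if (crit == "L2") sqrt(sum(d^2)) else max(abs(d))
}

old <- list(tpm = matrix(c(0.9, 0.1, 0.2, 0.8), 2, byrow = TRUE),
            Rho = matrix(c(0.7, 0.3, 0.4, 0.6), 2),
            log.like = -100)
new <- old
new$tpm[1, ] <- c(0.88, 0.12)
new$log.like <- -99.5
print(sapply(c("PCLL", "L2", "Linf"), function(k) stop_crit(old, new, k)))
```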

data.name

An identifying tag for the fit; if omitted, it defaults to the name of data set y as determined by deparse(substitute(y)).

Value

A list with components:
Rho: The fitted value of the probability matrix Rho specifying the distributions of the observations.
tpm: The fitted value of the transition probability matrix tpm.
ispd: The fitted initial state probability distribution, assumed to be the (unique) stationary distribution for the chain, and thereby determined by the transition probability matrix tpm.
log.like: The final value of the log likelihood, as calculated through recursion.
converged: A logical scalar saying whether the algorithm satisfied the convergence criterion before the maximum of itmax EM steps was exceeded.
nstep: The number of EM steps performed by the algorithm.
data.name: An identifying tag, specified as an argument, or determined from the name of the argument y by deparse(substitute(y)).
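Because ispd is determined by tpm, it can be recovered by solving ispd %*% tpm = ispd together with sum(ispd) = 1. The sketch below stacks these equations and solves them by least squares with qr.solve(); the helper stationary() is hypothetical, and it assumes the chain is irreducible so that the solution is unique.

```r
# Hypothetical helper: stationary distribution of a transition probability matrix.
stationary <- function(tpm) {
  K <- nrow(tpm)
  A <- rbind(t(diag(K) - tpm), rep(1, K))  # (I - P)' x = 0 and sum(x) = 1
  b <- c(rep(0, K), 1)
  as.vector(qr.solve(A, b))                # consistent system, so LS gives the exact solution
}

tpm <- matrix(c(0.9, 0.1, 0.2, 0.8), nrow = 2, byrow = TRUE)
ispd <- stationary(tpm)
print(ispd)  # approximately 2/3 and 1/3
```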

Details

The hard work is done by a Fortran subroutine "recurse" (actually coded in Ratfor) which is dynamically loaded.
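For orientation, the kind of recursion that yields log.like is the scaled forward recursion sketched below in plain R. This is an illustration, not a transcription of the Ratfor routine "recurse"; the helper names are hypothetical, and two choices are assumptions: a missing observation contributes a factor of 1 for every state, and the log-likelihoods of the independent replicates (columns of y) are summed. The backward pass needed for the E-step is omitted.

```r
# Hypothetical sketch of a scaled forward recursion for one sequence.
# yidx: integer indices into yval (NA = missing); Rho: m x K; tpm: K x K.
forward_loglik <- function(yidx, tpm, Rho, ispd) {
  K <- ncol(Rho)
  emis <- function(i) if (is.na(i)) rep(1, K) else Rho[i, ]  # missing: no information
  alpha <- ispd * emis(yidx[1])
  ll <- 0
  for (t in seq_along(yidx)) {
    if (t > 1) alpha <- as.vector(alpha %*% tpm) * emis(yidx[t])
    s <- sum(alpha)          # P(y_t | y_1, ..., y_{t-1})
    ll <- ll + log(s)
    alpha <- alpha / s       # rescale to avoid numerical underflow
  }
  ll
}

# Independent replicates (columns) contribute additively to the log-likelihood.
total_loglik <- function(yidx_mat, tpm, Rho, ispd) {
  sum(apply(as.matrix(yidx_mat), 2, forward_loglik,
            tpm = tpm, Rho = Rho, ispd = ispd))
}

tpm <- matrix(c(0.9, 0.1, 0.2, 0.8), nrow = 2, byrow = TRUE)
Rho <- matrix(c(0.7, 0.2, 0.1, 0.1, 0.3, 0.6), nrow = 3)
ispd <- c(2/3, 1/3)
print(forward_loglik(c(1L, 3L, NA, 2L), tpm, Rho, ispd))
```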


Examples


# See the help for sim.hmm() for how to generate y.sim.
fit <- hmm(y.sim, K = 2, verbose = TRUE)  # 'fit' rather than 'try', which would mask base::try()
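If sim.hmm() is not at hand, data of the right kind can be simulated in base R. This sketch does not reproduce sim.hmm(); the parameter values, sequence length and seed are hypothetical, and the initial state is drawn from the stationary distribution, matching the assumption made for ispd.

```r
# Hypothetical base-R simulation: 2 hidden states, observed values 1, 2, 3.
set.seed(42)
tpm <- matrix(c(0.9, 0.1,
                0.2, 0.8), nrow = 2, byrow = TRUE)
Rho <- matrix(c(0.7, 0.2, 0.1,
                0.1, 0.3, 0.6), nrow = 3)
ispd <- c(2/3, 1/3)   # stationary distribution of tpm
n <- 500

s <- integer(n)       # hidden state sequence
s[1] <- sample(1:2, 1, prob = ispd)
for (t in 2:n) s[t] <- sample(1:2, 1, prob = tpm[s[t - 1], ])

# Draw each observation from the column of Rho for the current hidden state.
y.sim <- sapply(s, function(j) sample(1:3, 1, prob = Rho[, j]))
table(y.sim)
```

The resulting y.sim can then be passed to hmm() as in the example call.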
