hmm: Fit a hidden Markov model to discrete data.

Description

Uses the EM algorithm to perform a maximum likelihood fit of a hidden Markov model to discrete data where the observations come from one of a number of finite discrete distributions, depending on the (hidden) state of the Markov chain. These distributions are specified (non-parametrically) by a matrix $R = [\rho_{ij}]$ where $\rho_{ij} = P(Y = y_i | S = j)$, $Y$ being the observable random variable and $S$ being the hidden state.

Usage

hmm(y, yval=NULL, par0=NULL, K=NULL, rand.start=NULL, stationary=cis,
    mixture=FALSE, cis=TRUE, tolerance=1e-4, verbose=FALSE,
    itmax=200, crit='PCLL',keep.y=TRUE, data.name=NULL)

Arguments

A vector of discrete data or a list of such vectors; missing values are allowed.

yval

A vector of possible values for the data; it defaults to the sorted unique values of y. If any value of y does not match some value of yval, it will be treated as a MISSING VALUE.

par0

An optional (named) list of starting values for the parameters of the model, with components tpm (transition probability matrix) and Rho. The matrix Rho specifies the probability that the observations take on each value in yval, given the state of the hidden Markov chain. The columns of Rho correspond to states, the rows to the values of yval.

If par0$tpm has row and column names these must be the same (otherwise an error is thrown). If par0$Rho has column names these must be the same as the row and column names of par0$tpm.

If par0 is not specified, starting values are created by the function init.all().

The number of states in the hidden Markov chain; if par0 is not specified K MUST be; if par0 is specified, K is ignored.

Note that K=1 is acceptable; if K is 1 then all observations are treated as being independent and the non-parametric estimate of the distribution of the observations is calculated in the obvious way.

rand.start

A list consisting of two logical scalars which must be named tpm and Rho, if tpm is TRUE then the function init.all() chooses entries for then starting value of tpm at random; likewise for Rho. This argument defaults to list(tpm=FALSE,Rho=FALSE).

stationary

Logical scalar. If TRUE then the model is fitted under the stationarity assumption, i.e. that the Markov chain was in steady state at the time that observations commenced. In this case the initial state probability distribution is estimated as the stationary distribution determined by the (estimated) transition probability matrix. Otherwise if cis (see below) is TRUE the initial state probability distribution is estimated as the mean of the vectors of conditional probabilities of the states, given the observation sequences, at time t=1. If stationary is TRUE and cis is FALSE an error is given.

mixture

A logical scalar; if TRUE then a mixture model (all rows of the transition probability matrix are identical) is fitted rather than a general hidden Markov model.

cis

A logical scalar specifying whether there should be a constant initial state probability distribution. If stationary is FALSE and cis is FALSE then the initial state probability distribution for a given observation sequence is equal to 1 where the (first) maximum of the vector of conditional probabilities of the states, given the observation sequences, at time t=1, occurs, and is 0 elsewhere. If stationary is TRUE and cis is FALSE an error is given.

tolerance

If the value of the quantity used for the stopping criterion is less than tolerance then the EM algorithm is considered to have converged.

verbose

A logical scalar determining whether to print out details of the progress of the EM algorithm.

itmax

If the convergence criterion has not been met by the time itmax EM steps have been performed, a warning message is printed out, and the function stops. A value is returned by the function anyway, with the logical component "converged" set to FALSE.

crit

The name of the stopping criterion, which must be one of "PCLL" (percent change in log-likelihood; the default), "L2" (L-2 norm, i.e. square root of sum of squares of change in coefficients), or "Linf" (L-infinity norm, i.e. maximum absolute value of change in coefficients).

keep.y

Logical scalar; should the observations y be returned as a component of the value of this function?

data.name

An identifying tag for the fit; if omitted, it defaults to the name of data set y as determined by deparse(substitute(y)).

Value

A list with components:

Rho

The fitted value of the probability matrix Rho specifying the distributions of the observations (the “emission” probabilities).

tpm

The fitted value of the transition probabilty matrix tpm.

ispd

The fitted initial state probability distribution, or a matrix of (trivial or “deterministic”) initial state probability distributions, one (column) of ispd for each observation sequence.

If stationary is TRUE then ispd is assumed to be the (unique) stationary distribution for the chain, and thereby determined by the transition probability matrix tpm. If stationary is FALSE and cis is TRUE then ispd is estimated as the mean of the vectors of conditional probabilities of the states, given the observation sequences, at time t=1.

If cis is FALSE then ispd is a matrix whose columns are trivial probability vectors, as described above.

log.like

The final value of the log likelihood, as calculated through recursion.

converged

A logical scalar saying whether the algorithm satisfied the convergence criterion before the maximum of itmax EM steps was exceeded.

nstep

The number of EM steps performed by the algorithm.

The observations (argument y). Present only if keep.y is TRUE.

data.name

An identifying tag, specified as an argument, or determined from the name of the argument y by deparse(substitute(y)).

stationary

The argument stationary.

cis

The argument cis.

Notes

The package used to require the argument y to be a matrix in the case of multiple observed series. If the series were of unequal length the user was expected to pad them out with NAs to equalize the lengths. In the revision from version 0.0-9 to 0.1-0 this was changed, requiring the argument y to be (more sensibly) a list when there are multiple series. (Unfortunately this help page was not correspondingly revised at the time of that transition nor of the next one, and the old ``matrix'' format was left as being the specified input format until the package was updated to version 0.1-2.)

The old matrix format is still permitted (and the matrix is internally changed into a list) but this is deprecated. In some future version of hmm.discnp this possibility will be removed.

If K=1 then tpm, ispd, converged, and nstep are all set equal to NA in the list returned by this function.

The estimate of ispd in the non-stationary setting is inevitably very poor, unless the number of sequences of observations (the length of the list y) is very large. We have in effect ``less than one'' relevant observation for each such sequence.

The returned values of tpm and Rho have row and column names. These are the same as the row and column names of the starting values of these matrices (as provided in par0 is these exist. Otherwise they are taken to be the appropriate sequences of integers. Likewise the returned value of ispd is a named vector, the names being the same as the row (and column) names of tpm and the column names of Rho.

Warnings

The ordering of the (hidden) states can be arbitrary. What the estimation procedure decides to call ``state 1'' may not be what you think of as being state number 1. The ordering of the states will be affected by the starting values used.

At some time in the future the (deprecated) option of being able to specify argument y as a matrix (in the setting in which there are multiple data series to which a single model is being fitted) will no longer be permitted.

Details

The hard work is done by a Fortran subroutine "recurse" (actually coded in Ratfor) which is dynamically loaded.

References

Rabiner, L. R., "A tutorial on hidden Markov models and selected applications in speech recognition," Proc. IEEE vol. 77, pp. 257 -- 286, 1989.

Zucchini, W. and Guttorp, P., "A hidden Markov model for space-time precipitation," Water Resources Research vol. 27, pp. 1917-1923, 1991.

MacDonald, I. L., and Zucchini, W., "Hidden Markov and Other Models for Discrete-valued Time Series", Chapman & Hall, London, 1997.

Liu, Limin, "Hidden Markov Models for Precipitation in a Region of Atlantic Canada", Master's Report, University of New Brunswick, 1997.

Examples

Run this code

# NOT RUN {
# 1.
Yval <- LETTERS[1:10]
Tpm  <- matrix(c(0.75,0.25,0.25,0.75),ncol=2,byrow=TRUE)
Rho  <- cbind(c(rep(1,5),rep(0,5)),c(rep(0,5),rep(1,5)))/5
rownames(Rho) <- Yval
set.seed(42)
xxx  <- sim.hmm(rep(1000,5),tpm=Tpm,Rho=Rho,yval=Yval)
fit  <- hmm(xxx,par0=list(tpm=Tpm,Rho=Rho))
print(fit$Rho)

# 2.
# See the help for sim.hmm() for how to generate y.num.
# }
# NOT RUN {
   fit.num     <- hmm(y.num,K=2,verb=TRUE)
   fit.num.mix <- hmm(y.num,K=2,verb=TRUE,mixture=TRUE)
   print(fit.num[c("tpm","Rho")])
# }
# NOT RUN {
# Note that states 1 and 2 get swapped.

# 3.
xxx <- with(colifCount,split(y,f=locn))
fitColCount <- hmm(xxx,K=2) # Two states: above and below the thermocline.

# 4.
fitLesCount <- hmm(lesionCount,K=2) # Two states: relapse and remission.
# }

Run the code above in your browser using DataLab