
communication (version 0.1)

hmm: Train a hidden Markov model with multivariate normal state distributions.

Description

Train a hidden Markov model with multivariate normal state distributions.

Usage

hmm(
  Xs,
  weights = NULL,
  nstates,
  par = list(),
  control = list(),
  labels = list()
)

Arguments

Xs

List of nsequences matrices; each matrix represents one observation sequence and has dimension nobs x nfeatures. For a single observation sequence, a single matrix can be provided instead of a list.

weights

Optional vector of weights, one for each observation sequence

nstates

Integer; number of states

par

List of initialization parameters; see 'Details'

control

List of control parameters for EM steps

labels

List of observation labels for supervised training, with each element corresponding to one observation sequence. Element i can be either a vector of integer state labels in 1:nstates or a matrix of dimension nstates x nrow(Xs[[i]]) whose columns sum to 1. If labels are supplied, the E-step is suppressed.
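The base-R sketch below builds inputs in the shapes described above: a list Xs of two simulated sequences of different lengths, a weights vector with one entry per sequence, and labels in both accepted forms (integer states for the first sequence, a soft nstates x nobs matrix for the second). The data and object names are illustrative assumptions; only the shapes come from this page.

```r
# Illustrative inputs for hmm(); the data are simulated, not from the package.
set.seed(1)
nstates <- 2
nfeatures <- 3

# Xs: a list of nsequences matrices, each nobs x nfeatures.
# Sequences may have different numbers of observations.
Xs <- list(
  matrix(rnorm(50 * nfeatures), nrow = 50, ncol = nfeatures),
  matrix(rnorm(80 * nfeatures), nrow = 80, ncol = nfeatures)
)

# weights: one weight per observation sequence.
weights <- c(1, 0.5)

# labels, form 1: integer state labels in 1:nstates, one per row of Xs[[1]].
hard_labels <- sample(1:nstates, nrow(Xs[[1]]), replace = TRUE)

# labels, form 2: an nstates x nrow(Xs[[2]]) matrix whose columns sum to 1.
soft_raw <- matrix(runif(nstates * nrow(Xs[[2]])), nrow = nstates)
soft_labels <- sweep(soft_raw, 2, colSums(soft_raw), "/")

labels <- list(hard_labels, soft_labels)
```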

Value

An object of class hmm. Contains fitted values of model parameters, along with input values for hyperparameters and features.

Details

The par argument is a list of initialization parameters. Any of the following components can be supplied:

  • method Name of the method used to automatically initialize the EM run. Currently only 'dirichlet' and 'random-spherical' are implemented. If provided, user-specified state distributions are ignored. 'dirichlet' randomly generates responsibilities, which are in turn used to calculate starting distributions. 'random-spherical' randomly draws nstates observations and uses their features as state means; all state covariance matrices are set to a diagonal matrix with entries method_arg (default = 1).

  • method_arg Argument to supply to method. For method='dirichlet', this is a scalar concentration alpha (the same value is used for all states). For method='random-spherical', this is a scalar giving the diagonal entries of the spherical covariance matrices of the starting distributions (after features are standardized). If provided, all other arguments are ignored.

  • resp Matrix, or list of nsequences matrices, with rows summing to 1; each matrix represents one observation sequence and has dimension nobs x nstates, with the (t,k)-th entry giving the initial probability that the t-th observation belongs to state k. If resp is not provided and mus and Sigmas are not both provided, responsibilities are randomly initialized using rdirichlet with all shape parameters set to 10.

  • mus List of nstates vectors of length nfeatures, each giving the mean of a state distribution

  • Sigmas List of nstates matrices with dimension nfeatures x nfeatures, each corresponding to the covariance matrix of a state distribution

  • Gamma Matrix of transition probabilities with dimension nstates x nstates; row k gives the probabilities of each transition out of state k and sums to 1. If not supplied, each row is randomly drawn from rdirichlet with all shape parameters set to 10.

  • delta Vector of initial state probabilities, of length nstates and summing to 1. If not supplied, delta is set to the stationary distribution of Gamma, i.e. the left eigenvector of Gamma for eigenvalue 1, normalized to sum to 1.
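The default random initializations described above can be sketched in base R as follows. This mirrors the documented behavior, not the package's internal code. rdirichlet_rows is a hypothetical helper (not the rdirichlet function the package calls) that samples a symmetric Dirichlet by normalizing independent Gamma(alpha) draws, a standard construction. The random-spherical part assumes two already-standardized features.

```r
# Hypothetical helper: n independent draws from a symmetric Dirichlet(alpha)
# on k categories, one per row, via normalized Gamma(alpha, 1) variates.
rdirichlet_rows <- function(n, k, alpha) {
  g <- matrix(rgamma(n * k, shape = alpha), nrow = n, ncol = k)
  g / rowSums(g)
}

set.seed(2)
nstates <- 3
nobs <- 100

# 'dirichlet'-style responsibilities: nobs x nstates, each row sums to 1
# (all shape parameters 10, as in the default described for resp).
resp <- rdirichlet_rows(nobs, nstates, alpha = 10)

# Default Gamma: each row drawn from a Dirichlet with all shape parameters 10.
Gamma <- rdirichlet_rows(nstates, nstates, alpha = 10)

# Default delta: stationary distribution of Gamma, i.e. the left eigenvector
# for eigenvalue 1 (a right eigenvector of t(Gamma)), rescaled to sum to 1.
ev <- eigen(t(Gamma))
delta <- Re(ev$vectors[, 1])
delta <- delta / sum(delta)

# 'random-spherical'-style starting values on standardized features:
# nstates randomly chosen observations serve as state means, and every
# covariance is method_arg times the identity (method_arg = 1 here).
X <- scale(matrix(rnorm(nobs * 2), ncol = 2))
method_arg <- 1
idx <- sample(nrow(X), nstates)
mus <- lapply(idx, function(i) X[i, ])
Sigmas <- replicate(nstates, diag(method_arg, ncol(X)), simplify = FALSE)
```

Because every entry of Gamma is strictly positive, eigenvalue 1 is its unique eigenvalue of largest modulus, and eigen() sorts non-symmetric results by decreasing modulus, so the first eigenvector is the one that is wanted.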

The control argument is a list of EM control parameters; any of the following components can be supplied:

  • lambda Ridge-like regularization parameter. lambda is added to each diag(Sigmas[[k]]) to stabilize each state's covariance matrix, which might otherwise be numerically singular, before inverting to calculate multivariate normal densities. Note that regularization is applied after all features are standardized, so diag(Sigmas[[k]]) is unlikely to contain elements greater than 1. This parameter should be selected through cross-validation.

  • tol EM terminates when the improvement in the log-likelihood between successive steps is < tol. Defaults to 1e-6.

  • maxiter EM terminates with a warning if maxiter iterations are reached without convergence as defined by tol. Defaults to 100.

  • uncollapse Threshold for detecting and resetting state distributions that collapse onto a single point. A collapsed state distribution is reset by redrawing mus[[k]] from a standard multivariate normal and setting Sigmas[[k]] to the nfeatures-dimensional identity matrix. Note that this distribution is defined with respect to the standardized features.

  • standardize Whether features should be standardized. Defaults to TRUE. This option also adds a small amount of noise, equal to 0.01 x the feature standard deviation, to observation features that have been zeroed out (e.g. f0 during unvoiced periods). If set to FALSE, it is assumed that features have been externally standardized and zeroed-out values handled; scaling$feature_means and scaling$feature_sds are then set to 0s and 1s, respectively, and no check is done to ensure this is correct. If features are in fact not standardized and zeroes not handled, bad things will happen and nobody will feel sorry for you.

  • verbose Integer in 0:1. If 1, information on the EM process is reported. Defaults to 1.
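The preprocessing behind lambda and standardize can be sketched in base R as follows. dmvnorm_log and standardize_features are hypothetical helpers, not package functions. The sketch assumes Gaussian jitter for zeroed-out values and computes each feature's standard deviation from its nonzero entries; this page specifies neither detail, so the package may differ.

```r
# Hypothetical helper: multivariate normal log-density after adding lambda to
# the diagonal of Sigma (the ridge-like stabilization described for lambda).
dmvnorm_log <- function(x, mu, Sigma, lambda = 0) {
  S <- Sigma + diag(lambda, nrow(Sigma))
  R <- chol(S)                                  # errors if S is not positive definite
  z <- backsolve(R, x - mu, transpose = TRUE)   # solves t(R) %*% z = x - mu
  -0.5 * length(x) * log(2 * pi) - sum(log(diag(R))) - 0.5 * sum(z^2)
}

# A singular covariance (two perfectly correlated features) cannot be
# Cholesky-factorized as is, but becomes usable once lambda > 0.
Sigma_singular <- matrix(1, 2, 2)
dmvnorm_log(c(0.3, -1.2), c(0, 0), Sigma_singular, lambda = 0.1)

# Hypothetical helper: jitter exact zeros, then center and scale each column.
# Assumption: Gaussian jitter with sd = 0.01 x the sd of the nonzero values.
standardize_features <- function(X) {
  for (j in seq_len(ncol(X))) {
    zero <- X[, j] == 0
    s <- sd(X[!zero, j])
    X[zero, j] <- rnorm(sum(zero), sd = 0.01 * s)
  }
  feature_means <- colMeans(X)
  feature_sds <- apply(X, 2, sd)
  list(
    X = scale(X, center = feature_means, scale = feature_sds),
    scaling = list(feature_means = feature_means, feature_sds = feature_sds)
  )
}

# Simulated features: f0 is zeroed out for the first two (unvoiced) frames.
set.seed(3)
feats <- cbind(f0 = c(0, 0, rnorm(20, mean = 120, sd = 15)), energy = rnorm(22))
out <- standardize_features(feats)
```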

Examples

library(communication)

# Load the example audio data
data('audio')

# Fit a two-state HMM, reporting EM progress (verbose = 1)
mod <- hmm(audio$data, nstates = 2, control = list(verbose = 1))
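A second, hedged example combining par and control components from 'Details' on simulated data. The simulator make_seq and the chosen settings (lambda = 0.1, maxiter = 200) are illustrative assumptions, not recommended values, and the fit runs only if the communication package is installed.

```r
set.seed(4)

# Hypothetical simulator: n observations (n even) of 2 features, with the
# first half from one regime and the second half from another.
make_seq <- function(n) {
  state <- rep(1:2, each = n / 2)
  cbind(rnorm(n, mean = c(-2, 2)[state]), rnorm(n, mean = c(0, 3)[state]))
}
Xs <- list(make_seq(60), make_seq(40))

# par and control built only from components documented in 'Details'.
init_par <- list(method = "random-spherical", method_arg = 1)
em_control <- list(lambda = 0.1, tol = 1e-6, maxiter = 200, verbose = 0)

# Fit only if the package is available.
if (requireNamespace("communication", quietly = TRUE)) {
  mod <- communication::hmm(Xs, nstates = 2, par = init_par, control = em_control)
}
```

As the description of lambda notes, its value should be selected through cross-validation; 0.1 here is only a placeholder.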
