fit_pHMM: Fit a Partially Hidden Markov Model (pHMM)

Description

Fits a partially hidden Markov model (pHMM) to multivariate time series observations \(y\) with partially observed process states \(x\), using a constrained Baum-Welch algorithm. The function allows the user to provide custom initial parameters, and supports constraints on known means and/or covariances, as well as equal or diagonal covariance structures.
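To illustrate how known labels can constrain the forward recursion, here is a minimal sketch in base R. It is not the package's implementation: the helper names `log_sum_exp` and `forward_partial` are hypothetical, emissions are taken to be univariate Gaussian for brevity, and masking the emission log-densities of labeled time points is one common way to impose label constraints, assumed here for illustration.

```r
# Numerically stable log(sum(exp(v))).
log_sum_exp <- function(v) {
  m <- max(v)
  if (!is.finite(m)) return(m)  # all entries are -Inf
  m + log(sum(exp(v - m)))
}

# Forward pass with partially observed states (hypothetical helper).
# y: numeric vector (length >= 2); xlabeled: states in 1..N or NA;
# log_pi: log initial distribution; logA: log transition matrix;
# mu, sigma: per-state Gaussian emission parameters.
forward_partial <- function(y, xlabeled, log_pi, logA, mu, sigma) {
  n <- length(y)
  N <- length(mu)
  # logB[t, j]: log-density of y[t] under state j
  logB <- sapply(seq_len(N), function(j) dnorm(y, mu[j], sigma[j], log = TRUE))
  # Where the state is labeled, rule out every other state
  for (t in which(!is.na(xlabeled))) {
    logB[t, -xlabeled[t]] <- -Inf
  }
  log_alpha <- matrix(-Inf, n, N)
  log_alpha[1, ] <- log_pi + logB[1, ]
  for (t in 2:n) {
    for (j in seq_len(N)) {
      log_alpha[t, j] <- log_sum_exp(log_alpha[t - 1, ] + logA[, j]) + logB[t, j]
    }
  }
  list(log_alpha = log_alpha, log_lik = log_sum_exp(log_alpha[n, ]))
}

# Toy usage: two states, the second time point is labeled as state 2.
A <- matrix(c(0.9, 0.1,
              0.2, 0.8), 2, 2, byrow = TRUE)
res <- forward_partial(y = c(0.1, 1.2, 0.9),
                       xlabeled = c(NA, 2L, NA),
                       log_pi = log(c(0.5, 0.5)),
                       logA = log(A),
                       mu = c(0, 1), sigma = c(1, 1))
res$log_lik
```

With no labels the recursion reduces to the usual forward algorithm; with every state labeled, the log-likelihood is simply the log-probability of that single state path plus the corresponding emission log-densities.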

Usage

fit_pHMM(
  y,
  xlabeled,
  nstates,
  ppi_start = NULL,
  A_start = NULL,
  mean_start,
  covariance_start = NULL,
  known_mean = NULL,
  known_covariance = NULL,
  equal_covariance = FALSE,
  covariance_structure = "full",
  max_iter = 200,
  tol = 0.001,
  verbose = FALSE
)

Value

A list with components:

y, xlabeled: the input data.
log_lik, log_lik_vec: the final log-likelihood and its trace across EM iterations.
iter: number of EM iterations performed.
logB, log_alpha, log_beta, log_gamma, log_xi: log-scale emission, forward, backward, and posterior quantities from the Baum-Welch algorithm.
logAhat, mean_hat, covariance_hat, log_pi_hat: estimated model parameters.
AIC, BIC: information criteria for model selection.
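Because the posterior quantities are returned on the log scale, a common follow-up step is to exponentiate them and pick the most probable state at each time point. The sketch below uses a small stand-in matrix rather than real output; it assumes that log_gamma is a \(T \times\) nstates matrix of log posterior state probabilities with one row per time point, which should be checked against the actual object returned by fit_pHMM.

```r
# Stand-in for out$log_gamma: 4 time points, 3 states
# (assumed layout: rows = time points, columns = states).
log_gamma <- log(rbind(
  c(0.90, 0.05, 0.05),
  c(0.20, 0.70, 0.10),
  c(0.10, 0.30, 0.60),
  c(0.34, 0.33, 0.33)
))

# Posterior state probabilities on the natural scale
gamma <- exp(log_gamma)

# Most probable state at each time point (ties resolved to the first state)
state_hat <- max.col(log_gamma, ties.method = "first")
state_hat
```

If the returned matrix is stored with states in rows instead, transpose it with `t()` before applying the same steps.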

Arguments

y: A numeric matrix of dimension \(T \times d\), where each row corresponds to a \(d\)-dimensional observation at time \(t\).
xlabeled: An integer vector of length \(T\) with partially observed states. Known states must be integers in \(1, \ldots, N\), where \(N\) is nstates; unknown states should be coded as NA.
nstates: Integer. The total number of hidden states to fit.
ppi_start: Numeric vector of length nstates giving the initial state distribution. If NULL, defaults to c(1,0,...,0).
A_start: Numeric nstates \(\times\) nstates transition probability matrix. If NULL, defaults to a transition matrix with diagonal entries equal to 1-0.01*(nstates-1) and all off-diagonal entries equal to 0.01.
mean_start: List of length nstates containing numeric mean vectors for the emission distributions.
covariance_start: List of covariance matrices for the emission distributions. Must be of length nstates, unless equal_covariance = TRUE, in which case it must be of length 1. If NULL, defaults to identity matrices.
known_mean: Optional list of known mean vectors. Use NA for unknown elements.
known_covariance: Optional list of known covariance matrices. Use NA for unknown elements.
equal_covariance: Logical. If TRUE, all states are constrained to share a common covariance matrix.
covariance_structure: Character string specifying the covariance structure. Either "full" (default) or "diagonal".
max_iter: Maximum number of EM iterations. Default is 200.
tol: Convergence tolerance for log-likelihood and parameter change. Default is 1e-3.
verbose: Logical. If TRUE, prints log-likelihood progress at each iteration.
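The documented defaults for ppi_start, A_start, and covariance_start can be written out explicitly, which is a convenient starting point when only some of them need customizing. The following is a sketch in base R that reconstructs them from the descriptions above; the variable names and the choices nstates = 3 and d = 2 are illustrative only.

```r
nstates <- 3  # illustrative number of states
d <- 2        # illustrative observation dimension

# Initial state distribution: all mass on state 1
ppi_start <- c(1, rep(0, nstates - 1))

# Transition matrix: 0.01 off the diagonal, remainder on the diagonal
A_start <- matrix(0.01, nstates, nstates)
diag(A_start) <- 1 - 0.01 * (nstates - 1)

# Identity covariance matrices: one per state ...
covariance_start <- replicate(nstates, diag(d), simplify = FALSE)
# ... or a single shared one when equal_covariance = TRUE
covariance_start_equal <- list(diag(d))

rowSums(A_start)  # each row should sum to 1
```

Any of these objects can be modified and passed to the corresponding argument of fit_pHMM in place of NULL.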

References

Capezza, C., Lepore, A., & Paynabar, K. (2025). Stream-Based Active Learning for Process Monitoring. Technometrics. <doi:10.1080/00401706.2025.2561744>.

Examples

library(ActiveLearning4SPM)
set.seed(123)
dat <- simulate_stream(T0 = 100, TT = 500)
y <- dat$y
xlabeled <- dat$x
d <- ncol(dat$y)
# Hide 300 randomly chosen labels to make the states partially observed
xlabeled[sample(length(xlabeled), 300)] <- NA
out <- fit_pHMM(y = y,
                xlabeled = xlabeled,
                nstates = 3,
                mean_start = list(rep(0, d), rep(1, d), rep(-1, d)),
                equal_covariance = TRUE)
out$AIC