active_learning_pHMM: Stream-Based Active Learning with a Partially Hidden Markov Model (pHMM)

Description

Implements the stream-based active learning strategy of Capezza, Lepore, and Paynabar (2025) for process monitoring with partially observed states. At each time step, the method fits a pHMM to the data available so far and weighs exploitation (reducing predictive uncertainty) against exploration (detecting potential out-of-control shifts) to decide whether to request the true label of the current observation. The total number of label requests is capped by a user-defined budget.

Usage

active_learning_pHMM(
  y,
  true_x,
  T0,
  B = 0.1,
  weight_exploration = 0.5,
  lambda_MEWMA = 0.3,
  verbose = FALSE
)

Value

A list with components:

decision: character vector indicating the action taken at each time step: "label_exploitation" or "label_exploration" when a label is requested, otherwise the predicted state.
xlabeled: updated state sequence including acquired labels.
xhat: final predicted state sequence.
scores: classification performance metrics (accuracy, precision, recall, F1, AUC) computed against the true states.

Arguments

y: A numeric matrix of dimension \(T \times d\), where each row corresponds to a \(d\)-dimensional observation at time \(t\).
true_x: Integer vector of length nrow(y) containing the true states, used to assess model predictions. The first T0 values must be 1, because these observations are assumed to come from an in-control process.
T0: Integer. Number of initial observations assumed to be labeled as in-control (state 1).
B: Numeric between 0 and 1. Labeling budget, expressed as the maximum fraction of observations (after the first T0) for which labels may be acquired. Default is 0.1.
weight_exploration: Numeric between 0 and 1. Weight assigned to the exploration criterion. The exploitation weight is computed as 1 - weight_exploration. Default is 0.5.
lambda_MEWMA: Numeric in (0,1). Smoothing parameter for the MEWMA statistic used in the exploration criterion. Default is 0.3.
verbose: Logical. If TRUE, prints the current time index as the algorithm progresses. Default is FALSE.
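
The constraints listed above can be checked before calling the function. The helper below is a minimal sketch, not part of ActiveLearning4SPM: the name check_inputs is hypothetical, and rounding the label cap down with floor() is an assumption, since the exact rounding used by the package is not stated here.

```r
# Hypothetical helper (not part of ActiveLearning4SPM): validates the
# documented argument constraints and returns the implied label cap.
check_inputs <- function(y, true_x, T0, B = 0.1) {
  stopifnot(is.matrix(y), is.numeric(y))      # y is a numeric T x d matrix
  stopifnot(length(true_x) == nrow(y))        # one true state per observation
  stopifnot(T0 >= 1, T0 < nrow(y))            # some data must follow the first T0 rows
  stopifnot(all(true_x[seq_len(T0)] == 1))    # first T0 states are in-control (state 1)
  stopifnot(B >= 0, B <= 1)                   # budget is a fraction
  # B is a fraction of the observations after the first T0.
  # Rounding down is an assumption made for this sketch.
  floor(B * (nrow(y) - T0))
}

# Toy input: 150 observations in 2 dimensions, the first 50 in control.
y_toy <- matrix(0, nrow = 150, ncol = 2)
x_toy <- c(rep(1L, 50), rep(2L, 100))
max_labels <- check_inputs(y_toy, x_toy, T0 = 50, B = 0.1)
```

With B = 0.1 and 100 observations after the first T0, at most about 10 labels can be requested under this reading of the budget.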

Details

The exploitation criterion is based on the entropy of the state sequence, which is high when the model is uncertain about the current state. The exploration criterion uses a multivariate exponentially weighted moving average (MEWMA) statistic, which reacts to shifts away from in-control behavior. The two criteria are combined using the weight weight_exploration (and 1 - weight_exploration for exploitation). Labeling stops when the budget is exhausted or the data stream ends.
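
To make the two ingredients concrete, the sketch below computes a Shannon entropy and a textbook MEWMA statistic in base R. It is illustrative only and is not the package's internal code: the function names state_entropy and mewma_stat are hypothetical, the entropy is taken over a single vector of state probabilities as a simplification, and the in-control mean mu0 and covariance Sigma0 are assumed known. How the package scales and combines the two quantities is not described here.

```r
# Illustrative sketch only: these helpers are hypothetical and are not
# exported by ActiveLearning4SPM.

# Shannon entropy of a vector of state probabilities (exploitation idea):
# largest when the probabilities are equal, zero when one state is certain.
state_entropy <- function(p) {
  p <- p[p > 0]                 # treat 0 * log(0) as 0
  -sum(p * log(p))
}

# Textbook MEWMA statistic (exploration idea):
#   z_t  = lambda * (y_t - mu0) + (1 - lambda) * z_{t-1},  z_0 = 0
#   T2_t = z_t' Sigma_z^{-1} z_t,  Sigma_z = lambda / (2 - lambda) * Sigma0
# The asymptotic covariance Sigma_z is used for simplicity.
mewma_stat <- function(y, mu0, Sigma0, lambda = 0.3) {
  z <- rep(0, ncol(y))
  Sigma_z_inv <- solve(lambda / (2 - lambda) * Sigma0)
  t2 <- numeric(nrow(y))
  for (t in seq_len(nrow(y))) {
    z <- lambda * (y[t, ] - mu0) + (1 - lambda) * z
    t2[t] <- drop(crossprod(z, Sigma_z_inv %*% z))
  }
  t2
}

# Toy usage: 20 in-control rows followed by 10 rows with a mean shift.
set.seed(1)
y_toy <- rbind(matrix(rnorm(40), ncol = 2),
               matrix(rnorm(20, mean = 2), ncol = 2))
t2 <- mewma_stat(y_toy, mu0 = c(0, 0), Sigma0 = diag(2), lambda = 0.3)
h_uncertain <- state_entropy(c(0.5, 0.5))
h_certain   <- state_entropy(c(1, 0))
```

A smaller lambda gives more weight to past observations, which makes the statistic smoother and more sensitive to small persistent shifts.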

References

Capezza, C., Lepore, A., & Paynabar, K. (2025). Stream-Based Active Learning for Process Monitoring. Technometrics. <doi:10.1080/00401706.2025.2561744>.

Examples

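library(ActiveLearning4SPM)
set.seed(123)  # make the simulated stream reproducible

# Simulate a data stream
dat <- simulate_stream(T0 = 50, TT = 100, T_min_IC = 20, T_max_IC = 30)

# Run active learning with a 10% labeling budget
out <- active_learning_pHMM(y = dat$y,
                            true_x = dat$x,
                            T0 = 50,
                            B = 0.1)
table(out$decision)  # how often each action was taken
out$scores$f1        # F1 score against the true states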