
ActiveLearning4SPM (version 0.1.0)

active_learning_pHMM: Stream-Based Active Learning with a Partially Hidden Markov Model (pHMM)

Description

Implements the stream-based active learning strategy of Capezza, Lepore, and Paynabar (2025) for process monitoring with partially observed states. At each time step, the method fits a pHMM to the data available so far and weighs exploitation (reducing predictive uncertainty) against exploration (detecting potential out-of-control shifts) to decide whether to request the true label of the current observation. The number of label requests is capped by a user-defined budget.

Usage

active_learning_pHMM(
  y,
  true_x,
  T0,
  B = 0.1,
  weight_exploration = 0.5,
  lambda_MEWMA = 0.3,
  verbose = FALSE
)

Value

A list with components:

  • decision: character vector indicating the action taken at each time: "label_exploitation" or "label_exploration" when a label is requested, or the predicted state otherwise.

  • xlabeled: updated state sequence including acquired labels.

  • xhat: final predicted state sequence.

  • scores: classification performance metrics (accuracy, precision, recall, F1, AUC) computed against the true states.
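The decision component can be summarised with base R to see how many labels were requested and why. The sketch below uses a toy vector rather than real output; the strings used for predicted states ("1" and "2") are an assumption made for illustration, and the actual encoding returned by the function may differ.

```r
# Toy decision vector (illustrative only; predicted-state strings are assumed)
decision <- c("1", "1", "label_exploration", "2", "label_exploitation", "2")

# Flag the time steps at which a label was requested
is_label <- decision %in% c("label_exploitation", "label_exploration")

n_labels   <- sum(is_label)       # number of labels acquired
label_time <- which(is_label)     # time indices of the label requests
by_reason  <- table(decision[is_label])  # requests split by criterion
```

With real output, replacing the toy vector by out$decision should give the same kind of summary.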

Arguments

y

A numeric matrix of dimension \(T \times d\), where each row corresponds to a \(d\)-dimensional observation at time \(t\).

true_x

Integer vector of true states, of length nrow(y), used to assess model predictions. The first T0 values are assumed to come from an in-control process and must be 1.

T0

Integer. Number of initial observations assumed to be labeled as in-control (state 1).

B

Numeric between 0 and 1. Labeling budget, expressed as the maximum fraction of observations (after the first T0) for which labels may be acquired. Default is 0.1.

weight_exploration

Numeric between 0 and 1. Weight assigned to the exploration criterion. The exploitation weight is computed as 1 - weight_exploration. Default is 0.5.

lambda_MEWMA

Numeric in (0,1). Smoothing parameter for the MEWMA statistic used in the exploration criterion. Default is 0.3.

verbose

Logical. If TRUE, prints the current time index as the algorithm progresses. Default is FALSE.
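To make the budget argument B concrete, the following sketch converts a fraction into a maximum number of label requests. The helper max_labels is hypothetical and not part of the package, and the use of floor() is an assumption; the package may round differently.

```r
# Hypothetical helper: maximum number of label requests implied by budget B.
# Assumption: the cap is floor(B * (n - T0)), where n = nrow(y).
max_labels <- function(n, T0, B = 0.1) {
  stopifnot(B >= 0, B <= 1, T0 >= 0, T0 <= n)
  floor(B * (n - T0))
}

# A stream of 150 observations whose first 50 are labeled in-control
cap <- max_labels(n = 150, T0 = 50, B = 0.1)
```

Under this assumption, a budget of 0.1 on 100 unlabeled observations allows at most 10 label requests.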

Details

The exploitation criterion is based on the entropy of the state sequence, while the exploration criterion uses a multivariate exponentially weighted moving average (MEWMA) statistic. The two criteria are combined with a user-defined weighting, and labeling stops when the budget is exhausted or at the end of the data stream.
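The two ingredients can be illustrated with a minimal base R sketch. It is not the package's internal code: the entropy is computed for a single vector of state probabilities rather than for the whole state sequence, the MEWMA uses the standard recursion with a known in-control mean mu0 and covariance Sigma0, and the mapping of the MEWMA statistic to [0, 1] through pchisq() together with the convex combination in combined_score are assumptions chosen for illustration. All function names below are hypothetical.

```r
# Shannon entropy of a vector of state probabilities (0 * log(0) taken as 0)
state_entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Standard MEWMA statistic with z_0 = 0 and the exact covariance of z_t.
# mu0 and Sigma0 (in-control mean and covariance) are assumed known here.
mewma_stat <- function(y, mu0, Sigma0, lambda = 0.3) {
  stopifnot(is.matrix(y), lambda > 0, lambda < 1)
  z <- rep(0, ncol(y))
  stat <- numeric(nrow(y))
  for (i in seq_len(nrow(y))) {
    z <- lambda * (y[i, ] - mu0) + (1 - lambda) * z
    Sigma_z <- (lambda / (2 - lambda)) * (1 - (1 - lambda)^(2 * i)) * Sigma0
    stat[i] <- sum(z * solve(Sigma_z, z))
  }
  stat
}

# Hypothetical combination: both criteria mapped to [0, 1], then weighted
combined_score <- function(exploit, explore, weight_exploration = 0.5) {
  stopifnot(weight_exploration >= 0, weight_exploration <= 1)
  (1 - weight_exploration) * exploit + weight_exploration * explore
}

set.seed(1)
y <- matrix(rnorm(200), ncol = 2)
y[61:100, ] <- y[61:100, ] + 2      # simulated mean shift after time 60
stat <- mewma_stat(y, mu0 = c(0, 0), Sigma0 = diag(2))

explore <- pchisq(stat, df = ncol(y))           # illustrative mapping to [0, 1]
exploit <- state_entropy(c(0.5, 0.5)) / log(2)  # maximally uncertain two-state case
score <- combined_score(exploit, explore, weight_exploration = 0.5)
```

A smaller lambda_MEWMA gives more weight to past observations, so the statistic reacts more slowly but is more sensitive to small sustained shifts; a larger weight_exploration makes the combined score depend more on the MEWMA term.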

References

Capezza, C., Lepore, A., & Paynabar, K. (2025). Stream-Based Active Learning for Process Monitoring. Technometrics. <doi:10.1080/00401706.2025.2561744>.

Examples

library(ActiveLearning4SPM)
set.seed(123)
dat <- simulate_stream(T0 = 50, TT = 100, T_min_IC = 20, T_max_IC = 30)
out <- active_learning_pHMM(y = dat$y,
                            true_x = dat$x,
                            T0 = 50,
                            B = 0.1)
table(out$decision)
out$scores$f1
