Learn R Programming

WarnEpi (version 1.0.1)

mCUSUM: Modified Cumulative Sum

Description

Modified CUSUM method for outbreak detection in infectious disease surveillance data. Implements three variants (C1', C2', C3') with dynamic thresholds for time series analysis.

Usage

mCUSUM(data, column, k = 1, h = 2, move_t)

Value

A data frame containing C1', C2' and C3' warning results. The value of the warning column is 1 for warning and 0 for no warning.

Arguments

data

A data frame containing the warning indicator columns, arranged in time-based order.

column

A column name or column number, used to specify the warning indicator.

k

The standard deviation coefficient \(k\).

h

The threshold coefficient \(h\).

move_t

The moving period \(t_{move}\).

Details

Let \(\mathbf{X} = (X_1,\ldots,X_T)^\top\) be an observed time series of disease case counts, where \(X_t\) represents the aggregated counts at time \(t\) (e.g., daily, weekly, or monthly observations). We assume \(X_t \sim N(\mu, \sigma^2)\) for the underlying distribution.

The modified CUSUM models accumulate excess cases beyond control limits:

$$C1'_0 = C2'_0 = 0$$ $$C1'_t = \max\left(0, X_t - (\hat{\mu}_t + k\hat{\sigma}_t) + C1'_{t-1}\right)$$ $$C2'_t = \max\left(0, X_t - (\hat{\mu}_t + k\hat{\sigma}_t) + C2'_{t-1}\right)$$ $$C3'_t = C2'_t + C2'_{t-1} + C2'_{t-2}$$ $$H_t = h\hat{\sigma}_t$$

where:

  • \(k\): Standard deviation coefficient (typical range 0.5–1.5), adjusts sensitivity to deviations

  • \(h\): Threshold coefficient (typical range 2–5), controls alarm stringency

  • \(H\): Threshold

Model specifications:

  • C1': Baseline \(\hat{\mu}_t, \hat{\sigma}_t\) estimated from \((X_{t-t_{move}},...,X_{t-1})\)

  • C2': Baseline \(\hat{\mu}_t, \hat{\sigma}_t\) estimated from \((X_{t-2-t_{move}},...,X_{t-3})\) to avoid recent outbreaks

  • C3': 3-day cumulative sum of C2' values

  • Alarms trigger when \(Cx'_t > H_t\) for each model (x = 1,2,3)

References

Wang X, Zeng D, Seale H, et al. Comparing early outbreak detection algorithms based on their optimized parameter values. J Biomed Inform, 2010,43(1):97-103.

Examples

Run this code
## simulate reported cases
set.seed(123)
cases <- c(round(rnorm(10, 10, 1)), seq(12,21,3), seq(15,5,-5))
dates <- seq(as.Date("2025-01-01"), by = "7 days", length.out = length(cases))
data_frame <- data.frame(date = dates, case = cases)

## modeling
output <- mCUSUM(data_frame, 'case', k = 1, h = 2.5, move_t = 4)
output

## visualize alerts
### C1'
plot(output$date, output$case, type = "l")
points(output$date[output$C1_prime_warning == 1],
       output$case[output$C1_prime_warning == 1], col = "red")

### C2'
plot(output$date, output$case, type = "l")
points(output$date[output$C2_prime_warning == 1],
       output$case[output$C2_prime_warning == 1], col = "red")

### C3'
plot(output$date, output$case, type = "l")
points(output$date[output$C3_prime_warning == 1],
       output$case[output$C3_prime_warning == 1], col = "red")

Run the code above in your browser using DataLab