WarnEpi (version 1.0.1)

EWMA: Exponentially Weighted Moving Average

Description

Detects anomalies in infectious disease surveillance data using an Exponentially Weighted Moving Average (EWMA) algorithm. Designed for time series data, it smooths observations with exponentially decaying weights and flags a potential outbreak when the smoothed statistic exceeds an upper control limit.

Usage

EWMA(data, column, lambda = 0.5, k = 3, move_t, ignore_t = 2)

Value

A data frame containing the warning results. The warning column is 1 where a warning is raised and 0 otherwise.

Arguments

data

A data frame containing the warning indicator column(s), with rows arranged in chronological order.

column

A column name or column number, used to specify the warning indicator.

lambda

The weight factor \(\lambda\), ranging from 0 to 1 (higher values prioritize recent observations).

k

The standard deviation coefficient \(k\).

move_t

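The moving period \(t_{move}\): the number of time units in the moving window used to estimate \(\hat{\mu}_t\) and \(\hat{\sigma}_t\).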

ignore_t

The number of most recent time units, \(t_{ignore}\), excluded from the moving window used by the model.
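
To make the interplay of move_t and ignore_t concrete, the short sketch below lists which observations would feed the estimates at one time point, following the window definition given in Details. The variable names (t_now, window_idx) are illustrative only and are not part of WarnEpi.

```r
# Illustration only: which indices form the moving window at time t_now,
# per the Details formula (X_{t - t_move - t_ignore}, ..., X_{t - 1 - t_ignore}).
t_now    <- 10  # hypothetical current time index
move_t   <- 4   # window length
ignore_t <- 2   # most recent time units skipped

window_idx <- (t_now - move_t - ignore_t):(t_now - 1 - ignore_t)
window_idx  # observations 4 to 7; observations 8 and 9 are ignored
```

With these values the window always contains move_t observations and ends ignore_t + 1 steps before the current time point.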

Details

Let \(\mathbf{X} = (X_1,\ldots,X_T)^\top\) be an observed time series of disease case counts, where \(X_t\) represents the aggregated counts at time \(t\) (e.g., daily, weekly, or monthly observations). We assume \(X_t \sim N(\mu, \sigma^2)\) for the underlying distribution.

The EWMA (Exponentially Weighted Moving Average) model is defined as:

$$Z_1 = X_1$$

$$Z_t = \lambda X_t + (1-\lambda)Z_{t-1}$$

$$UCL_t = \hat{\mu}_t + k\hat{\sigma}_t\sqrt{\frac{\lambda}{2-\lambda}}$$

where:

  • \(Z_t\): The EWMA statistic at time \(t\), representing an exponentially weighted average of current and past observations.

  • \(\lambda\): Weight factor (\(0 < \lambda < 1\)); higher values prioritize recent observations.

  • \(k\): Standard deviation coefficient (typically 2 to 3).

  • \(UCL_t\): Upper Control Limit at time \(t\), forming a dynamic threshold for anomaly detection.

  • \(\hat{\mu}_t, \hat{\sigma}_t\): The mean and standard deviation estimated from the moving window \((X_{t-t_{move}-t_{ignore}},\ldots,X_{t-1-t_{ignore}})\).

An alarm is triggered when \(Z_t > UCL_t\), with the alarm set defined as: $$\mathcal{T} = \{t: Z_t > UCL_t\}$$
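
The formulas above can be transcribed into a few lines of base R. The function below is a minimal sketch for illustration, not the WarnEpi implementation: the name ewma_sketch, its output column names, and the choice to leave \(UCL_t\) as NA (and raise no alarm) for the first \(t_{move} + t_{ignore}\) time points, where the moving window is not yet available, are all assumptions.

```r
# Hypothetical helper (not part of WarnEpi): direct transcription of the
# EWMA recursion and upper control limit described in Details.
ewma_sketch <- function(x, lambda = 0.5, k = 3, move_t = 4, ignore_t = 2) {
  n <- length(x)

  # EWMA statistic: Z_1 = X_1, Z_t = lambda * X_t + (1 - lambda) * Z_{t-1}
  z <- numeric(n)
  z[1] <- x[1]
  for (t in seq_len(n)[-1]) {
    z[t] <- lambda * x[t] + (1 - lambda) * z[t - 1]
  }

  # Upper control limit from the moving window that skips the ignore_t
  # most recent points. Assumption: UCL stays NA until a full window exists.
  ucl <- rep(NA_real_, n)
  first_t <- move_t + ignore_t + 1
  if (first_t <= n) {
    for (t in first_t:n) {
      w <- x[(t - move_t - ignore_t):(t - 1 - ignore_t)]
      ucl[t] <- mean(w) + k * sd(w) * sqrt(lambda / (2 - lambda))
    }
  }

  # Alarm set: time points where Z_t exceeds UCL_t
  alarm <- as.integer(!is.na(ucl) & z > ucl)
  data.frame(x = x, z = z, ucl = ucl, alarm = alarm)
}

# Flat series with a jump at the end: only the last point is flagged.
res <- ewma_sketch(c(rep(10, 8), 30))
res$alarm
```

Note that a perfectly flat window gives \(\hat{\sigma}_t = 0\), so any rise in \(Z_t\) above the window mean raises an alarm in this sketch; real surveillance data rarely behave this way, and the package may treat such edge cases differently.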

References

Wang X, Zeng D, Seale H, et al. Comparing early outbreak detection algorithms based on their optimized parameter values. J Biomed Inform, 2010, 43(1):97-103.

Examples

library(WarnEpi)

## simulate reported cases: a stable baseline, a rise, then a decline
set.seed(123)
cases <- c(round(rnorm(10, 10, 1)), seq(12, 21, 3), seq(15, 5, -5))
## weekly reporting dates, one per observation
dates <- seq(as.Date("2025-01-01"), by = "7 days", length.out = length(cases))
data_frame <- data.frame(date = dates, case = cases)

## modeling
output <- EWMA(data_frame, "case", lambda = 0.5, k = 3, move_t = 4, ignore_t = 2)
output

## visualize alerts
plot(output$date, output$case, type = "l")
points(output$date[output$warning == 1],
       output$case[output$warning == 1], col = "red")