bard: Detection of multivariate anomalous segments using BARD.

Description

Implements the BARD (Bayesian Abnormal Region Detector) procedure of Bardwell and Fearnhead (2017). BARD is a fully Bayesian inference procedure that provides measures of uncertainty about the number and location of anomalous regions. It places negative binomial priors on the lengths of anomalous and non-anomalous regions and a uniform prior on the means of anomalous regions. Inference is carried out by solving a set of recursions; a resampling step is included to reduce computational and storage costs.

Usage

bard(
  x,
  p_N = 1/(nrow(x) + 1),
  p_A = 5/nrow(x),
  k_N = 1,
  k_A = (5 * p_A)/(1 - p_A),
  pi_N = 0.9,
  paffected = 0.05,
  lower = 2 * sqrt(log(nrow(x))/nrow(x)),
  upper = max(x),
  alpha = 1e-04,
  h = 0.25
)

Value

An S4 object of class .bard.class containing the data x, the procedure parameter values, and the results.

Arguments

x: A numeric matrix with n rows and p columns containing the data which is to be inspected. The time series data classes ts, xts, and zoo are also supported.
p_N: Hyper-parameter of the negative binomial distribution for the length of non-anomalous segments (probability of success). Defaults to \(\frac{1}{n+1}.\)
p_A: Hyper-parameter of the negative binomial distribution for the length of anomalous segments (probability of success). Defaults to \(\frac{5}{n}.\)
k_N: Hyper-parameter of the negative binomial distribution for the length of non-anomalous segments (size). Defaults to 1.
k_A: Hyper-parameter of the negative binomial distribution for the length of anomalous segments (size). Defaults to \(\frac{5p_A}{1- p_A}.\)
pi_N: Probability that an anomalous segment is followed by a non-anomalous segment. Defaults to 0.9.
paffected: Proportion of the variates believed to be affected by any given anomalous segment. Defaults to 5%. This parameter is relatively robust to being mis-specified and is studied empirically in Section 5.1 of Bardwell and Fearnhead (2017).
lower: The lower limit of the prior uniform distribution for the mean of an anomalous segment \(\mu\). Defaults to \(2\sqrt{\frac{\log(n)}{n}}.\)
upper: The upper limit of the prior uniform distribution for the mean of an anomalous segment \(\mu\). Defaults to the largest value of x.
alpha: Threshold used to control the resampling in the approximation of the posterior distribution at each time step. A sensible default is 1e-4. Decreasing alpha increases the accuracy of the posterior distribution but also increases the computational complexity of the algorithm.
h: The step size in the numerical integration used to find the marginal likelihood. The quadrature points are located from lower to upper in steps of h. Defaults to 0.25. Decreasing this parameter increases the accuracy of the calculation for the marginal likelihood but increases computational complexity.
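The way lower, upper, and h interact can be illustrated with a short base R sketch. It assumes the quadrature points form a regular sequence built with seq(), which matches the description of h above but is not taken from the package source. The names quadrature_grid and upper_example are hypothetical, and upper_example stands in for max(x).

```r
# Illustrative only: assumes the quadrature points form a regular grid
# from `lower` to `upper` in steps of `h`, as the description of `h` suggests.
n <- 200                               # rows (time points) of a hypothetical data set
lower <- 2 * sqrt(log(n) / n)          # default lower limit from the Usage section
upper_example <- 3                     # hypothetical stand-in for max(x)

# Hypothetical helper: regular grid of candidate means for the anomalous segment.
quadrature_grid <- function(lower, upper, h) seq(lower, upper, by = h)

grid_default <- quadrature_grid(lower, upper_example, h = 0.25)
grid_coarse  <- quadrature_grid(lower, upper_example, h = 0.5)

# Halving h roughly doubles the number of points at which the likelihood is evaluated.
length(grid_default)
length(grid_coarse)
```

Under this assumption, the cost of the numerical integration grows with (upper - lower) / h, so a large max(x) combined with a small h can noticeably increase run time.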

Notes on default hyper-parameters

This function gives certain default hyper-parameters for the two segment length distributions. We chose these to be quite flexible for a range of problems. For non-anomalous segments a geometric distribution was selected having an average segment length of \(n\) with the standard deviation being of the same order. For anomalous segments we chose parameters that gave an average length of 5 and a variance of \(n\). These may not be suitable for all problems and the user is encouraged to tune these parameters.
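The stated moments can be checked with a few lines of base R. The sketch below assumes the segment-length priors use the same negative binomial parameterisation as R's dnbinom() (size k, success probability p, counting failures), which is consistent with the documented defaults but is an assumption here. The helpers nb_mean, nb_var, and nb_params are hypothetical and not part of the package.

```r
# Moments of a negative binomial with size k and success probability p,
# in the parameterisation of R's dnbinom() (assumed, see lead-in).
nb_mean <- function(k, p) k * (1 - p) / p
nb_var  <- function(k, p) k * (1 - p) / p^2

# Hypothetical helper: invert the moments to obtain (p, k)
# for a target mean and variance of the segment length.
nb_params <- function(mean, var) {
  stopifnot(var > mean)                # required for 0 < p < 1
  p <- mean / var
  k <- mean * p / (1 - p)
  list(p = p, k = k)
}

n <- 500                               # hypothetical series length

# Documented defaults
p_N <- 1 / (n + 1); k_N <- 1
p_A <- 5 / n;       k_A <- 5 * p_A / (1 - p_A)

nb_mean(k_N, p_N)                      # mean non-anomalous length, equals n
sqrt(nb_var(k_N, p_N))                 # standard deviation, sqrt(n * (n + 1))
nb_mean(k_A, p_A)                      # mean anomalous length, equals 5
nb_var(k_A, p_A)                       # variance of anomalous length, equals n

# Tuning sketch: anomalous segments with mean length 20 and variance n
tuned <- nb_params(mean = 20, var = n)
```

Under the same assumption, the tuned values could then be supplied to bard() as p_A = tuned$p and k_A = tuned$k.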

References

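Bardwell, L. and Fearnhead, P. (2017). Bayesian Detection of Abnormal Segments in Multiple Time Series. Bayesian Analysis, 12(1), 193-218.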

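Fisch, A., Grose, D., Eckley, I. A., Fearnhead, P. and Bardwell, L. anomaly: Detection of Anomalous Structure in Time Series Data. Journal of Statistical Software.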

Examples

library(anomaly)
data(simulated)
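# Run BARD with a larger resampling threshold and integration step than the
# defaults, trading some accuracy for speed
bard.res <- bard(sim.data, alpha = 1e-3, h = 0.5)
# Draw samples from the posterior to locate the anomalous segments
sampler.res <- sampler(bard.res)
# Extract the detected collective anomalies
collective_anomalies(sampler.res)
# Plot the result together with the marginal probabilities
plot(sampler.res, marginals = TRUE)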
