Implements the BARD (Bayesian Abnormal Region Detector) procedure of Bardwell and Fearnhead (2017). BARD is a fully Bayesian inference procedure which is able to give measures of uncertainty about the number and location of anomalous regions. It uses negative binomial prior distributions on the lengths of anomalous and non-anomalous regions as well as a uniform prior for the means of anomalous regions. Inference is conducted by solving a set of recursions. To reduce computational and storage costs a resampling step is included.
bard(
x,
p_N = 1/(nrow(x) + 1),
p_A = 5/nrow(x),
k_N = 1,
k_A = (5 * p_A)/(1 - p_A),
pi_N = 0.9,
paffected = 0.05,
lower = 2 * sqrt(log(nrow(x))/nrow(x)),
upper = max(transform(x)),
alpha = 1e-04,
h = 0.25,
transform = robustscale
)
x: An n x p real matrix representing n observations of p variates. The time series data classes ts, xts, and zoo are also supported.
p_N: Hyper-parameter of the negative binomial distribution for the length of non-anomalous segments (probability of success). Defaults to $$\frac{1}{n+1}.$$
p_A: Hyper-parameter of the negative binomial distribution for the length of anomalous segments (probability of success). Defaults to $$\frac{5}{n}.$$
k_N: Hyper-parameter of the negative binomial distribution for the length of non-anomalous segments (size). Defaults to 1.
k_A: Hyper-parameter of the negative binomial distribution for the length of anomalous segments (size). Defaults to $$\frac{5p_A}{1- p_A}.$$
pi_N: Probability that an anomalous segment is followed by a non-anomalous segment. Defaults to 0.9.
paffected: Proportion of the variates believed to be affected by any given anomalous segment. Defaults to 0.05 (5%). Results are relatively robust to mis-specification of this parameter, which is studied empirically in Section 5.1 of Bardwell and Fearnhead (2017).
lower: The lower limit of the uniform prior distribution for the mean of an anomalous segment \(\mu\). Defaults to $$2\sqrt{\frac{\log(n)}{n}}.$$
upper: The upper limit of the uniform prior distribution for the mean of an anomalous segment \(\mu\). Defaults to the largest standardised value of x, i.e. max(transform(x)).
alpha: Threshold used to control the resampling in the approximation of the posterior distribution at each time step. A sensible default is 1e-4. Decreasing alpha increases the accuracy of the posterior distribution but also increases the computational complexity of the algorithm.
h: The step size in the numerical integration used to find the marginal likelihood. The quadrature points are located from lower to upper in steps of h. Defaults to 0.25. Decreasing this parameter increases the accuracy of the calculation for the marginal likelihood but increases computational complexity.
transform: A function used to transform the data prior to analysis. The default, robustscale, scales the data using the median and the median absolute deviation.
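As an illustration of how the defaults for lower, upper, and the quadrature grid relate to the transformed data, the following base R sketch computes them by hand. The helper robust_scale_sketch is a hypothetical stand-in for robustscale: it centres each column by its median and scales by stats::mad, and the package's own implementation may differ in detail (for example in the MAD consistency constant). The grid is built with seq, which is an assumption based on the description of h above.

```r
# Hypothetical stand-in for the package's robustscale (assumption):
# centre each column by its median and scale by its MAD.
robust_scale_sketch <- function(x) {
  x <- as.matrix(x)
  apply(x, 2, function(col) (col - median(col)) / mad(col))
}

set.seed(1)
x <- matrix(rnorm(200 * 5), nrow = 200, ncol = 5)  # toy data, n = 200, p = 5
n <- nrow(x)

z <- robust_scale_sketch(x)        # standardised data
lower <- 2 * sqrt(log(n) / n)      # default lower limit of the uniform prior
upper <- max(z)                    # default upper limit: largest standardised value
h <- 0.25                          # default step size

# Assumed layout of the quadrature points: from lower towards upper in steps of h
grid <- seq(lower, upper, by = h)
length(grid)
```

A smaller h gives a finer grid and so more evaluations per time step, which is the accuracy versus cost trade-off described for h.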
An instance of the S4 object of type .bard.class containing the data x, the procedure parameter values, and the results.
This function gives certain default hyper-parameters for the two segment length distributions. We chose these to be quite flexible for a range of problems. For non-anomalous segments a geometric distribution was selected having an average segment length of \(n\) with the standard deviation being of the same order. For anomalous segments we chose parameters that gave an average length of 5 and a variance of \(n\). These may not be suitable for all problems and the user is encouraged to tune these parameters.
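The stated means and variances can be checked numerically. The sketch below assumes the segment lengths follow R's negative binomial parameterisation (size k and success probability p, with mean k(1 - p)/p and variance k(1 - p)/p^2); whether the package applies any offset to the lengths is not stated here, so treat this as a check of the arithmetic only. The helper nb_params is hypothetical and not part of the package: it inverts the same formulas to obtain values that could be supplied as p_A and k_A (or p_N and k_N) for a target mean and variance.

```r
# Mean and variance of a negative binomial with the given size and prob
# (R's parameterisation, assumed to match the one used for segment lengths).
nb_mean <- function(size, prob) size * (1 - prob) / prob
nb_var  <- function(size, prob) size * (1 - prob) / prob^2

n <- 500

# Default hyper-parameters as given in the usage section
p_N <- 1 / (n + 1)
k_N <- 1
p_A <- 5 / n
k_A <- 5 * p_A / (1 - p_A)

# Non-anomalous segments: mean n, standard deviation of the same order
c(mean = nb_mean(k_N, p_N), sd = sqrt(nb_var(k_N, p_N)))

# Anomalous segments: mean 5, variance n
c(mean = nb_mean(k_A, p_A), var = nb_var(k_A, p_A))

# Hypothetical helper (not part of the package): solve for prob and size
# given a target mean and variance; requires var > mean.
nb_params <- function(mean, var) {
  stopifnot(var > mean)
  prob <- mean / var
  list(prob = prob, size = mean * prob / (1 - prob))
}

# Example: anomalous segments with average length 20 and variance 400
tuned <- nb_params(mean = 20, var = 400)
```

With k_N = 1 the non-anomalous distribution reduces to a geometric distribution, which is why its standard deviation is of the same order as its mean.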
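Bardwell, L. and Fearnhead, P. (2017). Bayesian Detection of Abnormal Segments in Multiple Time Series. Bayesian Analysis, 12(1), 193-218.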
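# NOT RUN {
library(anomaly)
set.seed(0)
# simulate 500 observations of 50 variates with anomalies at the given locations
sim.data <- simulate(n = 500, p = 50, mu = 2, locations = c(100, 200, 300),
                     duration = 6, proportions = c(0.04, 0.06, 0.08))
# run bard
bard.res <- bard(sim.data, alpha = 1e-3, h = 0.5)
# post-process the bard result and list the detected collective anomalies
sampler.res <- sampler(bard.res)
collective_anomalies(sampler.res)
# }
# NOT RUN {
# plot the sampler result together with the marginals
plot(sampler.res, marginals = TRUE)
# }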