detectChangePoint and processStream for detecting single and multiple change points respectively. The remainder of the functions allow for more precise control over the change detection procedure. To cite this R package in a research paper, please use citation('cpm') to obtain the reference and BibTeX entry.
Note: this package has a manual titled "Parametric and Nonparametric Sequential Change Detection in R: The cpm Package" available from www.gordonjross.co.uk, which contains a full description of all the functions and algorithms in the package, as well as detailed instructions on how to use it.
If you would like to cite this package, the citation information is "G. J. Ross - Parametric and Nonparametric Sequential Change Detection in R: The cpm Package, Journal of Statistical Software, 2015"
A Brief CPM Overview
Given a sequence $X_1, \ldots, X_n$ of random variables, the CPM works by evaluating a two-sample test statistic at every possible split point. Let $D_{k,n}$ be the value of the test statistic when the sequence is split into the two samples $\{X_1, X_2, \ldots, X_k\}$ and $\{X_{k+1}, X_{k+2}, \ldots, X_n\}$, and define $D_n = \max_k D_{k,n}$ to be the maximum of these values. $D_n$ is then compared to some threshold, with a change being detected if the threshold is exceeded.
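The batch procedure described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the function name `split_statistics` is hypothetical, and a pooled two-sample t statistic is used as a stand-in for $D_{k,n}$; it is not the package's internal code, and no threshold is computed here.

```r
# Illustrative sketch only: D_{k,n} is taken to be the absolute pooled
# two-sample t statistic (an assumption, not the package's implementation).
split_statistics <- function(x) {
  n <- length(x)
  D <- rep(NA_real_, n)
  # Keep at least two observations on each side of the split.
  for (k in 2:(n - 2)) {
    left  <- x[1:k]
    right <- x[(k + 1):n]
    # Pooled variance estimate across the two samples.
    sp2 <- (sum((left - mean(left))^2) + sum((right - mean(right))^2)) / (n - 2)
    D[k] <- abs(mean(left) - mean(right)) / sqrt(sp2 * (1 / k + 1 / (n - k)))
  }
  D
}

set.seed(1)
x <- c(rnorm(100, mean = 0), rnorm(100, mean = 2))  # mean shift after observation 100

D_kn  <- split_statistics(x)        # D_{k,n} for every admissible split k
D_n   <- max(D_kn, na.rm = TRUE)    # the maximised statistic D_n
k_hat <- which.max(D_kn)            # the split at which the maximum occurs
```

In practice `D_n` would then be compared against a threshold appropriate for the chosen statistic and sequence length.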
In the sequential context, the observations are processed one-by-one, with $D_t$ being computed based on the first $t$ observations, $D_{t+1}$ being computed based on the first $t+1$ observations, and so on. The change detection time is defined as the first value of $t$ where the threshold is exceeded. Supposing this occurs at time $t=T$, then the best estimate of the location of the change point is the value of $k$ which maximised $D_{k,T}$. Writing $\hat{\tau}$ for this estimate, we have $\hat{\tau} \leq T$.
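A minimal base R sketch of the sequential scheme follows. Several things here are assumptions made for illustration: the helper `max_split_stat`, the pooled t statistic used for $D_{k,t}$, the fixed threshold `h`, and the startup length of 20 observations. The thresholds actually used for a given $ARL_0$ generally depend on $t$ and on the statistic, so the value below is a placeholder.

```r
# Illustrative sketch only: returns the maximised statistic D_t and the
# split k at which it is attained, using a pooled two-sample t statistic.
max_split_stat <- function(x) {
  n <- length(x)
  best <- list(D = -Inf, k = NA_integer_)
  for (k in 2:(n - 2)) {
    left  <- x[1:k]
    right <- x[(k + 1):n]
    sp2 <- (sum((left - mean(left))^2) + sum((right - mean(right))^2)) / (n - 2)
    D_k <- abs(mean(left) - mean(right)) / sqrt(sp2 * (1 / k + 1 / (n - k)))
    if (D_k > best$D) best <- list(D = D_k, k = k)
  }
  best
}

set.seed(2)
x <- c(rnorm(100, mean = 0), rnorm(100, mean = 2))  # change after observation 100

h        <- 4.5           # hypothetical fixed threshold (placeholder value)
startup  <- 20            # hypothetical number of observations before monitoring
T_detect <- NA_integer_   # detection time T
tau_hat  <- NA_integer_   # estimated change point location

for (t in startup:length(x)) {
  res <- max_split_stat(x[1:t])   # D_t uses only the first t observations
  if (res$D > h) {
    T_detect <- t
    tau_hat  <- res$k             # the k that maximised D_{k,T}
    break
  }
}
```

Because the maximising split is always taken among the first `T_detect` observations, `tau_hat` can never exceed `T_detect`.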
The thresholds are chosen so that there is a constant probability of a false positive occurring after each observation. This leads to control of the Average Run Length ($ARL_0$), defined as the expected number of observations received before a change is falsely detected, assuming that no change has occurred.
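The link between the constant false positive probability and $ARL_0$ can be made explicit with a short derivation. The symbols $\alpha$ (the per-observation false positive probability), $h_t$ (the threshold at time $t$) and $R$ (the time of the first false detection) are notation introduced here for illustration, and any startup period is ignored. Suppose that, when no change has occurred,

$$P(D_t > h_t \mid D_{t-1} \leq h_{t-1}, \ldots, D_1 \leq h_1) = \alpha \quad \text{for all } t.$$

Then $R$ follows a geometric distribution,

$$P(R = t) = (1 - \alpha)^{t-1} \alpha, \qquad ARL_0 = E[R] = \sum_{t=1}^{\infty} t \, (1 - \alpha)^{t-1} \alpha = \frac{1}{\alpha}.$$

For example, $\alpha = 0.002$ corresponds to $ARL_0 = 500$.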
The choice of test statistic in the CPM defines the class of changes which it is optimised towards detecting. This package implements CPMs using the following statistics (more details can be found in the references):
GLR: Generalized Likelihood Ratio test statistic, as in [Hawkins and Zamba, 2005b]. Used to detect both mean and variance changes in a Gaussian sequence.
Exponential: Generalized Likelihood Ratio test statistic for the Exponential distribution, as in [Ross, 2013]. Used to detect changes in the parameter of an Exponentially distributed sequence.
GLRAdjusted and ExponentialAdjusted: Identical to the GLR and Exponential statistics, except with the finite-sample correction discussed in [Ross, 2013], which can lead to more powerful change detection.
For a fuller overview of the package, which includes a description of the CPM framework and examples of how to use the various functions, please consult the full package manual titled "Parametric and Nonparametric Sequential Change Detection in R: The cpm Package".
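As a rough sketch of how one of the statistics listed above might be selected in practice, the example below calls detectChangePoint with two different cpmType values. The simulated data, the seed, and the variable name `cpm_available` are made up for illustration, and the argument values shown (ARL0 = 500, startup = 20) are just one plausible choice. The code is guarded so that it does nothing if the cpm package is not installed.

```r
# Guard: only run the example if the cpm package is installed.
cpm_available <- requireNamespace("cpm", quietly = TRUE)

if (cpm_available) {
  set.seed(3)

  # Gaussian sequence whose mean and variance both change after observation 200.
  x <- c(rnorm(200, mean = 0, sd = 1), rnorm(200, mean = 1, sd = 2))
  res <- cpm::detectChangePoint(x, cpmType = "GLR", ARL0 = 500, startup = 20)

  if (res$changeDetected) {
    cat("Change detected at observation", res$detectionTime,
        "with estimated change point", res$changePoint, "\n")
  }

  # Exponential sequence whose rate changes after observation 200,
  # monitored with the finite-sample corrected statistic.
  y <- c(rexp(200, rate = 1), rexp(200, rate = 3))
  res_exp <- cpm::detectChangePoint(y, cpmType = "ExponentialAdjusted",
                                    ARL0 = 500, startup = 20)
}
```

The Gaussian statistics assume normally distributed observations and the Exponential ones assume exponentially distributed observations, so the choice of cpmType should reflect what is believed about the data.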
References
Hawkins, D., Zamba, K. (2005b) -- Statistical Process Control for Shifts in Mean or Variance Using a Changepoint Formulation, Technometrics, 47(2), 164-173.
Hawkins, D., Qiu, P., Kang, C. (2003) -- The Changepoint Model for Statistical Process Control, Journal of Quality Technology, 35, 355-366.
Ross, G. J., Tasoulis, D. K., Adams, N. M. (2011) -- A Nonparametric Change-Point Model for Streaming Data, Technometrics, 53(4).
Ross, G. J., Adams, N. M. (2012) -- Two Nonparametric Control Charts for Detecting Arbitrary Distribution Changes, Journal of Quality Technology, 44, 102-116.
Ross, G. J., Adams, N. M. (2013) -- Sequential Monitoring of a Proportion, Computational Statistics, 28(2).
Ross, G. J. (2014) -- Sequential Change Detection in the Presence of Unknown Parameters, Statistics and Computing, 24, 1017-1030.
Ross, G. J. (2015) -- Parametric and Nonparametric Sequential Change Detection in R: The cpm Package, Journal of Statistical Software, forthcoming.