Window: An R6 class implementing the sliding window algorithm for offline change-point detection.
$new(): Initialises a Window object.
$describe(): Describes the Window object.
$fit(): Constructs a Window module in C++.
$eval(): Evaluates the cost of a segment.
$predict(): Performs sliding-window change-point detection given a linear penalty value.
$plot(): Plots change-point segmentation in ggplot style.
$clone(): Clones the R6 object.
Minh Long Nguyen edelweiss611428@gmail.com
Toby Dylan Hocking toby.hocking@r-project.org
Charles Truong ctruong@ens-paris-saclay.fr
minSize: Integer. Minimum allowed segment length. Can be accessed or modified via $minSize.
Modifying minSize will automatically trigger $fit().
radius: Integer. Window radius. Can be accessed or modified via $radius.
Modifying radius will automatically trigger $fit().
jump: Integer. Search grid step size. Can be accessed or modified via $jump.
Modifying jump will automatically trigger $fit().
costFunc: R6 object of class costFunc. Cost function used to evaluate segments. Can be accessed or modified via $costFunc.
Modifying costFunc will automatically trigger $fit().
tsMat: Numeric matrix. Input time series matrix of size \(n \times p\). Can be accessed or modified via $tsMat.
Modifying tsMat will automatically trigger $fit().
covariates: Numeric matrix. Covariate matrix with the same number of observations as tsMat. Can be accessed or modified via $covariates.
Modifying covariates will automatically trigger $fit().
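For example, a minimal sketch of this re-fitting behaviour (assuming the package providing Window and costFunc is attached):
# Modifying an active binding re-runs $fit() on the stored data
set.seed(1)
y = as.matrix(c(rnorm(100, 0, 1), rnorm(100, 5, 1)))
w = Window$new(minSize = 1L, jump = 1L)
w$fit(y)
w$jump                               # read the current value
w$jump = 2L                          # assignment triggers $fit() again
w$costFunc = costFunc$new("L1")      # so does changing the cost function
w$predict(pen = 100)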
new(): Initialises a Window object.
Window$new(minSize, jump, radius, costFunc)
minSize: Integer. Minimum allowed segment length. Default: 1L.
jump: Integer. Search grid step size: only positions in {k, 2k, ...} are considered. Default: 1L.
radius: Integer. Radius of each sliding window. Default: 1L.
costFunc: An R6 object of class costFunc. Should be created via costFunc$new() to avoid errors.
Default: costFunc$new("L2").
Invisibly returns NULL.
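A minimal construction sketch (argument names as in the signature above):
# Default configuration uses the L2 cost
w1 = Window$new(minSize = 1L, jump = 1L, radius = 1L)
# Supplying a custom cost function object
w2 = Window$new(minSize = 2L, jump = 1L, radius = 5L,
                costFunc = costFunc$new("SIGMA"))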
describe(): Describes a Window object.
Window$describe(printConfig = FALSE)
printConfig: Logical. Whether to print object configurations. Default: FALSE.
Invisibly returns a list storing at least the following fields:
minSize: Minimum allowed segment length.
jump: Search grid step size.
radius: Radius of each sliding window.
costFunc: The costFunc object.
fitted: Whether or not $fit() has been run.
tsMat: Time series matrix.
covariates: Covariate matrix (if it exists).
n: Number of observations.
p: Number of features.
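For instance, a brief sketch of inspecting the returned configuration list (assuming the package is attached):
set.seed(2)
y = as.matrix(c(rnorm(100, 0, 1), rnorm(100, 5, 1)))
w = Window$new(minSize = 1L, jump = 1L)
w$fit(y)
info = w$describe(printConfig = TRUE)
info$fitted    # TRUE once $fit() has been run
info$n         # number of observations
info$p         # number of features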
fit(): Constructs a C++ module for Window.
Window$fit(tsMat = NULL, covariates = NULL)
tsMat: Numeric matrix. A time series matrix of size \(n \times p\) whose rows are observations ordered in time.
If tsMat = NULL, the method will use the previously assigned tsMat (e.g., set via the active binding $tsMat
or from a prior $fit(tsMat)). Default: NULL.
covariates: Numeric matrix. A covariate matrix with the same number of observations as tsMat.
Required for models involving both dependent and independent variables.
If covariates = NULL and no prior covariates were set (i.e., $covariates is still NULL),
the model is force-fitted with only an intercept. Default: NULL.
This method constructs a C++ Window module and sets private$.fitted to TRUE,
enabling the use of $predict() and $eval(). Some precomputations are performed to allow
$predict() to run in linear time with respect to the number of local change-points
(see $predict() for more details).
Invisibly returns NULL.
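A short usage sketch (the covariates argument is only relevant for regression-type costs such as "LinearL2"):
set.seed(3)
y = as.matrix(c(rnorm(50, 0, 1), rnorm(50, 3, 1)))
w = Window$new(minSize = 1L, jump = 1L)
w$fit(tsMat = y)    # precomputations happen here
w$fit()             # reuses the previously assigned tsMat
# With covariates = NULL and no prior covariates, regression-type models
# are force-fitted with an intercept only.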
eval(): Evaluates the cost of a segment.
Window$eval(a, b)
a: Integer. Start index of the segment (exclusive). Must satisfy a < b.
b: Integer. End index of the segment (inclusive).
The segment cost is evaluated as follows:
L1 cost function: $$c_{L_1}(y_{(a+1)...b}) := \sum_{t = a+1}^{b} \| y_t - \tilde{y}_{(a+1)...b} \|_1$$ where \(\tilde{y}_{(a+1)...b}\) is the coordinate-wise median of the segment. If \(a \ge b - 1\), return 0.
L2 cost function: $$c_{L_2}(y_{(a+1)...b}) := \sum_{t = a+1}^{b} \| y_t - \bar{y}_{(a+1)...b} \|_2^2$$ where \(\bar{y}_{(a+1)...b}\) is the empirical mean of the segment. If \(a \ge b - 1\), return 0.
SIGMA cost function:
$$c_{\Sigma}(y_{(a+1)...b}) := (b - a)\log \det \hat{\Sigma}_{(a+1)...b}$$ where \(\hat{\Sigma}_{(a+1)...b}\) is
the empirical covariance matrix of the segment, without Bessel's correction. If addSmallDiag = TRUE, a small
bias epsilon is added to the diagonal of the estimated covariance matrices to improve numerical stability.
By default, addSmallDiag = TRUE and epsilon = 1e-6. If the computed log-determinant of the covariance matrix is either -Inf (singular matrix)
or smaller than the lower bound p*log(epsilon), the cost is truncated to (b - a)*p*log(epsilon) when addSmallDiag = TRUE; otherwise, an error message is raised.
VAR(r) cost function:
$$c_{\mathrm{VAR}}(y_{(a+1)...b}) := \sum_{t = a+r+1}^{b} \left\| y_t - \sum_{j=1}^r \hat A_j y_{t-j} \right\|_2^2$$
where \(\hat A_j\) are the estimated VAR coefficients, commonly estimated via the OLS criterion. If the system is singular,
\(b - a < p r + 1\) (i.e., not enough observations), or \(a \ge n - p\) (where n is the time series length), return 0.
"LinearL2" for piecewise linear regression process with constant noise variance $$c_{\text{LinearL2}}(y_{(a+1):b}) := \sum_{t=a+1}^b \| y_t - X_t \hat{\beta} \|_2^2$$ where \(\hat{\beta}\) are OLS estimates on segment \((a+1):b\). If segment is shorter than the minimum number of points needed for OLS, return 0.
The segment cost.
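For example, with the default L2 cost (a sketch; exact values depend on the simulated data):
set.seed(4)
y = as.matrix(c(rnorm(100, 0, 1), rnorm(100, 5, 1)))
w = Window$new(minSize = 1L, jump = 1L)   # default L2 cost
w$fit(y)
w$eval(0L, 100L)     # cost of the homogeneous segment y_1, ..., y_100
w$eval(50L, 150L)    # segment straddling the change at t = 100: larger cost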
predict(): Performs sliding-window change-point detection given a linear penalty value.
Window$predict(pen = 0)
pen: Numeric. Penalty per change-point. Default: 0.
The algorithm scans the data with a fixed-size window to detect candidate local change-points (lcps): a position is flagged as a candidate if the gains of its \(k_\text{thresh}\) neighbours to the left and right are all smaller than its own gain, where \(k_\text{thresh}\) is defined as $$ k_\text{thresh} = \max \left( \frac{\max(2 \cdot \text{radius}, 2 \cdot \text{minSize})}{2 \cdot \text{jump}}, 1 \right) $$
After detecting candidate local change-points and computing the local gains, the algorithm selects the "optimal" set of break-points given the linear penalty threshold. Let \(G_i\) denote the local gain for candidate change-point \(i\), for \(i = 1, \dots, n_\text{lcps}\), with the gains ordered such that \(G_1 \ge G_2 \ge \dots \ge G_{n_\text{lcps}}\). Note that it is possible that no local change-points are detected, for example if the window size is too large.
The total cost for the selected \(k\) change-points is then calculated as $$ \text{TotalCost} = - \sum_{i=1}^{k} G_i + \lambda \cdot k, $$ where \(\lambda\) is a linear penalty applied per change-point. We then optimise over \(k\) to minimise the penalised cost function.
This approach allows detecting multiple change-points in a time series while controlling model complexity through the linear penalty threshold.
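To make the selection step concrete, here is a plain-R sketch of the formula above (not the package's internal code; G is a hypothetical vector of sorted local gains):
G   = c(40, 25, 9, 3)          # local gains, ordered decreasingly
pen = 10                       # linear penalty per change-point
totalCost = function(k) -sum(G[seq_len(k)]) + pen * k
costs = sapply(0:length(G), totalCost)
kOpt  = which.min(costs) - 1   # kOpt = 2: only gains exceeding pen are kept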
In our implementation, scanning the data to detect candidate local change-points and computing their corresponding local gains is already performed in $fit().
Therefore, $predict() runs in linear time with respect to the number of local change-points.
Temporary segment end-points are saved to private$.tmpEndPoints after $predict(), enabling users to call $plot() without
specifying endpoints manually.
An integer vector of regime end-points. By design, the last element is the number of observations.
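A minimal prediction sketch (assuming the package is attached):
set.seed(5)
y = as.matrix(c(rnorm(100, 0, 1), rnorm(100, 5, 1)))
w = Window$new(minSize = 1L, jump = 1L)
w$fit(y)
endPts = w$predict(pen = 100)
endPts    # regime end-points; the last element equals nrow(y) = 200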
plot(): Plots change-point segmentation.
Window$plot(
d = 1L,
endPts,
dimNames,
main,
xlab,
tsWidth = 0.25,
tsCol = "#5B9BD5",
bgCol = c("#A3C4F3", "#FBB1BD"),
bgAlpha = 0.5,
ncol = 1L
)
d: Integer vector. Dimensions to plot. Default: 1L.
endPts: Integer vector. End points. Default: the latest temporary change-points obtained via $predict().
dimNames: Character vector. Feature names matching the length of d. Defaults to "X1", "X2", ....
main: Character. Main title. Defaults to "Window: d = ...".
xlab: Character. X-axis label. Default: "Time".
tsWidth: Numeric. Line width for time series and segments. Default: 0.25.
tsCol: Character. Time series color. Default: "#5B9BD5".
bgCol: Character vector. Segment colors, recycled to the length of endPts. Default: c("#A3C4F3", "#FBB1BD").
bgAlpha: Numeric. Background transparency. Default: 0.5.
ncol: Integer. Number of columns in the facet layout. Default: 1L.
Plots change-point segmentation results. Based on ggplot2. Multiple plots can easily be
stacked vertically or placed side by side using patchwork's / and | operators, respectively.
An object of classes gg and ggplot.
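For instance, a sketch of stacking two segmentations with patchwork (assuming the package and patchwork are installed):
set.seed(6)
y = as.matrix(c(rnorm(100, 0, 1), rnorm(100, 5, 1)))
w = Window$new(minSize = 1L, jump = 1L)
w$fit(y)
w$predict(pen = 100)
p1 = w$plot(main = "pen = 100")
w$predict(pen = 10)
p2 = w$plot(main = "pen = 10")
library(patchwork)
p1 / p2    # stack vertically; use p1 | p2 to place side by side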
clone(): The objects of this class are cloneable with this method.
Window$clone(deep = FALSE)
deep: Whether to make a deep clone.
The sliding window algorithm is a scalable, linear-time change-point detection method that selects break-points based on local gains computed over sliding windows.
Currently supports the following cost functions:
"L1" and "L2" for (independent) piecewise Gaussian process with constant variance
"SIGMA": for (independent) piecewise Gaussian process with varying variance
"VAR": for piecewise Gaussian vector-regressive process with constant noise variance
"LinearL2": for piecewise linear regression process with constant noise variance
Window requires an R6 object of class costFunc, which can be created via costFunc$new().
"L1" and "L2" for (independent) piecewise Gaussian process with constant variance
"SIGMA": for (independent) piecewise Gaussian process with varying variance
"VAR": for piecewise Gaussian vector-regressive process with constant noise variance
"LinearL2": for piecewise linear regression process with constant noise variance
See $eval() method for more details on computation of cost.
Some examples are provided below. See the GitHub README for detailed basic usage!
Truong, C., Oudre, L., & Vayatis, N. (2020). Selective review of offline change point detection methods. Signal Processing, 167, 107299.
## L2 example
set.seed(1121)
signals = as.matrix(c(rnorm(100,0,1),
rnorm(100,5,1)))
# Default L2 cost function
WindowObj = Window$new(minSize = 1L, jump = 1L)
WindowObj$fit(signals)
WindowObj$predict(pen = 100)
WindowObj$plot()
## SIGMA example
set.seed(111)
signals = as.matrix(c(rnorm(100,-5,1),
rnorm(100,-5,10),
rnorm(100,-5,1)))
# L2 cost function
WindowObj = Window$new(minSize = 1L, jump = 1L)
WindowObj$fit(signals)
# We choose pen = 50.
WindowObj$predict(pen = 50)
WindowObj$plot()
# The standard L2 cost function is not suitable.
# Use the SIGMA cost function.
WindowObj$costFunc = costFunc$new(costFunc = "SIGMA")
WindowObj$predict(pen = 50)
WindowObj$plot()