PELT: An R6 class implementing the PELT algorithm for offline change-point detection.
$new(): Initialises a PELT object.
$describe(): Describes the PELT object.
$fit(): Constructs a PELT module in C++.
$eval(): Evaluates the cost of a segment.
$predict(): Performs PELT given a linear penalty value.
$plot(): Plots change-point segmentation in ggplot style.
$clone(): Clones the R6 object.
Minh Long Nguyen <edelweiss611428@gmail.com>
Toby Dylan Hocking <toby.hocking@r-project.org>
Charles Truong <ctruong@ens-paris-saclay.fr>
minSize: Integer. Minimum allowed segment length. Can be accessed or modified via $minSize. Modifying minSize will automatically trigger $fit().
jump: Integer. Search grid step size. Can be accessed or modified via $jump. Modifying jump will automatically trigger $fit().
costFunc: R6 object of class costFunc. Cost function used for segment evaluation. Can be accessed or modified via $costFunc. Modifying costFunc will automatically trigger $fit().
tsMat: Numeric matrix. Input time series matrix of size \(n \times p\). Can be accessed or modified via $tsMat. Modifying tsMat will automatically trigger $fit().
covariates: Numeric matrix. Covariate matrix with the same number of observations as tsMat. Can be accessed or modified via $covariates. Modifying covariates will automatically trigger $fit().
new(): Initialises a PELT object.
PELT$new(minSize, jump, costFunc)
minSize: Integer. Minimum allowed segment length. Default: 1L.
jump: Integer. Search grid step size: only positions in {k, 2k, ...} are considered. Default: 1L.
costFunc: An R6 object of class costFunc. Should be created via costFunc$new() to avoid errors. Default: costFunc$new("L2").
Invisibly returns NULL.
describe(): Describes a PELT object.
PELT$describe(printConfig = FALSE)
printConfig: Logical. Whether to print object configurations. Default: FALSE.
Invisibly returns a list storing at least the following fields:
minSize: Minimum allowed segment length.
jump: Search grid step size.
costFunc: The costFunc object.
fitted: Whether or not $fit() has been run.
tsMat: Time series matrix.
covariates: Covariate matrix (if it exists).
n: Number of observations.
p: Number of features.
fit(): Constructs a C++ module for PELT.
PELT$fit(tsMat = NULL, covariates = NULL)
tsMat: Numeric matrix. A time series matrix of size \(n \times p\) whose rows are observations ordered in time. If tsMat = NULL, the method will use the previously assigned tsMat (e.g., set via the active binding $tsMat or from a prior $fit(tsMat)). Default: NULL.
covariates: Numeric matrix. A covariate matrix with the same number of observations as tsMat. Required for models involving both dependent and independent variables. If covariates = NULL and no prior covariates were set (i.e., $covariates is still NULL), the model is fitted with an intercept only. Default: NULL.
This method constructs a C++ PELT module and sets private$.fitted to TRUE, enabling the use of $predict() and $eval().
Invisibly returns NULL.
eval(): Evaluates the cost of a segment.
a: Integer. Start index of the segment (exclusive). Must satisfy a < b.
b: Integer. End index of the segment (inclusive).
The segment cost is evaluated as follows:
L1 cost function: $$c_{L_1}(y_{(a+1)...b}) := \sum_{t = a+1}^{b} \| y_t - \tilde{y}_{(a+1)...b} \|_1$$ where \(\tilde{y}_{(a+1)...b}\) is the coordinate-wise median of the segment. If \(a \ge b - 1\), return 0.
L2 cost function: $$c_{L_2}(y_{(a+1)...b}) := \sum_{t = a+1}^{b} \| y_t - \bar{y}_{(a+1)...b} \|_2^2$$ where \(\bar{y}_{(a+1)...b}\) is the empirical mean of the segment. If \(a \ge b - 1\), return 0.
SIGMA cost function:
$$c_{\Sigma}(y_{(a+1)...b}) := (b - a)\log \det \hat{\Sigma}_{(a+1)...b}$$ where \(\hat{\Sigma}_{(a+1)...b}\) is
the empirical covariance matrix of the segment without Bessel's correction. If addSmallDiag = TRUE, a small
bias epsilon is added to the diagonal of the estimated covariance matrix to improve numerical stability.
By default, addSmallDiag = TRUE and epsilon = 1e-6. When addSmallDiag = TRUE, if the computed log-determinant of the covariance matrix is either -Inf (singular)
or smaller than its lower bound p*log(epsilon), the cost (b - a)*p*log(epsilon) is returned; otherwise (addSmallDiag = FALSE), such cases produce an error message.
VAR(r) cost function:
$$c_{\mathrm{VAR}}(y_{(a+1)...b}) := \sum_{t = a+r+1}^{b} \left\| y_t - \sum_{j=1}^r \hat A_j y_{t-j} \right\|_2^2$$
where \(\hat A_j\) are the VAR coefficients, commonly estimated via the OLS criterion. If the system is singular,
\(b - a < p r + 1\) (i.e., not enough observations), or \(a \ge n - p\) (where n is the time series length), return 0.
"LinearL2" cost function for a piecewise linear regression process with constant noise variance: $$c_{\text{LinearL2}}(y_{(a+1)...b}) := \sum_{t=a+1}^b \| y_t - X_t \hat{\beta} \|_2^2$$ where \(\hat{\beta}\) are the OLS estimates on segment \((a+1)...b\). If the segment is shorter than the minimum number of points needed for OLS, return 0.
The segment cost.
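For intuition, the L1, L2, and SIGMA costs above can be mirrored in a few lines of base R. This is only an illustrative sketch (the function names costL1, costL2, and costSIGMA are invented here); the package itself evaluates costs through the compiled C++ module behind $eval():

```r
# Illustrative base-R counterparts of the segment costs defined above.
# These mirror the formulas only; $eval() uses the compiled C++ module.
costL1 <- function(y, a, b) {
  if (a >= b - 1) return(0)
  seg <- y[(a + 1):b, , drop = FALSE]
  med <- apply(seg, 2, median)               # coordinate-wise median
  sum(abs(sweep(seg, 2, med)))
}

costL2 <- function(y, a, b) {
  if (a >= b - 1) return(0)
  seg <- y[(a + 1):b, , drop = FALSE]
  sum(sweep(seg, 2, colMeans(seg))^2)        # squared deviations from the segment mean
}

costSIGMA <- function(y, a, b, epsilon = 1e-6) {
  seg <- y[(a + 1):b, , drop = FALSE]
  S <- crossprod(sweep(seg, 2, colMeans(seg))) / nrow(seg)  # no Bessel's correction
  diag(S) <- diag(S) + epsilon                              # addSmallDiag = TRUE
  (b - a) * as.numeric(determinant(S, logarithm = TRUE)$modulus)
}
```

For example, costL2(as.matrix(c(1, 2, 3)), 0, 3) evaluates to 2, the sum of squared deviations of the segment from its mean.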
predict(): Performs PELT given a linear penalty value.
pen: Numeric. Penalty per change-point. Default: 0.
The PELT algorithm detects multiple change-points by finding the set of break-points that globally minimises
a penalised cost function. PELT uses dynamic programming combined with a pruning rule to reduce the number of candidate change-points, achieving efficient computation.
Let \(0 = c_0 < c_1 < \dots < c_k < c_{k+1} = n\) denote the segment end-points,
where \(k\) is the number of detected change-points and \(n\) is the total number of data points.
Let \(c_{(c_i, c_{i+1}]}\) be the cost of segment \((c_i, c_{i+1}]\).
The total penalised cost is
$$
\text{TotalCost} = \sum_{i=0}^{k} c_{(c_i, c_{i+1}]} + \lambda \cdot k,
$$
where \(\lambda\) is a linear penalty applied per change-point. PELT finds the set of end-points that minimises this cost exactly.
The pruning step eliminates candidate change-points that cannot lead to an optimal solution,
allowing PELT to run in linear time with respect to the number of data points.
Temporary segment end-points are saved to private$.tmpEndPoints after $predict(), enabling users to call $plot() without
specifying endpoints manually.
An integer vector of regime end-points. By design, the last element is the number of observations.
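To make the recursion concrete, here is a naive optimal-partitioning search in base R that minimises the same penalised objective without pruning (so it runs in quadratic rather than linear time). The function name optimalPartition and its interface are invented for this sketch; $predict() is the real, pruned C++ implementation:

```r
# Naive dynamic program behind PELT, without the pruning step.
# cost(a, b) must return the cost of segment (a, b]; pen is the per-change-point penalty.
optimalPartition <- function(n, cost, pen) {
  Fvec <- c(-pen, rep(Inf, n))   # Fvec[t + 1] = optimal penalised cost of y[1..t]
  prev <- integer(n)             # best last change-point before each t
  for (t in 1:n) {
    cand <- vapply(0:(t - 1), function(s) Fvec[s + 1] + cost(s, t) + pen, numeric(1))
    Fvec[t + 1] <- min(cand)
    prev[t] <- which.min(cand) - 1
  }
  ends <- integer(0)             # backtrack the segment end-points
  t <- n
  while (t > 0) { ends <- c(t, ends); t <- prev[t] }
  ends                           # last element is n, matching $predict()
}
```

On a series of length 20 with a single mean shift at t = 10 and an L2 segment cost, this returns c(10, 20).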
plot(): Plots change-point segmentation.
PELT$plot(
d = 1L,
endPts,
dimNames,
main,
xlab,
tsWidth = 0.25,
tsCol = "#5B9BD5",
bgCol = c("#A3C4F3", "#FBB1BD"),
bgAlpha = 0.5,
ncol = 1L
)
d: Integer vector. Dimensions to plot. Default: 1L.
endPts: Integer vector. End points. Default: the latest temporary change-points obtained via $predict().
dimNames: Character vector. Feature names matching the length of d. Defaults to "X1", "X2", ....
main: Character. Main title. Defaults to "PELT: d = ...".
xlab: Character. X-axis label. Default: "Time".
tsWidth: Numeric. Line width for time series and segments. Default: 0.25.
tsCol: Character. Time series color. Default: "#5B9BD5".
bgCol: Character vector. Segment colors, recycled to the length of endPts. Default: c("#A3C4F3", "#FBB1BD").
bgAlpha: Numeric. Background transparency. Default: 0.5.
ncol: Integer. Number of columns in facet layout. Default: 1L.
Plots change-point segmentation results. Based on ggplot2. Multiple plots can easily be
stacked vertically or placed side by side using patchwork's operators / and |, respectively.
An object of classes gg and ggplot.
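As a sketch of the patchwork note above (not run: it assumes a fitted object named PELTObj, as in the examples below, plus the patchwork package and at least two columns in tsMat):

```r
# Not run: combine two segmentation plots with patchwork.
# library(patchwork)
# p1 <- PELTObj$plot(d = 1L)
# p2 <- PELTObj$plot(d = 2L)
# p1 / p2   # stacked vertically
# p1 | p2   # placed side by side
```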
clone(): The objects of this class are cloneable with this method.
PELT$clone(deep = FALSE)
deep: Whether to make a deep clone.
PELT (Pruned Exact Linear Time) is an efficient algorithm for change-point detection that prunes the search space to achieve optimal segmentation in linear time under certain conditions.
PELT requires an R6 object of class costFunc, which can be created via costFunc$new(). Currently, the following cost functions are supported:
"L1" and "L2": for (independent) piecewise Gaussian processes with constant variance
"SIGMA": for (independent) piecewise Gaussian processes with varying variance
"VAR": for piecewise Gaussian vector autoregressive processes with constant noise variance
"LinearL2": for piecewise linear regression processes with constant noise variance
See the $eval() method for more details on the computation of costs.
Some examples are provided below. See the GitHub README for detailed basic usage.
Truong, C., Oudre, L., & Vayatis, N. (2020). Selective review of offline change point detection methods. Signal Processing, 167, 107299.
Killick, R., Fearnhead, P., & Eckley, I. A. (2012). Optimal detection of change points with a linear computational cost. Journal of the American Statistical Association, 107(500), 1590-1598.
## L2 example
set.seed(1121)
signals = as.matrix(c(rnorm(100, 0, 1),
                      rnorm(100, 5, 1)))
# Default L2 cost function
PELTObj = PELT$new(minSize = 1L, jump = 1L)
PELTObj$fit(signals)
PELTObj$predict(pen = 100)
PELTObj$plot()
## SIGMA example
set.seed(111)
signals = as.matrix(c(rnorm(100, -5, 1),
                      rnorm(100, -5, 10),
                      rnorm(100, -5, 1)))
# L2 cost function
PELTObj = PELT$new(minSize = 1L, jump = 1L)
PELTObj$fit(signals)
# We choose pen = 50.
PELTObj$predict(pen = 50)
PELTObj$plot()
# The standard L2 cost function is not suitable.
# Use the SIGMA cost function.
PELTObj$costFunc = costFunc$new(costFunc = "SIGMA")
PELTObj$predict(pen = 50)
PELTObj$plot()