ID: Multiple change-point detection in piecewise-constant or continuous, piecewise-linear signals using the Isolate-Detect methodology

Description

This is the main, general function of the package. It calls more specialised functions to estimate the number and locations of multiple change-points in the noisy, piecewise-constant or continuous, piecewise-linear input vector xd. The noise may be Gaussian or heavy-tailed (see the argument ht). The approach is a hybrid between the thresholding approach (explained in pcm_th and cplm_th) and the information criterion approach (explained in pcm_ic and cplm_ic), and the change-points are estimated taking both approaches into account. In addition to the number and locations of the estimated change-points, ID returns the estimated signal, as well as the solution path. For more information and the relevant literature reference, see Details.

Usage

ID(xd, th.cons = 1, th.cons_lin = 1.4, th.ic = 0.9, th.ic.lin = 1.25,
  lambda = 3, lambda.ic = 10, contrast = c("mean", "slope"), ht = FALSE,
  scale = 3)

Arguments

xd

A numeric vector containing the data in which you would like to find change-points.

th.cons

A positive real number with default value equal to 1. It is used to define the threshold if the thresholding approach (explained in pcm_th) is followed to detect the change-points in piecewise-constant signals.

th.cons_lin

A positive real number with default value equal to 1.4. It is used to define the threshold if the thresholding approach (explained in cplm_th) is followed to detect the change-points in continuous, piecewise-linear signals.

th.ic

A positive real number with default value equal to 0.9. It is used only if the model-selection-based Isolate-Detect method (described in pcm_ic) is followed for piecewise-constant signals, and it defines the threshold used at the first step (change-point overestimation) of the model-selection approach.

th.ic.lin

A positive real number with default value equal to 1.25. It is used only if the model-selection-based Isolate-Detect method (described in cplm_ic) is followed for continuous, piecewise-linear signals, and it defines the threshold used at the first step (change-point overestimation) of the model-selection approach.

lambda

A positive integer with default value equal to 3. It is used only when the threshold-based approach is followed, and it defines the distance between two consecutive end-points of the right-expanding intervals and between two consecutive start-points of the left-expanding intervals.

lambda.ic

A positive integer with default value equal to 10. It is used only when the information-criterion-based approach is followed, and it defines the distance between two consecutive end-points of the right-expanding intervals and between two consecutive start-points of the left-expanding intervals.
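To illustrate what lambda and lambda.ic control, the sketch below lists the expanding intervals inside a segment [s, e], assuming the right-expanding intervals are [s, s + k*lambda] and the left-expanding ones are [e - k*lambda, e], with endpoints capped at the segment boundaries. The helper expanding_intervals is hypothetical and is not part of IDetect; it only shows the spacing, not how the package scans or tests these intervals.

```r
# Hypothetical helper (not part of IDetect): list the expanding intervals
# in [s, e] when consecutive end-/start-points are lambda apart.
expanding_intervals <- function(s, e, lambda) {
  stopifnot(e > s, lambda >= 1)
  k <- seq_len(ceiling((e - s) / lambda))
  right_end  <- pmin(s + k * lambda, e)  # right-expanding intervals [s, right_end]
  left_start <- pmax(e - k * lambda, s)  # left-expanding intervals [left_start, e]
  list(right = cbind(start = s, end = right_end),
       left  = cbind(start = left_start, end = e))
}

iv <- expanding_intervals(1, 20, lambda = 3)
iv$right[, "end"]   # 4 7 10 13 16 19 20
iv$left[, "start"]  # 17 14 11 8 5 2 1
```

Under this assumption, a smaller lambda produces more intervals with finer spacing, at a higher computational cost.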

contrast

A character string defining the type of contrast function used in the Isolate-Detect algorithm. If contrast = "mean", the algorithm looks for changes in a piecewise-constant signal. If contrast = "slope", it looks for changes in a continuous, piecewise-linear signal.
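For contrast = "mean", the contrast is, to our understanding, the standard CUSUM statistic for a change in mean. A minimal sketch of that statistic, written from the textbook formula rather than from the package source, is given below; cusum_mean is a hypothetical name. For a candidate change-point b in x[s:e], a large absolute value indicates a likely change in mean right after b. The "slope" contrast is more involved and is not sketched here.

```r
# CUSUM contrast for a change in mean after position b within x[s:e],
# with s <= b < e. Written from the standard formula, not the IDetect source.
cusum_mean <- function(x, s, e, b) {
  n <- e - s + 1
  left_sum  <- sum(x[s:b])
  right_sum <- sum(x[(b + 1):e])
  sqrt((e - b) / (n * (b - s + 1))) * left_sum -
    sqrt((b - s + 1) / (n * (e - b))) * right_sum
}

x <- c(rep(4, 5), rep(0, 5))
stats <- sapply(1:9, function(b) abs(cusum_mean(x, 1, 10, b)))
which.max(stats)  # 5: the mean changes after the fifth observation
```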

ht

A logical variable with default value equal to FALSE. If FALSE, the noise is assumed to follow the Gaussian distribution. If TRUE, the noise is assumed to follow a distribution with tails heavier than those of the Gaussian distribution.

scale

A positive integer with default value equal to 3. It is used only if ht = TRUE, and it defines how the given data sequence is pre-averaged. See the Details in ht_ID_pcm for more information on the pre-averaging.

Value

A list with the following components:

cpt

A vector with the detected change-points.

no_cpt

The number of change-points detected.

fit

A numeric vector with the estimated signal.
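To show how a fitted signal relates to a set of change-points in the piecewise-constant case, here is a minimal sketch that takes segment means between consecutive change-points. It assumes that each change-point is the last index of its segment and that the fit is the least-squares piecewise-constant estimate; fit_from_cpts is a hypothetical helper, not the package's own routine, and it does not cover the continuous, piecewise-linear fit used with contrast = "slope".

```r
# Hypothetical helper: piecewise-constant fit from change-point locations.
# Assumes each change-point is the last index of its segment.
fit_from_cpts <- function(x, cpt) {
  ends   <- c(sort(cpt), length(x))
  starts <- c(1, head(ends, -1) + 1)
  fit <- numeric(length(x))
  for (j in seq_along(ends)) {
    idx <- starts[j]:ends[j]
    fit[idx] <- mean(x[idx])  # least-squares level on each segment
  }
  fit
}

x <- c(1, 3, 2, 10, 12, 11)
fit_from_cpts(x, cpt = 3)  # 2 2 2 11 11 11
```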

Details

The data points provided in xd are assumed to follow $$X_t = f_t + \sigma\epsilon_t, \quad t = 1, 2, \ldots, T,$$ where $T$ is the total length of the data sequence, $X_t$ are the observed data, $f_t$ is a one-dimensional, deterministic signal with abrupt structural changes at certain points, and $\epsilon_t$ are independent and identically distributed random variables with mean zero and variance one. In this function, the following scenarios for $f_t$ are implemented.
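To make the model concrete, the snippet below simulates it for a piecewise-constant $f_t$ with one change-point, under both noise settings. It is an illustration only: the signal, sigma = 1.5, and the seed are arbitrary choices made here. The Student-t noise is divided by sqrt(5/3) so that $\epsilon_t$ has variance one, as the model requires, because a t-distribution with 5 degrees of freedom has variance 5/3.

```r
set.seed(1)
T_len <- 1000
f <- c(rep(0, 500), rep(2, 500))            # piecewise-constant signal, change-point at t = 500
sigma <- 1.5

eps_gauss <- rnorm(T_len)                   # Gaussian noise, variance one
eps_t5 <- rt(T_len, df = 5) / sqrt(5 / 3)   # Student-t(5) noise rescaled to variance one

x_gauss <- f + sigma * eps_gauss            # suited to ht = FALSE
x_heavy <- f + sigma * eps_t5               # suited to ht = TRUE
```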

Piecewise-constant signal with Gaussian noise: use contrast = "mean" and ht = FALSE.
Piecewise-constant signal with heavy-tailed noise: use contrast = "mean" and ht = TRUE.
Continuous, piecewise-linear signal with Gaussian noise: use contrast = "slope" and ht = FALSE.
Continuous, piecewise-linear signal with heavy-tailed noise: use contrast = "slope" and ht = TRUE.

In the case where ht = FALSE, the function first detects the change-points using win_pcm_th (piecewise-constant signal) or win_cplm_th (continuous, piecewise-linear signal). If the estimated number of change-points is greater than 100, this result is returned and the procedure stops. Otherwise, ID detects the change-points again using pcm_ic (piecewise-constant signal) or cplm_ic (continuous, piecewise-linear signal) and returns that result.

In the case where ht = TRUE, the given data sequence is first pre-averaged using normalise, and exactly the same procedure as for ht = FALSE is then applied to the pre-averaged sequence.

More details can be found in "Detecting multiple generalized change-points by isolating single ones", Anastasiou and Fryzlewicz (2018), preprint.
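The hybrid decision rule described above can be sketched as follows. This is a hypothetical outline of the control flow only: detect_th, detect_ic and preaverage are stand-ins passed in as functions (for win_pcm_th/win_cplm_th, pcm_ic/cplm_ic and the pre-averaging step, respectively), their signatures are assumptions, and any rescaling of locations after pre-averaging is omitted.

```r
# Hypothetical sketch of ID's hybrid control flow; none of the stand-in
# detectors reproduce the IDetect code.
id_control_flow <- function(x, detect_th, detect_ic, ht = FALSE,
                            preaverage = identity, max_th = 100) {
  if (ht) x <- preaverage(x)            # heavy-tailed case: pre-average first
  cpt <- detect_th(x)                   # step 1: thresholding
  if (length(cpt) > max_th) {
    return(list(cpt = cpt, method = "threshold"))  # many change-points: stop here
  }
  list(cpt = detect_ic(x), method = "information criterion")  # step 2
}

# Toy stand-in detectors, for illustration only
toy_th <- function(x) which(abs(diff(x)) > 1)   # flags every jump larger than 1
toy_ic <- function(x) which.max(abs(diff(x)))   # keeps only the largest jump

res <- id_control_flow(c(0, 0, 0, 5, 5, 5), toy_th, toy_ic)
res$cpt     # 3
res$method  # "information criterion"
```

Passing the detectors as arguments keeps the sketch self-contained and makes the two-stage logic visible without relying on the package's internal functions.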

Examples


library(IDetect)
set.seed(1)

# Piecewise-constant signal: one change in mean after t = 3000
single.cpt.mean <- c(rep(4, 3000), rep(0, 3000))
single.cpt.mean.normal <- single.cpt.mean + rnorm(6000)        # Gaussian noise
single.cpt.mean.student <- single.cpt.mean + rt(6000, df = 5)  # heavy-tailed noise
cpt.single.mean.normal <- ID(single.cpt.mean.normal)
cpt.single.mean.student <- ID(single.cpt.mean.student, ht = TRUE)
cpt.single.mean.normal$cpt

# Continuous, piecewise-linear signal: slope changes from +1 to -1 after t = 2000
single.cpt.slope <- c(seq(0, 1999, 1), seq(1998, -1, -1))
single.cpt.slope.normal <- single.cpt.slope + rnorm(4000)
single.cpt.slope.student <- single.cpt.slope + rt(4000, df = 5)
cpt.single.slope.normal <- ID(single.cpt.slope.normal, contrast = "slope")
cpt.single.slope.student <- ID(single.cpt.slope.student, contrast = "slope", ht = TRUE)
cpt.single.slope.normal$cpt
