cpTestFn: Test for change-point detection based on the multivariate empirical distribution function

Description

Nonparametric test for change-point detection based on the (multivariate) empirical distribution function. The observations can be continuous univariate or multivariate, and serially independent or dependent (strongly mixing). Approximate p-values for the test statistics are obtained by means of a multiplier approach. Details can be found in the first reference which treats the serially independent case.

Usage

cpTestFn(x, statistic = c("cvmmax", "cvmmean", "ksmax", "ksmean"),
         method = c("nonseq", "seq"), b = 1,
         weights = c("parzen", "bartlett"),
         m = 5, L.method=c("max","median","mean","min"),
         N = 1000, init.seq = NULL)

Arguments

x

a data matrix whose rows are continuous observations.

statistic

a string specifying the statistic whose value and p-value will be displayed; can be either "cvmmax" or "cvmmean" (the maximum or average of the nrow(x)-1 intermediate Cramér-von Mises statis

method

a string specifying the simulation method for generating multiplier replicates of the test statistic; can be either "nonseq" (the 'check' approach in the first reference) or "seq" (the 'hat' approach in the first

b

strictly positive integer specifying the value of the bandwidth parameter determining the serial dependence when generating dependent multiplier sequences using the 'moving average approach'; see Section 6.1 of the second reference. The defaul

weights

a string specifying the kernel for creating the weights used in the generation of dependent multiplier sequences within the 'moving average approach'; see Section 6.1 of the second reference.

m

a strictly positive integer specifying the number of points of the uniform grid on $(0,1)^d$ (where $d$ is ncol(x)) involved in the estimation of the bandwidth parameter; see Section 5 of the third reference. The number of points

L.method

a string specifying how the parameter $L$ involved in the estimation of the bandwidth parameter is computed; see Section 5 of the second reference.

N

number of multiplier replications.

init.seq

a sequence of independent standard normal variates of length N * (nrow(x) + 2 * (b - 1)) used to generate dependent multiplier sequences.
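As an illustration of the length requirement on init.seq, the following minimal sketch builds a reproducible sequence for given N and b using base R only. The data, the seed and the value b = 3 are placeholders chosen for the example, not recommendations.

```r
## Sketch: a reproducible init.seq of the documented length.
set.seed(1)                     # arbitrary seed, for reproducibility only
x <- matrix(rnorm(100))         # placeholder data: 100 univariate observations
N <- 1000                       # number of multiplier replications
b <- 3                          # illustrative bandwidth value (assumption)

## length required by the documentation: N * (nrow(x) + 2 * (b - 1))
init.seq <- rnorm(N * (nrow(x) + 2 * (b - 1)))
length(init.seq)

## the sequence could then be passed on, e.g.
## cpTestFn(x, b = b, N = N, init.seq = init.seq)
```

Fixing init.seq in this way makes the multiplier replicates, and hence the approximate p-values, repeatable across calls with the same x, N and b.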

Value

An object of class htest which is a list, some of the components of which are:

statistic

value of the test statistic.

p.value

corresponding approximate p-value.

cvm

the values of the nrow(x)-1 intermediate Cramér-von Mises change-point statistics.

ks

the values of the nrow(x)-1 intermediate Kolmogorov-Smirnov change-point statistics.

all.statistics

the values of all four test statistics.

all.p.values

the corresponding p-values.

b

the value of parameter b.

Details

The approximate p-value is computed as $$\left(0.5 + \sum_{i=1}^N \mathbf{1}_{\{S_i \ge S\}}\right)/(N+1),$$ where $S$ and $S_i$ denote the test statistic and a multiplier replication, respectively. This ensures that the approximate p-value is a number strictly between 0 and 1, which is sometimes necessary for further treatments.
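The formula above can be sketched in a few lines of base R. The helper name approx_pvalue and the simulated replicates are hypothetical and serve only to illustrate the computation; they are not part of the package.

```r
## Sketch of the approximate p-value formula (hypothetical helper).
approx_pvalue <- function(S, S_rep) {
  N <- length(S_rep)                  # number of multiplier replications
  (0.5 + sum(S_rep >= S)) / (N + 1)   # strictly between 0 and 1
}

set.seed(42)
S_rep <- rexp(1000)                   # stand-in replicates, illustration only

p_small <- approx_pvalue(100, S_rep)  # no replicate reaches S: 0.5 / 1001
p_large <- approx_pvalue(-1, S_rep)   # every replicate exceeds S: 1000.5 / 1001
c(p_small, p_large)
```

Even in the two extreme cases, the result stays strictly inside (0, 1), which is the property the 0.5 offset is meant to guarantee.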

References

M. Holmes, I. Kojadinovic and J-F. Quessy (2013), Nonparametric tests for change-point detection à la Gombay and Horváth, Journal of Multivariate Analysis 115, pages 16-32.

A. Bücher and I. Kojadinovic (2014), A dependent multiplier bootstrap for the sequential empirical copula process under strong mixing, http://arxiv.org/abs/1306.3930.

Examples

## a univariate example
n <- 100
k <- 50 ## the true change-point
y <- rnorm(k)
z <- rexp(n-k)
x <- matrix(c(y,z))
cp <- cpTestFn(x)
cp

## all statistics
cp$all.statistics
## corresponding p.values
cp$all.p.values

## estimated change-point
which(cp$cvm == max(cp$cvm))
which(cp$ks == max(cp$ks))

## a very artificial trivariate example
## with a break in the first margin
n <- 100
k <- 50 ## the true change-point
y <- rnorm(k)
z <- rnorm(n-k,mean=2)
x <- cbind(c(y,z),matrix(rnorm(2*n),n,2))
cp <- cpTestFn(x)
cp

## all statistics
cp$all.statistics
## corresponding p.values
cp$all.p.values

## estimated change-point
which(cp$cvm == max(cp$cvm))
which(cp$ks == max(cp$ks))
