npcp (version 0.1-0)

cpTestFn: Test for change-point detection based on the multivariate empirical distribution function

Description

Nonparametric test for change-point detection based on the (multivariate) empirical distribution function. The observations can be continuous univariate or multivariate, and serially independent or dependent (strongly mixing). Approximate p-values for the test statistics are obtained by means of a multiplier approach. Details can be found in the first reference, which treats the serially independent case.
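
To make the idea concrete, here is a minimal univariate sketch in base R of how intermediate Cramér-von Mises and Kolmogorov-Smirnov change-point statistics could be built from empirical distribution functions. It is not the package's implementation: the function name edf_cusum_stats and the scaling factor k(n-k)/n^(3/2) are assumptions made for illustration, and no p-value is computed.

```r
# Hypothetical helper (not part of npcp): for each candidate break k,
# compare the empirical d.f. of x[1:k] with that of x[(k+1):n].
edf_cusum_stats <- function(x) {
  n <- length(x)
  cvm <- numeric(n - 1)
  ks <- numeric(n - 1)
  for (k in seq_len(n - 1)) {
    F_before <- ecdf(x[1:k])          # empirical d.f. before the break
    F_after <- ecdf(x[(k + 1):n])     # empirical d.f. after the break
    # Assumed CUSUM-type scaling, evaluated at the observations
    d <- k * (n - k) / n^(3 / 2) * (F_before(x) - F_after(x))
    cvm[k] <- mean(d^2)               # Cramér-von Mises type summary
    ks[k] <- max(abs(d))              # Kolmogorov-Smirnov type summary
  }
  list(cvm = cvm, ks = ks)
}

## deterministic toy data with a clear break after observation 50
x <- c(1:50, 101:150)
res <- edf_cusum_stats(x)
which.max(res$cvm)
which.max(res$ks)
```

Taking the maximum or the mean of res$cvm or res$ks over k gives summaries in the spirit of the four statistic options, under the assumptions stated above.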

Usage

cpTestFn(x, statistic = c("cvmmax", "cvmmean", "ksmax", "ksmean"),
         method = c("nonseq", "seq"), b = 1,
         weights = c("parzen", "bartlett"),
         m = 5, L.method=c("max","median","mean","min"),
         N = 1000, init.seq = NULL)

Arguments

x
a data matrix whose rows are continuous observations.
statistic
a string specifying the statistic whose value and p-value will be displayed; can be either "cvmmax" or "cvmmean" (the maximum or average of the nrow(x)-1 intermediate Cramér-von Mises statis
method
a string specifying the simulation method for generating multiplier replicates of the test statistic; can be either "nonseq" (the 'check' approach in the first reference) or "seq" (the 'hat' approach in the first
b
strictly positive integer specifying the value of the bandwidth parameter determining the serial dependence when generating dependent multiplier sequences using the 'moving average approach'; see Section 6.1 of the second reference. The defaul
weights
a string specifying the kernel for creating the weights used in the generation of dependent multiplier sequences within the 'moving average approach'; see Section 6.1 of the second reference.
m
a strictly positive integer specifying the number of points of the uniform grid on $(0,1)^d$ (where $d$ is ncol(x)) involved in the estimation of the bandwidth parameter; see Section 5 of the third reference. The number of points
L.method
a string specifying how the parameter $L$ involved in the estimation of the bandwidth parameter is computed; see Section 5 of the second reference.
N
number of multiplier replications.
init.seq
a sequence of independent standard normal variates of length N * (nrow(x) + 2 * (b - 1)) used to generate dependent multiplier sequences.
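
As an illustration of the length requirement on init.seq, the following base R sketch builds such a sequence for a given sample size, bandwidth and number of replications. The values of n, b and N and the helper name make_init_seq are arbitrary choices for the example, not defaults or functions of the package.

```r
# Hypothetical helper: draws the N * (n + 2 * (b - 1)) independent
# standard normal variates described for the init.seq argument.
make_init_seq <- function(n, b, N) {
  stopifnot(n >= 1, b >= 1, N >= 1)
  rnorm(N * (n + 2 * (b - 1)))
}

set.seed(123)   # fix the seed so the sequence is reproducible
n <- 100        # number of observations, i.e. nrow(x)
b <- 3          # bandwidth parameter
N <- 1000       # number of multiplier replications
init.seq <- make_init_seq(n, b, N)
length(init.seq)  # 1000 * (100 + 2 * 2) = 104000
```

Supplying the same init.seq to repeated calls, with matching b and N, should make the multiplier replicates and hence the approximate p-values reproducible.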

Value

An object of class htest, which is a list with, among others, the following components:

  • statistic: value of the test statistic.
  • p.value: corresponding approximate p-value.
  • cvm: the values of the nrow(x)-1 intermediate Cramér-von Mises change-point statistics.
  • ks: the values of the nrow(x)-1 intermediate Kolmogorov-Smirnov change-point statistics.
  • all.statistics: the values of all four test statistics.
  • all.p.values: the corresponding p-values.
  • b: the value of parameter b.

Details

The approximate p-value is computed as $$\left(0.5 + \sum_{i=1}^N \mathbf{1}_{\{S_i \ge S\}}\right)/(N+1),$$ where $S$ and $S_i$ denote the test statistic and a multiplier replication, respectively. This ensures that the approximate p-value is a number strictly between 0 and 1, which is sometimes necessary for further treatments.
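
The formula can be checked with a few lines of base R. This is a sketch only: the function name approx_p_value and the toy replicate values are made up for the example and are not taken from the package.

```r
# Hypothetical helper implementing (0.5 + #{S_i >= S}) / (N + 1)
approx_p_value <- function(S, S_rep) {
  N <- length(S_rep)               # number of multiplier replications
  (0.5 + sum(S_rep >= S)) / (N + 1)
}

S_rep <- c(1, 2, 3)                # toy multiplier replicates
approx_p_value(2, S_rep)           # (0.5 + 2) / 4 = 0.625
approx_p_value(10, S_rep)          # no replicate reaches S: 0.5 / 4 = 0.125
approx_p_value(0, S_rep)           # all replicates reach S: 3.5 / 4 = 0.875
```

The last two calls show the two extremes: the result stays above 0 even when no replicate reaches the observed statistic, and below 1 even when all of them do.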

References

M. Holmes, I. Kojadinovic and J.-F. Quessy (2013), Nonparametric tests for change-point detection à la Gombay and Horváth, Journal of Multivariate Analysis 115, pages 16-32.

A. Bücher and I. Kojadinovic (2014), A dependent multiplier bootstrap for the sequential empirical copula process under strong mixing, http://arxiv.org/abs/1306.3930.

See Also

cpTestCn() for a related test based on the empirical copula, cpTestRho() for a related test based on Spearman's rho, bOptEmpProc() for the function used to estimate b from x if b = NULL.

Examples

## a univariate example
n <- 100
k <- 50 ## the true change-point
y <- rnorm(k)
z <- rexp(n-k)
x <- matrix(c(y,z))
cp <- cpTestFn(x)
cp

## all statistics
cp$all.statistics
## corresponding p.values
cp$all.p.values

## estimated change-point
which(cp$cvm == max(cp$cvm))
which(cp$ks == max(cp$ks))

## a very artificial trivariate example
## with a break in the first margin
n <- 100
k <- 50 ## the true change-point
y <- rnorm(k)
z <- rnorm(n-k,mean=2)
x <- cbind(c(y,z),matrix(rnorm(2*n),n,2))
cp <- cpTestFn(x)
cp

## all statistics
cp$all.statistics
## corresponding p.values
cp$all.p.values

## estimated change-point
which(cp$cvm == max(cp$cvm))
which(cp$ks == max(cp$ks))