change.point: Change-point Detection Tests Based on Records

Description

Performs change-point detection tests based on the record occurrence. The hypothesis of the classical record model (i.e., of IID continuous RVs) is tested against the alternative hypothesis that after a certain time the series stops being IID.

Usage

change.point(
  X,
  weights = function(t) 1,
  record = c("upper", "lower", "d", "s"),
  correct = c("none", "fisher", "vrbik"),
  permutation.test = FALSE,
  simulate.p.value = FALSE,
  B = 1000
)

Value

A "htest" object with elements:

statistic: Value of the test statistic.
p.value: P-value.
alternative: The alternative hypothesis.
estimate: The estimated change-point time.
method: A character string indicating the type of test performed.
data.name: A character string giving the name of the data.

Arguments

X: A numeric vector, matrix (or data frame).
weights: A function indicating the weight given to the different records according to their position in the series. Castillo-Mateo (2022) showed that the weights that get more power for this test are $\omega_t = \sqrt{t^2 / (t - 1)}$, i.e., weights = function(t) ifelse(t == 1, 0, sqrt(t^2 / (t - 1))) if record = "upper" or = "lower". $\omega_t = \sqrt{t}$, i.e., weights = function(t) sqrt(t) if record = "d" and $\omega_t = \sqrt{t^2 / (t-2)}$, i.e., weights = function(t) ifelse(t %in% 1:2, 0, sqrt(t^2 / (t-2))) if record = "s".
record: A character string that indicates the type of statistic used. The statistic with "upper" or = "lower" records, or the statistic with the difference, "d", or the sum, "s", of upper and lower records.
correct: A character string that indicates the continuity correction in the Kolmogorov distribution made to the statistic: "fisher" (Fisher and Robbins 2019), "vrbik" (Vrbik 2020) or "none" (the default) if no correction is made. The former shows better size and power, but if the value of the statistic is too large it becomes NaN and p-value NA.
permutation.test: Logical. Indicates whether to compute p-values by permutation simulation (Castillo-Mateo et al. 2023). It does not require that the columns of X be independent. If TRUE and simulate.p.value = TRUE, permutations take precedence and permutations are performed.
simulate.p.value: Logical. Indicates whether to compute p-values by Monte Carlo simulation. If permutation.test = TRUE, permutations take precedence and permutations are performed.
B: If permutation.test = TRUE or simulate.p.value = TRUE, an integer specifying the number of replicates used in the permutation or Monte Carlo estimation.

Author

Jorge Castillo-Mateo

Details

The test is implemented as given by Castillo-Mateo (2022). The null hypothesis is that $$H_0: p_t = 1/t, \qquad t=1,\ldots,T,$$ where $p_t$ is the probability of (upper and/or lower) record at time $t$. The two-sided alternative hypothesis is that $$H_1: p_t = 1/t, \quad t=1,\ldots,t_0, \qquad p_t \neq 1/t, \quad t=t_0+1,\ldots,T,$$ for a change-point $t_0$.

The variables used for the statistic are $$K^{\omega}_T = \max_{1\le t \le T} \left| \frac{N_{t}^{\omega} - \textrm{E}(N_{t}^{\omega})}{\sqrt{\textrm{VAR}(N_{T}^{\omega})}} - \frac{\textrm{VAR}(N_{t}^{\omega})}{\textrm{VAR}(N_{T}^{\omega})} \frac{N_{T}^{\omega} - \textrm{E}(N_{T}^{\omega})}{\sqrt{\textrm{VAR}(N_{T}^{\omega})}} \right|,$$ where $N_{t}^\omega = \sum_{m=1}^M \sum_{j=1}^t \omega_j I_{jm}$, and the estimated change-point $\hat{t}_0$ is the value $t$ where $K^{\omega}_T$ attains its maximum.

Argument record indicates if the $I_{tm}$'s are the "upper" or "lower" record indicators (see I.record). If record = "d" or = "s", $N_{t}^\omega$ is substituted in the expressions above by $d_{t}^{\omega,(F)} = N_{t}^{\omega,(FU)} - N_{t}^{\omega,(FL)}$ or $s_{t}^{\omega,(F)} = N_{t}^{\omega,(FU)} + N_{t}^{\omega,(FL)}$, respectively.

The p-value is calculated by means of the asymptotic Kolmogorov distribution. When $\omega_t \neq 1$, the asymptotic result is not fulfilled. In that case, the p-value should be simulated using permutation or Monte Carlo simulations with the option permutation.test = TRUE or simulate.p.value = TRUE, respectively. Permutations is the only method of calculating p-values that does not require that the columns of X be independent.

As the Kolmogorov distribution is an asymptotic result, it has been seen that the size and power may be a little below than expected, to correct this, any of the continuity corrections can be used:

If correct = "fisher", $$K_T = - \sqrt{T} \log\left(1 - \frac{K_T}{\sqrt{T}}\right).$$

If correct = "vrbik", $$K_T = K_T + \frac{1}{6\sqrt{T}} + \frac{K_T - 1}{4T}.$$

References

Castillo-Mateo J (2022). “Distribution-Free Changepoint Detection Tests Based on the Breaking of Records.” Environmental and Ecological Statistics, 29(3), 655-676. tools:::Rd_expr_doi("10.1007/s10651-022-00539-2").

Castillo-Mateo J, Cebrián AC, Asín J (2023). “Statistical Analysis of Extreme and Record-Breaking Daily Maximum Temperatures in Peninsular Spain during 1960--2021.” Atmospheric Research, 293, 106934. tools:::Rd_expr_doi("10.1016/j.atmosres.2023.106934").

Fisher TJ, Robbins MW (2019). “A Cheap Trick to Improve the Power of a Conservative Hypothesis Test.” The American Statistician, 73(3), 232-242. tools:::Rd_expr_doi("10.1080/00031305.2017.1395364").

Vrbik J (2020). “Deriving CDF of Kolmogorov-Smirnov Test Statistic.” Applied Mathematics, 11(3), 227-246. tools:::Rd_expr_doi("10.4236/am.2020.113018").

Examples

Run this code

change.point(ZaragozaSeries)

change.point(series_split(TX_Zaragoza$TX), record = "d", 	
  weights = function(t) sqrt(t), 
  permutation.test = TRUE, B = 50)

change.point(ZaragozaSeries, record = "d", 
  weights = function(t) sqrt(t), simulate.p.value = TRUE)

test.result <- change.point(rowMeans(ZaragozaSeries))
test.result

## Not run: Load package ggplot2 to plot the changepoint
#library("ggplot2")
#records(rowMeans(ZaragozaSeries)) + 
#  ggplot2::geom_vline(xintercept = test.result$estimate, colour = "red")

Run the code above in your browser using DataLab