Performs change-point detection tests based on the record occurrence. The hypothesis of the classical record model (i.e., of IID continuous RVs) is tested against the alternative hypothesis that after a certain time the series stops being IID.
change.point(
X,
weights = function(t) 1,
record = c("upper", "lower", "d", "s"),
correct = c("none", "fisher", "vrbik"),
permutation.test = FALSE,
simulate.p.value = FALSE,
B = 1000
)
A "htest"
object with elements:
Value of the test statistic.
P-value.
The alternative hypothesis.
The estimated change-point time.
A character string indicating the type of test performed.
A character string giving the name of the data.
A numeric vector, matrix (or data frame).
A function indicating the weight given to the different
records according to their position in the series. Castillo-Mateo (2022)
showed that the weights that get more power for this test are
\(\omega_t = \sqrt{t^2 / (t - 1)}\), i.e.,
weights = function(t) ifelse(t == 1, 0, sqrt(t^2 / (t - 1)))
if
record = "upper"
or = "lower"
.
\(\omega_t = \sqrt{t}\), i.e., weights = function(t) sqrt(t)
if
record = "d"
and \(\omega_t = \sqrt{t^2 / (t-2)}\), i.e.,
weights = function(t) ifelse(t %in% 1:2, 0, sqrt(t^2 / (t-2)))
if
record = "s"
.
A character string that indicates the type of statistic used.
The statistic with "upper"
or = "lower"
records, or the
statistic with the difference, "d"
, or the sum, "s"
, of
upper and lower records.
A character string that indicates the continuity correction
in the Kolmogorov distribution made to the statistic: "fisher" (Fisher and
Robbins 2019), "vrbik" (Vrbik 2020) or "none" (the default) if no
correction is made. The former shows better size and power, but if the
value of the statistic is too large it becomes NaN
and p-value
NA
.
Logical. Indicates whether to compute p-values by
permutation simulation (Castillo-Mateo et al. 2023). It does not require
that the columns of X
be independent. If TRUE
and
simulate.p.value = TRUE
, permutations take precedence and
permutations are performed.
Logical. Indicates whether to compute p-values by
Monte Carlo simulation. If permutation.test = TRUE
, permutations
take precedence and permutations are performed.
If permutation.test = TRUE
or simulate.p.value = TRUE
,
an integer specifying the number of replicates used in the permutation or
Monte Carlo estimation.
Jorge Castillo-Mateo
The test is implemented as given by Castillo-Mateo (2022). The null hypothesis is that $$H_0: p_t = 1/t, \qquad t=1,\ldots,T,$$ where \(p_t\) is the probability of (upper and/or lower) record at time \(t\). The two-sided alternative hypothesis is that $$H_1: p_t = 1/t, \quad t=1,\ldots,t_0, \qquad p_t \neq 1/t, \quad t=t_0+1,\ldots,T,$$ for a change-point \(t_0\).
The variables used for the statistic are $$K^{\omega}_T = \max_{1\le t \le T} \left| \frac{N_{t}^{\omega} - \textrm{E}(N_{t}^{\omega})}{\sqrt{\textrm{VAR}(N_{T}^{\omega})}} - \frac{\textrm{VAR}(N_{t}^{\omega})}{\textrm{VAR}(N_{T}^{\omega})} \frac{N_{T}^{\omega} - \textrm{E}(N_{T}^{\omega})}{\sqrt{\textrm{VAR}(N_{T}^{\omega})}} \right|,$$ where \(N_{t}^\omega = \sum_{m=1}^M \sum_{j=1}^t \omega_j I_{jm}\), and the estimated change-point \(\hat{t}_0\) is the value \(t\) where \(K^{\omega}_T\) attains its maximum.
Argument record
indicates if the \(I_{tm}\)'s are the
"upper"
or "lower"
record indicators (see
I.record
). If record = "d"
or = "s"
,
\(N_{t}^\omega\) is substituted in the expressions above by
\(d_{t}^{\omega,(F)} = N_{t}^{\omega,(FU)} - N_{t}^{\omega,(FL)}\) or
\(s_{t}^{\omega,(F)} = N_{t}^{\omega,(FU)} + N_{t}^{\omega,(FL)}\),
respectively.
The p-value is calculated by means of the asymptotic Kolmogorov
distribution. When \(\omega_t \neq 1\), the asymptotic result is not
fulfilled. In that case, the p-value should be simulated using
permutation or Monte Carlo simulations with the option
permutation.test = TRUE
or simulate.p.value = TRUE
,
respectively. Permutations is the only method of calculating p-values that
does not require that the columns of X
be independent.
As the Kolmogorov distribution is an asymptotic result, it has been seen that the size and power may be a little below than expected, to correct this, any of the continuity corrections can be used:
If correct = "fisher"
,
$$K_T = - \sqrt{T} \log\left(1 - \frac{K_T}{\sqrt{T}}\right).$$
If correct = "vrbik"
,
$$K_T = K_T + \frac{1}{6\sqrt{T}} + \frac{K_T - 1}{4T}.$$
Castillo-Mateo J (2022). “Distribution-Free Changepoint Detection Tests Based on the Breaking of Records.” Environmental and Ecological Statistics, 29(3), 655-676. tools:::Rd_expr_doi("10.1007/s10651-022-00539-2").
Castillo-Mateo J, Cebrián AC, Asín J (2023). “Statistical Analysis of Extreme and Record-Breaking Daily Maximum Temperatures in Peninsular Spain during 1960--2021.” Atmospheric Research, 293, 106934. tools:::Rd_expr_doi("10.1016/j.atmosres.2023.106934").
Fisher TJ, Robbins MW (2019). “A Cheap Trick to Improve the Power of a Conservative Hypothesis Test.” The American Statistician, 73(3), 232-242. tools:::Rd_expr_doi("10.1080/00031305.2017.1395364").
Vrbik J (2020). “Deriving CDF of Kolmogorov-Smirnov Test Statistic.” Applied Mathematics, 11(3), 227-246. tools:::Rd_expr_doi("10.4236/am.2020.113018").
records
, foster.test
change.point(ZaragozaSeries)
change.point(series_split(TX_Zaragoza$TX), record = "d",
weights = function(t) sqrt(t),
permutation.test = TRUE, B = 50)
change.point(ZaragozaSeries, record = "d",
weights = function(t) sqrt(t), simulate.p.value = TRUE)
test.result <- change.point(rowMeans(ZaragozaSeries))
test.result
## Not run: Load package ggplot2 to plot the changepoint
#library("ggplot2")
#records(rowMeans(ZaragozaSeries)) +
# ggplot2::geom_vline(xintercept = test.result$estimate, colour = "red")
Run the code above in your browser using DataLab