foster.test: Foster-Stuart and Diersen-Trenkler Tests

Description

Performs the Foster-Stuart, Diersen-Trenkler and Cebrián-Castillo-Asín record tests for trend in location, variation or the tails. The null hypothesis of the classical record model (i.e., of IID continuous RVs) is tested against a one-sided alternative on the chosen record statistic.

Usage

foster.test(
  X,
  weights = function(t) 1,
  statistic = c("D", "d", "S", "s", "U", "L", "W"),
  distribution = c("normal", "t"),
  alternative = c("greater", "less"),
  correct = FALSE,
  permutation.test = FALSE,
  simulate.p.value = FALSE,
  B = 1000
)

Value

A "htest" object with elements:

statistic: Value of the test statistic.
parameter: (If distribution = "t") Degrees of freedom of the $t$ statistic (equal to $M-1$).
p.value: P-value.
alternative: The alternative hypothesis.
estimate: (If distribution = "normal") A vector with the value of the statistic, $\mu$ and $\sigma^2$. $\sigma^2$ is NA when all of the following hold: statistic is one of "D", "S" or "W" (except "D" without weights), the p-value is computed with permutations or Monte Carlo simulations, and $T > 500$.
method: A character string indicating the type of test performed.
data.name: A character string giving the name of the data.

Arguments

X: A numeric vector, matrix or data frame.
weights: A function indicating the weight given to the different records according to their position in the series, e.g., if function(t) t - 1 then $\omega_t = t - 1$.
statistic: A character string indicating the type of statistic to be calculated, i.e., one of "D", "d", "S", "s", "U", "L" or "W" (see Details).
distribution: A character string indicating the asymptotic distribution of the statistic, "normal" or Student's "t" distribution.
alternative: A character string indicating the type of alternative hypothesis, "greater" number of records or "less" number of records.
correct: Logical. Indicates whether a continuity correction should be applied; defaults to FALSE. No correction is applied if simulate.p.value = TRUE.
permutation.test: Logical. Indicates whether to compute p-values by permutation simulation (Castillo-Mateo et al. 2023). It does not require that the columns of X be independent. If both permutation.test and simulate.p.value are TRUE, permutations take precedence.
simulate.p.value: Logical. Indicates whether to compute p-values by Monte Carlo simulation. If permutation.test = TRUE, permutations take precedence.
B: If permutation.test = TRUE or simulate.p.value = TRUE, an integer specifying the number of replicates used in the permutation or Monte Carlo estimation.

Author

Jorge Castillo-Mateo

Details

In this function, the tests are implemented as given by Foster and Stuart (1954), Diersen and Trenkler (1996, 2001) and some modifications in the standardisation of the previous statistics given by Cebrián, Castillo-Mateo and Asín (2022). The null hypothesis is that the data come from a population with independent and identically distributed realisations. The one-sided alternative hypothesis is that the chosen statistic is greater (or less) than under the null hypothesis. The different statistics are calculated according to:

If statistic == "d", $$\sum_{m=1}^{M} \sum_{t=1}^{T} \omega_t \left( I_{tm}^{(FU)} - I_{tm}^{(FL)}\right).$$

If statistic == "D", $$\sum_{m=1}^{M} \sum_{t=1}^{T} \omega_t \left( I_{tm}^{(FU)} - I_{tm}^{(FL)} - I_{tm}^{(BU)} + I_{tm}^{(BL)}\right).$$

If statistic == "s", $$\sum_{m=1}^{M} \sum_{t=1}^{T} \omega_t \left( I_{tm}^{(FU)} + I_{tm}^{(FL)}\right).$$

If statistic == "S", $$\sum_{m=1}^{M} \sum_{t=1}^{T} \omega_t \left( I_{tm}^{(FU)} + I_{tm}^{(FL)} - I_{tm}^{(BU)} - I_{tm}^{(BL)}\right).$$

If statistic == "U", $$\sum_{m=1}^{M} \sum_{t=1}^{T} \omega_t \left( I_{tm}^{(FU)} - I_{tm}^{(BU)}\right).$$

If statistic == "L", $$\sum_{m=1}^{M} \sum_{t=1}^{T} \omega_t \left( I_{tm}^{(BL)} - I_{tm}^{(FL)}\right).$$

If statistic == "W", $$\sum_{m=1}^{M} \sum_{t=1}^{T} \omega_t \left( I_{tm}^{(FU)} + I_{tm}^{(BL)}\right).$$

Where $\omega_t$ are weights given to the different records according to their position in the series, $I_{tm}$ are the record indicators (see I.record), and $(FU)$, $(FL)$, $(BU)$, and $(BL)$ represent forward upper, forward lower, backward upper and backward lower records, respectively. The statistics $d$, $D$ and $W$ may be used for trend in location; $s$ and $S$ may be used for trend in variation; and $U$ and $L$ may be used for trend in the upper and lower tails of the distribution respectively.
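The definitions above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper names fu and record_statistic are hypothetical, ties are ignored (the null model assumes continuous data), and the backward indicators are assumed to be indexed by position in the reversed series.

```r
# Forward upper record indicators of one series: the first observation is a
# record by convention; later ones must strictly exceed the running maximum.
fu <- function(x) c(1L, as.integer(x[-1] > cummax(x)[-length(x)]))

# Hypothetical helper: weighted record statistic for a T x M matrix X
# (one series per column).
record_statistic <- function(X, statistic = "D", weights = function(t) 1) {
  X <- as.matrix(X)
  n <- nrow(X)
  stopifnot(n > 1)
  w <- rep_len(weights(seq_len(n)), n)   # omega_t for t = 1, ..., T
  Xrev <- X[n:1, , drop = FALSE]         # reversed (backward) series

  FU <- apply(X, 2, fu)                  # forward upper
  FL <- apply(-X, 2, fu)                 # forward lower
  BU <- apply(Xrev, 2, fu)               # backward upper (assumed indexing)
  BL <- apply(-Xrev, 2, fu)              # backward lower (assumed indexing)

  core <- switch(statistic,
    d = FU - FL,
    D = FU - FL - BU + BL,
    s = FU + FL,
    S = FU + FL - BU - BL,
    U = FU - BU,
    L = BL - FL,
    W = FU + BL,
    stop("unknown statistic")
  )
  sum(w * core)                          # sum over t and over the M series
}

# Toy series with an upward drift
x <- c(3, 1, 4, 1.5, 5, 2)
sapply(c("d", "D", "s", "S", "U", "L", "W"),
       function(s) record_statistic(x, s))
record_statistic(x, "D", weights = function(t) t - 1)
```

With linear weights, records that occur late in the (forward or reversed) series contribute more to the statistic than early ones.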

The statistics, say $X$, are approximately normally distributed, so the test is based on $$Z = \frac{X - \mu}{\sigma}.$$ While the mean $\mu$ of each statistic is simple to calculate, its variance $\sigma^2$ is a cumbersome expression. Some variances are given by Diersen and Trenkler (2001), and all of them can be computed from the expressions of the covariances given by Cebrián, Castillo-Mateo and Asín (2022).

If correct = TRUE, then a continuity correction is employed: $$Z = \frac{X \pm 0.5 - \mu}{\sigma},$$ with "$-$" if the alternative is "greater" and "$+$" if the alternative is "less". The correction is not recommended for the statistics with $\mu=0$.
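The standardisation and the continuity correction can be illustrated with a short base R sketch. The function names z_pvalue and d_null_sd are hypothetical. The null standard deviation used in the example is the classical Foster-Stuart result for the unweighted "d" statistic, $\sigma^2 = 2M\sum_{t=2}^{T} 1/t$ with $\mu = 0$; it is included here as an illustration and is not taken from the package code.

```r
# Hypothetical helper: one-sided normal p-value with optional continuity
# correction, following the formula for Z given above.
z_pvalue <- function(x, mu, sigma,
                     alternative = c("greater", "less"), correct = FALSE) {
  alternative <- match.arg(alternative)
  shift <- if (!correct) 0 else if (alternative == "greater") -0.5 else 0.5
  z <- (x + shift - mu) / sigma
  p <- if (alternative == "greater") pnorm(z, lower.tail = FALSE) else pnorm(z)
  c(Z = z, p.value = p)
}

# Null standard deviation of the unweighted d statistic for M independent
# series of length T (Foster-Stuart): sqrt(2 * M * sum_{t=2}^{T} 1/t).
d_null_sd <- function(T, M = 1) sqrt(2 * M * sum(1 / (2:T)))

# Illustrative values only: observed d = 12 from M = 4 series of length 60
z_pvalue(12, mu = 0, sigma = d_null_sd(60, 4), alternative = "greater")
z_pvalue(12, mu = 0, sigma = d_null_sd(60, 4), alternative = "greater",
         correct = TRUE)
```

The correction moves the statistic half a unit towards the null mean, so the corrected p-value is never smaller than the uncorrected one.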

When $M>1$, the expression of the variance under the null hypothesis can be substituted by the sample variance in the $M$ series, $\hat{\sigma}^2$. In this case, the statistics are asymptotically $t$ distributed, which is a more robust alternative against serial correlation.
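One way to read the $t$ approach is sketched below in base R. The exact form of the statistic is an assumption: the total statistic is centred by $M$ times the single-series null mean and scaled by $\sqrt{M\hat{\sigma}^2}$, where $\hat{\sigma}^2$ is the sample variance of the $M$ per-series statistics. The function name t_approach and its arguments are hypothetical.

```r
# Hypothetical helper: t approach from the per-series values of a record
# statistic (one value per column of X) and its null mean for one series.
t_approach <- function(stat_by_series, mu_single = 0,
                       alternative = c("greater", "less")) {
  alternative <- match.arg(alternative)
  M <- length(stat_by_series)
  stopifnot(M > 1)                        # sample variance needs M > 1
  sigma2_hat <- var(stat_by_series)       # replaces the null variance
  t_stat <- (sum(stat_by_series) - M * mu_single) / sqrt(M * sigma2_hat)
  p <- pt(t_stat, df = M - 1, lower.tail = (alternative == "less"))
  list(statistic = t_stat, parameter = M - 1, p.value = p)
}

# Illustrative per-series d statistics from M = 5 series (null mean 0)
t_approach(c(1, 3, 2, 4, 0), mu_single = 0, alternative = "greater")
```

Because the scale is estimated from the data, the reference distribution has $M-1$ degrees of freedom, matching the parameter element of the returned object.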

If permutation.test = TRUE, the p-value is estimated by permutation simulations. This is the only method of calculating p-values that does not require that the columns of X be independent.
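A permutation p-value of this kind can be sketched in base R as follows. The sketch assumes that the rows (time points) of X are permuted jointly, so that any dependence between columns is preserved in every replicate, and it uses the (1 + count) / (B + 1) convention, which may differ from the package. The names fu, d_stat and perm_pvalue are hypothetical.

```r
# Forward upper record indicators (first observation is a record).
fu <- function(x) c(1L, as.integer(x[-1] > cummax(x)[-length(x)]))

# Unweighted d statistic: forward upper minus forward lower records,
# summed over all columns of the matrix X.
d_stat <- function(X) sum(apply(X, 2, function(x) fu(x) - fu(-x)))

# Hypothetical helper: one-sided permutation p-value for any statistic.
perm_pvalue <- function(X, stat_fun, B = 1000,
                        alternative = c("greater", "less")) {
  alternative <- match.arg(alternative)
  X <- as.matrix(X)
  obs <- stat_fun(X)
  # Same row permutation for every column: cross-column dependence is kept,
  # only the time order is destroyed.
  sims <- replicate(B, stat_fun(X[sample(nrow(X)), , drop = FALSE]))
  extreme <- if (alternative == "greater") sims >= obs else sims <= obs
  (1 + sum(extreme)) / (B + 1)
}

# Two strongly dependent series sharing an upward trend
set.seed(1)
trend <- 0.1 * (1:50)
common <- rnorm(50)
X <- cbind(trend + common, trend + common + rnorm(50, sd = 0.1))
perm_pvalue(X, d_stat, B = 500, alternative = "greater")
```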

If simulate.p.value = TRUE, the p-value is estimated by Monte Carlo simulations. If the normal approximation is used with statistic "D", "S" or "W" and the length of the series $T$ is greater than roughly 1000 to 1500, permutations or Monte Carlo simulations are preferable because of the computational cost of calculating the variance of the statistic under the null hypothesis. The exception is "D" without weights, for which an alternative algorithm is implemented to calculate the variance quickly.
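A Monte Carlo p-value can be sketched along the same lines in base R. Because record indicators depend only on the ranks of the observations, the null distribution of the statistic can be simulated from any continuous distribution; standard normal draws with independent columns are used here as an assumption, and the p-value convention may differ from the package. The names fu, d_stat and mc_pvalue are hypothetical.

```r
# Forward upper record indicators (first observation is a record).
fu <- function(x) c(1L, as.integer(x[-1] > cummax(x)[-length(x)]))

# Unweighted d statistic summed over all columns of the matrix X.
d_stat <- function(X) sum(apply(X, 2, function(x) fu(x) - fu(-x)))

# Hypothetical helper: one-sided Monte Carlo p-value under the classical
# record model (IID continuous observations, independent columns).
mc_pvalue <- function(X, stat_fun, B = 1000,
                      alternative = c("greater", "less")) {
  alternative <- match.arg(alternative)
  X <- as.matrix(X)
  obs <- stat_fun(X)
  # Each replicate is a fresh T x M matrix simulated under the null.
  sims <- replicate(B, stat_fun(matrix(rnorm(length(X)), nrow(X), ncol(X))))
  extreme <- if (alternative == "greater") sims >= obs else sims <= obs
  (1 + sum(extreme)) / (B + 1)
}

# Three independent series with a mild upward trend
set.seed(2)
X <- matrix(rnorm(3 * 80), nrow = 80) + 0.02 * (1:80)
mc_pvalue(X, d_stat, B = 500, alternative = "greater")
```

Unlike the permutation sketch, the replicates here are generated with independent columns, so this approach relies on the series in X being independent.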

References

Castillo-Mateo J, Cebrián AC, Asín J (2023). “Statistical Analysis of Extreme and Record-Breaking Daily Maximum Temperatures in Peninsular Spain during 1960–2021.” Atmospheric Research, 293, 106934. doi:10.1016/j.atmosres.2023.106934.

Cebrián AC, Castillo-Mateo J, Asín J (2022). “Record Tests to Detect Non Stationarity in the Tails with an Application to Climate Change.” Stochastic Environmental Research and Risk Assessment, 36(2), 313–330. doi:10.1007/s00477-021-02122-w.

Diersen J, Trenkler G (1996). “Records Tests for Trend in Location.” Statistics, 28(1), 1–12. doi:10.1080/02331889708802543.

Diersen J, Trenkler G (2001). “Weighted Records Tests for Splitted Series of Observations.” In J Kunert, G Trenkler (eds.), Mathematical Statistics with Applications in Biometry: Festschrift in Honour of Prof. Dr. Siegfried Schach, pp. 163–178. Lohmar: Josef Eul Verlag.

Foster FG, Stuart A (1954). “Distribution-Free Tests in Time-Series Based on the Breaking of Records.” Journal of the Royal Statistical Society B, 16(1), 1–22. doi:10.1111/j.2517-6161.1954.tb00143.x.

Examples


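# foster.test() and the ZaragozaSeries data set come from the RecordTest package
library(RecordTest)
# D-statistic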
foster.test(ZaragozaSeries)
# D-statistic with linear weights
foster.test(ZaragozaSeries, weights = function(t) t - 1)
# S-statistic with linear weights
foster.test(ZaragozaSeries, statistic = "S", weights = function(t) t - 1)
# D-statistic with weights and t approach
foster.test(ZaragozaSeries, distribution = "t", weights = function(t) t - 1)
# U-statistic with weights (upper tail)
foster.test(ZaragozaSeries, statistic = "U", weights = function(t) t - 1)
# L-statistic with weights (lower tail)
foster.test(ZaragozaSeries, statistic = "L", weights = function(t) t - 1)
