N.test: Number of Records Test

Description

Performs tests based on the (weighted) number of records, $N^\omega$. The null hypothesis of the classical record model (i.e., of IID continuous RVs) is tested against the one-sided alternative that the (weighted) number of records is greater (or less) than expected under the null.

Usage

N.test(
  X,
  weights = function(t) 1,
  record = c("upper", "lower"),
  distribution = c("normal", "t", "poisson-binomial"),
  alternative = c("greater", "less"),
  correct = TRUE,
  method = c("mixed", "dft", "butler"),
  permutation.test = FALSE,
  simulate.p.value = FALSE,
  B = 1000
)

Value

A "htest" object with elements:

statistic: Value of the test statistic.
parameter: (If distribution = "t".) Degrees of freedom of the $t$ statistic (equal to $M-1$).
p.value: P-value.
alternative: The alternative hypothesis.
estimate: (If distribution = "normal".) A vector with the values of $N^\omega$, $\mu$ and $\sigma^2$.
method: A character string indicating the type of test performed.
data.name: A character string giving the name of the data.

Arguments

X: A numeric vector, matrix (or data frame).
weights: A function indicating the weight given to the different records according to their position in the series, e.g., if function(t) t - 1 then $\omega_t = t - 1$.
record: A character string indicating the type of record to be calculated, "upper" or "lower".
distribution: A character string indicating the distribution used for the statistic: the asymptotic "normal" distribution, the asymptotic Student's "t" distribution, or the exact "poisson-binomial" distribution.
alternative: A character string indicating the type of alternative hypothesis, "greater" number of records or "less" number of records.
correct: Logical. Indicates whether a continuity correction should be applied; defaults to TRUE. No correction is applied if permutation.test = TRUE, simulate.p.value = TRUE or distribution = "poisson-binomial".
method: (If distribution = "poisson-binomial".) A character string indicating the method used to compute the cdf of the Poisson binomial distribution, and therefore the p-value. "mixed" is the preferred (and default) method; it is a more efficient combination of the latter two algorithms. "dft" uses the discrete Fourier transform algorithm given in Hong (2013). "butler" uses the algorithm given by Butler and Stephens (2017).
permutation.test: Logical. Indicates whether to compute p-values by permutation simulation (Castillo-Mateo et al. 2023). It does not require that the columns of X be independent. If both permutation.test = TRUE and simulate.p.value = TRUE, the permutation test takes precedence. No simulation is done if distribution = "poisson-binomial".
simulate.p.value: Logical. Indicates whether to compute p-values by Monte Carlo simulation. If permutation.test = TRUE, the permutation test takes precedence and no Monte Carlo simulation is done. No simulation is done if distribution = "poisson-binomial".
B: If permutation.test = TRUE or simulate.p.value = TRUE, an integer specifying the number of replicates used in the permutation or Monte Carlo estimation.

Author

Jorge Castillo-Mateo

Details

The null hypothesis is that the data come from a population with independent and identically distributed continuous realisations. The one-sided alternative hypothesis is that the (weighted) number of records is greater (or less) than under the null hypothesis. The (weighted)-number-of-records statistic is calculated according to: $$N^\omega = \sum_{m=1}^M \sum_{t=1}^T \omega_t I_{tm},$$ where $\omega_t$ are weights given to the different records according to their position in the series and $I_{tm}$ are the record indicators (see I.record).
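The double sum above can be reproduced by hand in base R. The following is a minimal sketch, not the package's implementation: `n_weighted` is a hypothetical helper name, it handles upper records only, and it assumes strict records (no ties), which may differ from how I.record treats ties.

```r
# Hypothetical helper (not part of RecordTest): weighted number of upper records.
n_weighted <- function(X, weights = function(t) 1) {
  X <- as.matrix(X)                       # rows = time t, columns = series m
  n_time <- nrow(X)
  # weights() may return a scalar (e.g. function(t) 1), so expand to length n_time
  w <- rep_len(weights(seq_len(n_time)), n_time)
  # I[t, m] is TRUE when X[t, m] strictly exceeds all earlier values in column m;
  # the first observation of each series is always a record
  I <- apply(X, 2, function(x) c(TRUE, diff(cummax(x)) > 0))
  sum(w * I)                              # w recycles down columns: w[t] * I[t, m]
}

X <- cbind(c(1, 3, 2, 5), c(4, 1, 6, 2))
n_plain  <- n_weighted(X)                      # unweighted count of records
n_linear <- n_weighted(X, function(t) t - 1)   # linear weights omega_t = t - 1
```

In this toy matrix the first column has records at $t = 1, 2, 4$ and the second at $t = 1, 3$, so the unweighted statistic is 5 and the linearly weighted one is 6.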

The statistic $N^\omega$ is exactly Poisson-binomial distributed when the $\omega_t$'s only take values in $\{0,1\}$. In any case, it is also approximately normally distributed, with $$Z = \frac{N^\omega - \mu}{\sigma},$$ where its mean and variance are $$\mu = M \sum_{t=1}^T \omega_t \frac{1}{t},$$ $$\sigma^2 = M \sum_{t=2}^T \omega_t^2 \frac{1}{t} \left(1-\frac{1}{t}\right).$$
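As an illustration of these formulas, the sketch below computes $\mu$, $\sigma^2$ and $Z$ in base R, together with an exact Poisson-binomial tail probability for 0/1 weights. The helper names `null_moments` and `pois_binom_pmf` are hypothetical, and the pmf is obtained by plain convolution, which is neither the "dft" nor the "butler" algorithm offered by the method argument.

```r
# Hypothetical helper: null mean and variance of the weighted number of records.
null_moments <- function(n_time, n_series, weights = function(t) 1) {
  t <- seq_len(n_time)
  w <- rep_len(weights(t), n_time)
  p <- 1 / t                                   # P(record at time t) under the null
  list(mu = n_series * sum(w * p),
       sigma2 = n_series * sum(w^2 * p * (1 - p)))  # the t = 1 term is zero
}

# Hypothetical helper: Poisson-binomial pmf by direct convolution.
# Returns a vector with pmf[k + 1] = P(N = k) for k = 0, ..., length(p).
pois_binom_pmf <- function(p) {
  pmf <- 1
  for (pk in p) pmf <- c(pmf * (1 - pk), 0) + c(0, pmf * pk)
  pmf
}

m <- null_moments(n_time = 4, n_series = 2)
z <- (5 - m$mu) / sqrt(m$sigma2)               # Z for an observed N = 5

# Exact tail for unit weights: one Bernoulli(1/t) indicator per time and series
pmf <- pois_binom_pmf(rep(1 / seq_len(4), times = 2))
p_exact_greater <- sum(pmf[(5 + 1):length(pmf)])   # P(N >= 5)
```

The mean of the convolved pmf should agree with $\mu$, which gives a quick consistency check between the exact and the asymptotic descriptions.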

If correct = TRUE, then a continuity correction will be employed: $$Z = \frac{N^\omega \pm 0.5 - \mu}{\sigma},$$ with "$-$" if the alternative is greater and "$+$" if the alternative is less.
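The sign convention of the correction can be sketched as follows. `z_p_value` is a hypothetical helper, written only to illustrate the formula; it takes the observed statistic and the null moments as given.

```r
# Hypothetical helper: normal-approximation p-value with optional continuity correction.
z_p_value <- function(n_obs, mu, sigma2,
                      alternative = c("greater", "less"), correct = TRUE) {
  alternative <- match.arg(alternative)
  # shift by -0.5 for "greater" and +0.5 for "less"; no shift if correct = FALSE
  shift <- if (!correct) 0 else if (alternative == "greater") -0.5 else 0.5
  z <- (n_obs + shift - mu) / sqrt(sigma2)
  p <- if (alternative == "greater") pnorm(z, lower.tail = FALSE) else pnorm(z)
  list(statistic = z, p.value = p)
}

# Observed N = 5 with the null moments for T = 4, M = 2 and unit weights
res_corrected   <- z_p_value(5, mu = 25 / 6, sigma2 = 95 / 72)
res_uncorrected <- z_p_value(5, mu = 25 / 6, sigma2 = 95 / 72, correct = FALSE)
```

In both directions the correction moves the statistic toward the null mean, so the corrected p-value is never smaller than the uncorrected one.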

When $M>1$, the expression of the variance under the null hypothesis can be substituted by the sample variance in the $M$ series, $\hat{\sigma}^2$. In this case, the statistic $N_{S}^\omega$ is asymptotically $t$ distributed, which is a more robust alternative against serial correlation.
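A rough sketch of this variant is given below. It assumes that $\hat{\sigma}^2$ is the sample variance of the per-series statistics $N_m^\omega$, so that the variance of their sum is estimated by $M \hat{\sigma}^2$; the exact form used by the package may differ. `n_t_test` is a hypothetical helper that handles strict upper records only.

```r
# Hypothetical helper: t-type test using the sample variance across the M series.
n_t_test <- function(X, weights = function(t) 1,
                     alternative = c("greater", "less")) {
  alternative <- match.arg(alternative)
  X <- as.matrix(X)
  n_time <- nrow(X)
  n_series <- ncol(X)
  stopifnot(n_time > 1, n_series > 1)     # a sample variance needs M > 1
  w <- rep_len(weights(seq_len(n_time)), n_time)
  I <- apply(X, 2, function(x) c(TRUE, diff(cummax(x)) > 0))
  n_m <- colSums(w * I)                   # per-series weighted number of records
  mu <- n_series * sum(w / seq_len(n_time))
  # assumption: Var(N) is estimated by M times the sample variance of the n_m
  t_stat <- (sum(n_m) - mu) / sqrt(n_series * var(n_m))
  df <- n_series - 1
  p <- pt(t_stat, df, lower.tail = (alternative == "less"))
  list(statistic = t_stat, parameter = df, p.value = p)
}

X <- cbind(c(1, 3, 2, 5), c(4, 1, 6, 2))
res_t <- n_t_test(X)
```

The degrees of freedom equal $M - 1$, matching the parameter element described under Value.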

If permutation.test = TRUE, the p-value is estimated by permutation simulations. This is the only method of calculating p-values that does not require that the columns of X be independent.
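One way to picture such a permutation scheme is sketched below. It rests on two assumptions that are not stated here: whole rows of X (time points) are permuted jointly, so the dependence between columns is preserved, and the p-value is estimated with the common $(1 + \text{count})/(B + 1)$ rule. The package may use a different scheme or estimator. `n_weighted` and `perm_p_value` are hypothetical helpers.

```r
# Hypothetical helper: weighted number of strict upper records.
n_weighted <- function(X, weights = function(t) 1) {
  X <- as.matrix(X)
  w <- rep_len(weights(seq_len(nrow(X))), nrow(X))
  I <- apply(X, 2, function(x) c(TRUE, diff(cummax(x)) > 0))
  sum(w * I)
}

# Hypothetical helper: permutation p-value (rows permuted jointly across columns).
perm_p_value <- function(X, weights = function(t) 1,
                         alternative = c("greater", "less"), B = 1000) {
  alternative <- match.arg(alternative)
  X <- as.matrix(X)
  n_obs <- n_weighted(X, weights)
  # each replicate shuffles the time order but keeps every row intact
  n_perm <- replicate(B, n_weighted(X[sample(nrow(X)), , drop = FALSE], weights))
  extreme <- if (alternative == "greater") n_perm >= n_obs else n_perm <= n_obs
  (1 + sum(extreme)) / (B + 1)
}

set.seed(1)
X_trend <- cbind(1:20, 1:20 + 0.5)   # two strongly dependent, increasing series
p_perm <- perm_p_value(X_trend, B = 200)
```

In this toy example every observation is a record, so essentially no permutation reaches the observed statistic and the estimated p-value is close to its lower bound $1/(B + 1)$.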

If simulate.p.value = TRUE, the p-value is estimated by Monte Carlo simulations.
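A minimal Monte Carlo sketch follows. Under the null hypothesis the distribution of the number of records does not depend on the underlying continuous distribution, so standard normal draws are used here as an arbitrary choice. The columns are simulated independently, and the $(1 + \text{count})/(B + 1)$ estimator is an assumption; the package may proceed differently. `n_weighted` and `mc_p_value` are hypothetical helpers.

```r
# Hypothetical helper: weighted number of strict upper records.
n_weighted <- function(X, weights = function(t) 1) {
  X <- as.matrix(X)
  w <- rep_len(weights(seq_len(nrow(X))), nrow(X))
  I <- apply(X, 2, function(x) c(TRUE, diff(cummax(x)) > 0))
  sum(w * I)
}

# Hypothetical helper: Monte Carlo p-value from IID series with independent columns.
mc_p_value <- function(X, weights = function(t) 1,
                       alternative = c("greater", "less"), B = 1000) {
  alternative <- match.arg(alternative)
  X <- as.matrix(X)
  n_time <- nrow(X)
  n_series <- ncol(X)
  n_obs <- n_weighted(X, weights)
  # simulate B data sets of the same shape under the null and recompute the statistic
  n_sim <- replicate(B, n_weighted(matrix(rnorm(n_time * n_series),
                                          n_time, n_series), weights))
  extreme <- if (alternative == "greater") n_sim >= n_obs else n_sim <= n_obs
  (1 + sum(extreme)) / (B + 1)
}

set.seed(1)
X_trend <- cbind(1:20, 1:20 + 0.5)
p_mc <- mc_p_value(X_trend, B = 200)
```

Unlike the permutation approach, this simulation treats the columns as independent, so it is only appropriate when that assumption is reasonable.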

The size of the tests is adequate for any values of $T$ and $M$. Some comments and a power study are given by Cebrián, Castillo-Mateo and Asín (2022).

References

Butler K, Stephens MA (2017). “The Distribution of a Sum of Independent Binomial Random Variables.” Methodology and Computing in Applied Probability, 19(2), 557-571. doi:10.1007/s11009-016-9533-4.

Castillo-Mateo J, Cebrián AC, Asín J (2023). “Statistical Analysis of Extreme and Record-Breaking Daily Maximum Temperatures in Peninsular Spain during 1960–2021.” Atmospheric Research, 293, 106934. doi:10.1016/j.atmosres.2023.106934.

Cebrián AC, Castillo-Mateo J, Asín J (2022). “Record Tests to Detect Non Stationarity in the Tails with an Application to Climate Change.” Stochastic Environmental Research and Risk Assessment, 36(2), 313-330. doi:10.1007/s00477-021-02122-w.

Hong Y (2013). “On Computing the Distribution Function for the Poisson Binomial Distribution.” Computational Statistics & Data Analysis, 59(1), 41-51. doi:10.1016/j.csda.2012.10.006.

Examples

# Forward Upper records
N.test(ZaragozaSeries)
# Forward Lower records
N.test(ZaragozaSeries, record = "lower", alternative = "less")
# Backward Upper records
N.test(series_rev(ZaragozaSeries), alternative = "less")
# Backward Lower records
N.test(series_rev(ZaragozaSeries), record = "lower")

# Exact test
N.test(ZaragozaSeries, distribution = "poisson-binomial")
# Exact test for records in the last decade
N.test(ZaragozaSeries, weights = function(t) ifelse(t < 61, 0, 1), distribution = "poisson-binomial")
# Linear weights for a more powerful test (without continuity correction)
N.test(ZaragozaSeries, weights = function(t) t - 1, correct = FALSE)
