p.regression.test: Probabilities of Record Regression Test

Description

This function performs a linear hypothesis test based on a regression for the record probabilities $p_t$ to study the hypothesis of the classical record model.

Usage

p.regression.test(
  X,
  record = c("upper", "lower"),
  formula = y ~ x,
  simulate.p.value = FALSE,
  B = 1000
)

Arguments

X

A numeric vector, matrix (or data frame).

record

A character string indicating the type of records to be calculated, "upper" or "lower".

formula

A "formula" passed to the lm function, e.g., y ~ x, y ~ poly(x, 2, raw = TRUE), y ~ log(x). By default formula = y ~ x. See Details for a caveat on which models are valid.

simulate.p.value

Logical. Indicates whether to compute p-values by Monte Carlo simulation. This is recommended when the number of columns of X (i.e., the number of series) is lower than 12, since with so few series the test does not attain its nominal size.

B

If simulate.p.value = TRUE, an integer specifying the number of replicates used in the Monte Carlo estimation.

Value

An "htest" object with elements:

null.value

Value of the coefficients under the null hypothesis when more than one coefficient is fitted.

alternative

Character string indicating the type of alternative hypothesis.

method

A character string indicating the type of test performed.

estimate

Value of the fitted coefficients.

data.name

A character string giving the name of the data.

statistic

Value of the $F$ statistic.

parameters

Degrees of freedom of the $F$ statistic.

p.value

P-value.

Details

The null hypothesis is that the data come from a population with independent and identically distributed realizations. This implies that in all the vectors (columns in matrix X), the sample probability of record at time $t$ (p.record) is $1/t$, so that $$t \, \textrm{E}(\hat p_t) = 1.$$ Then, $$H_0:\,p_t = 1/t, \, t=2, ..., T \iff H_0:\,\beta_0 = 1, \, \beta_1 = 0,$$ where $\beta_0$ and $\beta_1$ are the coefficients of the regression model $$t \, \textrm{E}(\hat p_t) = \beta_0 + \beta_1 t.$$ The model has to be estimated by weighted least squares since the response is heteroskedastic.
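
The regression setup above can be sketched in base R. This is a minimal illustration on simulated i.i.d. data, not the package's implementation: the variable names are hypothetical, and the weights are an assumption derived from the Bernoulli variance of the record indicators under the null, which gives $\textrm{Var}(t \, \hat p_t) = (t - 1)/M$ for $M$ series.

```r
# Minimal sketch (base R only); all names are illustrative, not package API.
set.seed(123)
n_time   <- 50   # length T of each series
n_series <- 30   # number of series M (columns of X)
X <- matrix(rnorm(n_time * n_series), nrow = n_time, ncol = n_series)

# Upper-record indicator per column: x_t is a record when it equals the
# running maximum (ties have probability zero for continuous data).
is_record <- apply(X, 2, function(x) x == cummax(x))

# Sample record probability at each time t: proportion of series with a record
p_hat <- rowMeans(is_record)

# Regression variables for t = 2, ..., T (t = 1 is always a record)
x <- 2:n_time
y <- x * p_hat[x]

# Assumed weights: under H0, Var(t * p_hat_t) = (t - 1) / n_series,
# so the weights are taken proportional to 1 / (t - 1).
w <- 1 / (x - 1)

fit <- lm(y ~ x, weights = w)
print(coef(fit))  # under H0 the coefficients should be close to (1, 0)
```

Because the simulated data satisfy the null hypothesis, the fitted intercept and slope should only differ from 1 and 0 by sampling noise.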

Other models can be considered with the formula argument. However, for the test to be correct, the model that assigns 1 to all responses must be nested in the bigger one, either leaving the intercept free or setting the intercept to 1 (see Examples for possible models).
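
As an illustration of the nesting requirement, the two alternative formulas used in Examples can be written out as follows (the coefficient labels are chosen here for exposition). With formula = y ~ I(x^2), the intercept is left free:

$$t \, \textrm{E}(\hat p_t) = \beta_0 + \beta_1 t^2, \qquad H_0:\,\beta_0 = 1, \, \beta_1 = 0.$$

With formula = y ~ I(x-1) - 1 + offset(rep(1, length(x))), the intercept is fixed to 1 through the offset:

$$t \, \textrm{E}(\hat p_t) = 1 + \beta_1 (t - 1), \qquad H_0:\,\beta_1 = 0.$$

In both cases the constant model $t \, \textrm{E}(\hat p_t) = 1$ is recovered under $H_0$, which is what makes the comparison valid.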

The $F$ statistic is computed to carry out an $F$-test comparison between the restricted model under the null hypothesis and the more general model (e.g., the alternative hypothesis where $t \, \textrm{E}(\hat p_t)$ is a linear function of time $t$). This alternative hypothesis may be reasonable in many real examples, but not always.
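
The comparison of restricted and general models can be sketched as a standard nested-model $F$ statistic. The code below is a hedged illustration, not the package's internals: the weights $1/(t-1)$ and the degrees of freedom (2 restrictions against the residual degrees of freedom of the general fit) are assumptions of this sketch, and the data are simulated under the null, where the record count at time $t$ across $M$ independent series is Binomial($M$, $1/t$).

```r
# Minimal sketch (base R only); names and weights are assumptions.
set.seed(42)
n_series <- 30
x <- 2:50

# Simulated sample record probabilities under H0
p_hat <- rbinom(length(x), size = n_series, prob = 1 / x) / n_series
y <- x * p_hat
w <- 1 / (x - 1)   # assumed weights, inverse to Var(t * p_hat_t) under H0

# General model: t * E(p_hat_t) = beta0 + beta1 * t
fit <- lm(y ~ x, weights = w)
rss_general <- sum(w * residuals(fit)^2)

# Restricted model under H0: fitted value 1 for every response
rss_restricted <- sum(w * (y - 1)^2)

q   <- length(coef(fit))   # number of restrictions (beta0 = 1, beta1 = 0)
df2 <- df.residual(fit)    # residual degrees of freedom of the general model

f_stat  <- ((rss_restricted - rss_general) / q) / (rss_general / df2)
p_value <- pf(f_stat, q, df2, lower.tail = FALSE)
print(c(F = f_stat, p.value = p_value))
```

The restricted residual sum of squares can never be smaller than the general one because the null model is nested in the fitted model, so the statistic is non-negative.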

If the sample size (i.e., the number of series or columns of X) is small, roughly below 8 to 12, the $F$ distribution is a poor approximation to the null distribution of the statistic, so the simulate.p.value option is recommended in this case.
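
One way to picture a simulated p-value is the generic Monte Carlo scheme below. It is a sketch under stated assumptions and may differ from what the function does internally: the helper f_stat_sketch and the simulation helper sim_p_hat are hypothetical, the weights are assumed as $1/(t-1)$, and the "observed" statistic is itself simulated so that the block runs on its own. It relies on the classical result that, for i.i.d. continuous data, the record indicators at different times are independent Bernoulli($1/t$) variables.

```r
# Hypothetical helper: F statistic for H0: beta0 = 1, beta1 = 0
f_stat_sketch <- function(p_hat, x) {
  y <- x * p_hat
  w <- 1 / (x - 1)                       # assumed weights
  fit <- lm(y ~ x, weights = w)
  rss1 <- sum(w * residuals(fit)^2)      # general model
  rss0 <- sum(w * (y - 1)^2)             # restricted model (all fitted = 1)
  ((rss0 - rss1) / 2) / (rss1 / df.residual(fit))
}

set.seed(2024)
n_series <- 5      # few series, where the F approximation is doubtful
x <- 2:40
B <- 1000          # number of Monte Carlo replicates

# Hypothetical helper: sample record probabilities drawn under H0
sim_p_hat <- function() rbinom(length(x), n_series, 1 / x) / n_series

# Stand-in for the statistic of the observed data
f_obs <- f_stat_sketch(sim_p_hat(), x)

# Null distribution of the statistic by simulation
f_sim <- replicate(B, f_stat_sketch(sim_p_hat(), x))

# Monte Carlo p-value with the usual +1 correction
p_mc <- (1 + sum(f_sim >= f_obs)) / (B + 1)
print(p_mc)
```

The +1 correction keeps the estimated p-value strictly positive, a common convention for Monte Carlo tests.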

Examples

# NOT RUN {
# Simple test for upper records (p-value = 0.01047)
p.regression.test(ZaragozaSeries)
# Simple test for lower records (p-value = 9.178e-05)
p.regression.test(ZaragozaSeries, record = "lower")

# Fit a model with a quadratic term only for upper records (p-value = 0.01187)
p.regression.test(ZaragozaSeries, formula = y ~ I(x^2))
# Fit a model with a quadratic term only for lower records (p-value = 8.007e-05)
p.regression.test(ZaragozaSeries, record = "lower", formula = y ~ I(x^2))

# Fix the intercept to 1 for upper records (p-value = 0.005557)
p.regression.test(ZaragozaSeries, formula = y ~ I(x-1) - 1 + offset(rep(1, length(x))))
# Fix the intercept to 1 for lower records (p-value = 2.467e-05)
p.regression.test(ZaragozaSeries, record = "lower", 
                  formula = y ~ I(x-1) - 1 + offset(rep(1, length(x))))

# Simulate p-value when the number of series is small
TxZ <- apply(series_split(TX_Zaragoza$TX), 1, max)
p.regression.test(TxZ, simulate.p.value = TRUE)
# }