Learn R Programming

PLRModels (version 1.1)

np.ancova: Nonparametric analysis of covariance

Description

This routine tests the equality of $L$ nonparametric regression curves ($m_1, ..., m_L$) from samples ${(Y_{ki}, t_i): i=1,...,n}$, $k=1,...,L$, where: $$Y_{ki}= m_k(t_i) + \epsilon_{ki}.$$ The unknown functions $m_k$ are smooth, fixed equally spaced design is considered, and the random errors, $\epsilon_{ki}$, are allowed to be time series. The test statistic used for testing the null hypothesis, $H0: m_1 = ...= m_L$, derives from a Cramer-von-Mises-type functional based on different distances between nonparametric estimators of the regression functions.

Usage

np.ancova(data = data, h.seq = NULL, w = NULL, estimator = "NW", 
kernel = "quadratic", time.series = FALSE, Tau.eps = NULL, 
h0 = NULL, lag.max = 50, p.max = 3, q.max = 3, ic = "BIC", 
num.lb = 10, alpha = 0.05)

Arguments

data
data[, k] contains the values of the response variable, $Y_k$, for each model $k$ ($k=1, ..., L$); data[, L+1] contains the values of the explanatory (common) variable, $t$, for each model $k$ ($k=1, ..., L$).
h.seq
the statistic test is performed using each bandwidth in the vector h.seq (the same bandwidth is used to estimate all the regression functions). If NULL (the default), 10 equidistant values between 0 and the first half of the rang
w
support interval of the weigth function in the test statistic. If NULL (the default), $(q_{0.1}, q_{0.9})$ is considered, where $q_p$ denotes the quantile of order $p$ of ${t_i}$.
estimator
allows us the choice between NW (Nadaraya-Watson) or LLP (Local Linear Polynomial). The default is NW.
kernel
allows us the choice between gaussian, quadratic (Epanechnikov kernel), triweight or uniform kernel. The default is quadratic.
time.series
it denotes whether the data are independent (FALSE) or if data is a time series (TRUE). The default is FALSE.
Tau.eps
Tau.eps[k] contains the sum of autocovariances associated to the random errors of the regression model $k$ ($k=1, ..., L$). If NULL (the default), the function tries to estimate it: it fits an ARMA model (selected according to an information
h0
if Tau.eps=NULL, h0 contains the pilot bandwidth used for obtaining the residuals to construct the default for Tau.eps. If NULL (the default), a quarter of the range of ${t_i}$ is considered.
lag.max
if Tau.eps=NULL, lag.max contains the maximum delay used to construct the default for Tau.eps. The default is 50.
p.max
if Tau.eps=NULL, the ARMA models are selected between the models ARMA(p,q) with 0<=p<=p.max and 0<=q<=q.max. The default is 3.
q.max
if Tau.eps=NULL, the ARMA models are selected between the models ARMA(p,q) with 0<=p<=p.max and 0<=q<=q.max. The default is 3.
ic
if Tau.eps=NULL, ic contains the information criterion used to suggest the ARMA models. It allows us to choose between: "AIC", "AICC" or "BIC" (the default).
num.lb
if Tau.eps=NULL, it checks the suitability of the selected ARMA models according to the Ljung-Box test and the t-test. It uses up to num.lb delays in the Ljung-Box test. The default is 10.
alpha
if Tau.eps=NULL, alpha contains the significance level which the ARMA models are checked. The default is 0.05.

Value

  • A list with a dataframe containing:
  • h.seqsequence of bandwidths used in the test statistic.
  • Q.mvalues of the test statistic (one for each bandwidth in h.seq).
  • Q.m.normalisednormalised value of Q.m.
  • p.valuep-values of the corresponding statistic tests (one for each bandwidth in h.seq).
  • Moreover, if data is a time series and Tau.eps is not especified:
  • pv.Box.testp-values of the Ljung-Box test for the model fitted to the residuals.
  • pv.t.testp-values of the t.test for the model fitted to the residuals.
  • ar.maARMA orders for the model fitted to the residuals.

Details

A weight function (specifically, the indicator function 1$_{[w[1] , w[2]]}$) is introduced in the test statistic to allow elimination (or at least significant reduction) of boundary effects from the estimate of $m(t_i)$. If Tau.eps=NULL and the routine is not able to suggest an approximation for Tau.eps, it warns the user with a message saying that the model could be not appropriate and then it shows the results. In order to construct Tau.eps, the procedures suggested in Muller and Stadmuller (1988) and Herrmann et al. (1992) can be followed. For more details, see Vilar-Fernandez and Gonzalez-Manteiga (2004).

References

Dette, H. and Neumeyer, N. (2001) Nonparametric analysis of covariance. Ann. Statist. 29, no. 5, 1361-1400. Herrmann, E., Gasser, T. and Kneip, A. (1992) Choice of bandwidth for kernel regression when residuals are correlated. Biometrika 79, 783-795 Muller, H.G. and Stadmuller, U. (1988) Detecting dependencies in smooth regression models. Biometrika 75, 639-650 Vilar-Fernandez, J.M. and Gonzalez-Manteiga, W. (2004) Nonparametric comparison of curves with dependent errors. Statistics 38, 81-99.

See Also

Other related functions are np.est, par.ancova and plrm.ancova.

Examples

Run this code
# EXAMPLE 1: REAL DATA
data <- matrix(10,120,2)
data(barnacles1)
barnacles1 <- as.matrix(barnacles1)
data[,1] <- barnacles1[,1]
data <- diff(data, 12)
data[,2] <- 1:nrow(data)

data2 <- matrix(10,120,2)
data(barnacles2)
barnacles2 <- as.matrix(barnacles2)
data2[,1] <- barnacles2[,1]
data2 <- diff(data2, 12)
data2[,2] <- 1:nrow(data2)

data3 <- matrix(0, nrow(data),ncol(data)+1)
data3[,1] <- data[,1]
data3[,2:3] <- data2

np.ancova(data=data3)



# EXAMPLE 2: SIMULATED DATA
## Example 2.1: dependent data: true null hypothesis

set.seed(1234)
# We generate the data
n <- 100
t <- ((1:n)-0.5)/n
m1 <- function(t) {0.25*t*(1-t)}
f <- m1(t)

epsilon1 <- arima.sim(list(order = c(1,0,0), ar=0.7), sd = 0.01, n = n)
y1 <-  f + epsilon1

epsilon2 <- arima.sim(list(order = c(0,0,1), ma=0.5), sd = 0.02, n = n)
y2 <- f + epsilon2

data_eq <- cbind(y1, y2, t)

# We apply the test
np.ancova(data_eq, time.series=TRUE)


## Example 2.2: dependent data: false null hypothesis
# We generate the data
n <- 100
t <- ((1:n)-0.5)/n
m3 <- function(t) {0.25*t*(1-t)}
m4 <- function(t) {0.25*t*(1-t)*0.75}

f3 <- m3(t)
epsilon3 <- arima.sim(list(order = c(1,0,0), ar=0.7), sd = 0.01, n = n)
y3 <-  f3 + epsilon3

f4 <- m4(t)
epsilon4 <- arima.sim(list(order = c(0,0,1), ma=0.5), sd = 0.02, n = n)
y4 <-  f4 + epsilon4

data_neq<- cbind(y3, y4, t)

# We apply the test
np.ancova(data_neq, time.series=TRUE)

Run the code above in your browser using DataLab