plrm.ancova: Semiparametric analysis of covariance (based on PLR models)

Description

From samples ${(Y_{ki}, X_{ki1}, ..., X_{kip}, t_i): i=1,...,n}$, $k=1,...,L$, this routine tests the null hypotheses $H0: \beta_1 = ... = \beta_L$ and $H0: m_1 = ... = m_L$, where: $$\beta_k = (\beta_{k1},...,\beta_{kp})$$ is an unknown vector parameter; $$m_k(.)$$ is a smooth but unknown function and $$Y_{ki}= X_{ki1}*\beta_{k1} +...+ X_{kip}*\beta_{kp} + m(t_i) + \epsilon_{ki}.$$ Fixed equally spaced design is considered for the "nonparametric" explanatory variable, $t$, and the random errors, $\epsilon_{ki}$, are allowed to be time series. The test statistic used for testing $H0: \beta_1 = ...= \beta_L$ derives from the asymptotic normality of an estimator of $\beta_k$ ($k=1,...,L$) based on both ordinary least squares and kernel smoothing (this result giving a $\chi^2$-test). The test statistic used for testing $H0: m_1 = ...= m_L$ derives from a Cramer-von-Mises-type functional based on different distances between nonparametric estimators of $m_k$ ($k=1,...,L$).

Usage

plrm.ancova(data = data, t = t, b.seq = NULL, h.seq = NULL, 
w = NULL, estimator = "NW", kernel = "quadratic", 
time.series = FALSE, Var.Cov.eps = NULL, Tau.eps = NULL, 
b0 = NULL, h0 = NULL, lag.max = 50, p.max = 3, q.max = 3,
ic = "BIC", num.lb = 10, alpha = 0.05)

Arguments

data

data[, 1, k] contains the values of the response variable, $Y_k$, for each model $k$ ($k=1, ..., L$); data[, 2:(p+1), k] contains the values of the "linear" explanatory variables, $X_{k1}, ..., X_{kp}$, for each model k ($k=1,

contains the values of the "nonparametric" explanatory (common) variable, $t$, for each model $k$ ($k=1, ..., L$).

b.seq

the statistic test for $H0: \beta_1 = ... = \beta_L$ is performed using each bandwidth in the vector b.seq. If NULL (the default) but h.seq is not NULL, it takes b.seq=h.seq. If both b

h.seq

the statistic test for $H0: m_1 = ... = m_L$ is performed using each pair of bandwidths (b.seq[j], h.seq[j]). If NULL (the default) but b.seq is not NULL, it takes h.seq=b.seq. If both

support interval of the weigth function in the test statistic for $H0: m_1 = ... = m_L$. If NULL (the default), $(q_{0.1}, q_{0.9})$ is considered, where $q_p$ denotes the quantile of order $p$ of ${t_i}$.

estimator

allows us the choice between NW (Nadaraya-Watson) or LLP (Local Linear Polynomial). The default is NW.

kernel

allows us the choice between gaussian, quadratic (Epanechnikov kernel), triweight or uniform kernel. The default is quadratic.

time.series

it denotes whether the data are independent (FALSE) or if data is a time series (TRUE). The default is FALSE.

Var.Cov.eps

Var.Cov.eps[, , k] contains the n x n matrix of variances-covariances associated to the random errors of the regression model k ($k=1, ..., L$). If NULL (the default), the function tries to estimate it: it fits an ARMA model (sel

Tau.eps

Tau.eps[k] contains the sum of autocovariances associated to the random errors of the regression model $k$ ($k=1, ..., L$). If NULL (the default), the function tries to estimate it: it fits an ARMA model (selected according to an information

if Var.Cov.eps=NULL and/or Tau.eps=NULL, b0 contains the pilot bandwidth for the estimator of $\beta_k$ ($k=1, ..., L$) used for obtaining the residuals to construct the default for Var.Cov.eps and/or

if Var.Cov.eps=NULL and/or Tau.eps=NULL, (b0, h0) contains the pair of pilot bandwidths for the estimator of $m_k$ ($k=1, ..., L$) used for obtaining the residuals to construct the default for Var.Cov.eps

lag.max

if Tau.eps=NULL, lag.max contains the maximum delay used to construct the default for Tau.eps. The default is 50.

p.max

if Var.Cov.eps=NULL and/or Tau.eps=NULL, the ARMA models are selected between the models ARMA(p,q) with 0<=p<=p.max and 0<=q<=q.max. The default is 3.

q.max

if Var.Cov.eps=NULL and/or Tau.eps=NULL, the ARMA models are selected between the models ARMA(p,q) with 0<=p<=p.max and 0<=q<=q.max. The default is 3.

ic

if Var.Cov.eps=NULL and/or Tau.eps=NULL, ic contains the information criterion used to suggest the ARMA models. It allows us to choose between: "AIC", "AICC" or "BIC" (the default).

num.lb

if Var.Cov.eps=NULL and/or Tau.eps=NULL, it checks the suitability of the selected ARMA models according to the Ljung-Box test and the t-test. It uses up to num.lb delays in the Ljung-Box test. The default is 10.

alpha

if Var.Cov.eps=NULL and/or Tau.eps=NULL, alpha contains the significance level which the ARMA models are checked. The default is 0.05.

`Value`

A list with two dataframes:
parametric.testa dataframe containing the bandwidths, the statistics and the p-values when one tests $H0: \beta_1 = ...= \beta_L$.
nonparametric.testa dataframe containing the bandwidths b and h, the statistics, the normalised statistics and the p-values when one tests $H0: m_1 = ...= m_L$.
Moreover, if data is a time series and Tau.eps or Var.Cov.eps are not especified:
pv.Box.testp-values of the Ljung-Box test for the model fitted to the residuals.
pv.t.testp-values of the t.test for the model fitted to the residuals.
ar.maARMA orders for the model fitted to the residuals.

`Details`

A weight function (specifically, the indicator function 1$_{[w[1] , w[2]]}$) is introduced in the test statistic for testing $H0: m_1 = ... = m_L$ to allow elimination (or at least significant reduction) of boundary effects from the estimate of $m_k(t_i)$.

If Var.Cov.eps=NULL and the routine is not able to suggest an approximation for Var.Cov.eps, it warns the user with a message saying that the model could be not appropriate and then it shows the results. In order to construct Var.Cov.eps, the procedure suggested in Aneiros-Perez and Vieu (2013) can be followed.

If Tau.eps=NULL and the routine is not able to suggest an approximation for Tau.eps, it warns the user with a message saying that the model could be not appropriate and then it shows the results. In order to construct Tau.eps, the procedures suggested in Aneiros-Perez (2008) can be followed.

Expressions for the implemented statistic tests can be seen in (15) and (16) in Aneiros-Perez (2008).

`References`

Aneiros-Perez, G. (2008) Semi-parametric analysis of covariance under dependence conditions within each group. Aust. N. Z. J. Stat. 50, 97-123.

Aneiros-Perez, G. and Vieu, P. (2013) Testing linearity in semi-parametric functional data analysis. Comput. Stat. 28, 413-434.

`See Also`

Other related functions are plrm.est, par.ancova and np.ancova.

`Examples`

Run this code# EXAMPLE 1: REAL DATA
data(barnacles1)
data <- as.matrix(barnacles1)
data <- diff(data, 12)
data <- cbind(data,1:nrow(data))

data(barnacles2)
data2 <- as.matrix(barnacles2)
data2 <- diff(data2, 12)
data2 <- cbind(data2,1:nrow(data2))

data3 <- array(0, c(nrow(data),ncol(data)-1,2))
data3[,,1] <- data[,-4]
data3[,,2] <- data2[,-4]
t <- data[,4]

plrm.ancova(data=data3, t=t)



# EXAMPLE 2: SIMULATED DATA
## Example 2a: dependent data - true null hypotheses

set.seed(1234)
# We generate the data
n <- 100
t <- ((1:n)-0.5)/n
beta <- c(0.05, 0.01)

m1 <- function(t) {0.25*t*(1-t)}
f <- m1(t)
x1 <- matrix(rnorm(200,0,1), nrow=n)
sum1 <- x1%*%beta
epsilon1 <- arima.sim(list(order = c(1,0,0), ar=0.7), sd = 0.01, n = n)
y1 <-  sum1 + f + epsilon1
data1 <- cbind(y1,x1)

x2 <- matrix(rnorm(200,1,2), nrow=n)
sum2 <- x2%*%beta
epsilon2 <- arima.sim(list(order = c(0,0,1), ma=0.5), sd = 0.02, n = n)
y2 <- sum2 + f + epsilon2
data2 <- cbind(y2,x2)

data_eq <- array(c(data1,data2), c(n,3,2))

# We apply the tests
plrm.ancova(data=data_eq, t=t, time.series=TRUE)


## Example 2b: dependent data - false null hypotheses

set.seed(1234)
# We generate the data
n <- 100
t <- ((1:n)-0.5)/n
m3 <- function(t) {0.25*t*(1-t)}
m4 <- function(t) {0.25*t*(1-t)*0.75}
beta3 <- c(0.05, 0.01)
beta4 <- c(0.05, 0.02)

x3 <- matrix(rnorm(200,0,1), nrow=n)
sum3 <- x3%*%beta3
f3 <- m3(t)
epsilon3 <- arima.sim(list(order = c(1,0,0), ar=0.7), sd = 0.01, n = n)
y3 <-  sum3 + f3 + epsilon3
data3 <- cbind(y3,x3)

x4 <- matrix(rnorm(200,1,2), nrow=n)
sum4 <- x4%*%beta4
f4 <- m4(t)
epsilon4 <- arima.sim(list(order = c(0,0,1), ma=0.5), sd = 0.02, n = n)
y4 <-  sum4 + f4 + epsilon4
data4 <- cbind(y4,x4)

data_neq <- array(c(data3,data4), c(n,3,2))

# We apply the tests
plrm.ancova(data=data_neq, t=t, time.series=TRUE)
Run the code above in your browser using DataLab