From a sample \(\{(Y_i, X_{i1}, ..., X_{ip}, t_i): i=1,...,n\}\), this routine tests the null hypotheses \(H_0: \beta=\beta_0\) and \(H_0: m=m_0\), where $$\beta = (\beta_1,...,\beta_p)$$ is an unknown vector parameter, $$m(.)$$ is a smooth but unknown function and $$Y_i= X_{i1}\beta_1 +...+ X_{ip}\beta_p + m(t_i) + \epsilon_i.$$ A fixed, equally spaced design is assumed for the "nonparametric" explanatory variable, \(t\), and the random errors, \(\epsilon_i\), are allowed to be a time series. The test statistic for \(H_0: \beta = \beta_0\) derives from the asymptotic normality of an estimator of \(\beta\) based on both ordinary least squares and kernel smoothing (a result that yields a \(\chi^2\)-test). The test statistic for \(H_0: m = m_0\) derives from a Cramér-von Mises-type functional distance between a nonparametric estimator of \(m\) and \(m_0\).
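As a schematic reminder of how asymptotic normality leads to a \(\chi^2\)-test (the symbols \(\hat\beta\) and \(\hat A\) are notation introduced here for illustration only, and the exact normalisation used by the routine may differ): if $$\sqrt{n}\,(\hat\beta - \beta) \to N(0, A)$$ in distribution, with \(A\) nonsingular and \(\hat A\) a consistent estimator of \(A\), then under \(H_0: \beta = \beta_0\) $$Q = n\,(\hat\beta - \beta_0)^T \hat A^{-1} (\hat\beta - \beta_0) \to \chi^2_p$$ in distribution, so \(H_0\) is rejected at level \(\alpha\) when \(Q\) exceeds the \((1-\alpha)\)-quantile of the \(\chi^2_p\) distribution.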
plrm.gof(data = data, beta0 = NULL, m0 = NULL, b.seq = NULL,
h.seq = NULL, w = NULL, estimator = "NW", kernel = "quadratic",
time.series = FALSE, Var.Cov.eps = NULL, Tau.eps = NULL,
b0 = NULL, h0 = NULL, lag.max = 50, p.max = 3, q.max = 3,
ic = "BIC", num.lb = 10, alpha = 0.05)
A list with two dataframes:
a dataframe containing the bandwidths, the statistics and the p-values when one tests \(H_0: \beta = \beta_0\)
a dataframe containing the bandwidths b and h, the statistics, the normalised statistics and the p-values when one tests \(H_0: m = m_0\)
Moreover, if data is a time series and Tau.eps or Var.Cov.eps are not specified:
p-values of the Ljung-Box test for the model fitted to the residuals.
p-values of the t.test for the model fitted to the residuals.
ARMA orders for the model fitted to the residuals.
data[, 1] contains the values of the response variable, \(Y\);
data[, 2:(p+1)] contains the values of the "linear" explanatory variables \(X_1, ..., X_p\);
data[, p+2] contains the values of the "nonparametric" explanatory variable, \(t\).
the considered parameter vector in the parametric null hypothesis. If NULL
(the default), the zero vector is considered.
the considered function in the nonparametric null hypothesis. If NULL
(the default), the zero function is considered.
the test of \(H_0: \beta=\beta_0\) is performed using each bandwidth in the vector b.seq. If NULL (the default) but h.seq is not NULL, it takes b.seq=h.seq. If both b.seq and h.seq are NULL, 10 equidistant values between zero and a quarter of the range of \(\{t_i\}\) are considered.
the test of \(H_0: m=m_0\) is performed using each pair of bandwidths (b.seq[j], h.seq[j]). If NULL (the default) but b.seq is not NULL, it takes h.seq=b.seq. If both b.seq and h.seq are NULL, 10 equidistant values between zero and a quarter of the range of \(\{t_i\}\) are considered for both b.seq and h.seq.
support interval of the weight function in the test statistic for \(H_0: m = m_0\). If NULL (the default), \((q_{0.1}, q_{0.9})\) is considered, where \(q_p\) denotes the quantile of order \(p\) of \(\{t_i\}\).
selects the nonparametric estimator: "NW" (Nadaraya-Watson) or "LLP" (Local Linear Polynomial). The default is "NW".
selects the kernel: "gaussian", "quadratic" (Epanechnikov kernel), "triweight" or "uniform". The default is "quadratic".
indicates whether the data are independent (FALSE) or form a time series (TRUE). The default is FALSE.
n x n variance-covariance matrix of the random errors of the regression model. If NULL (the default), the function tries to estimate it: it fits an ARMA model (selected according to an information criterion) to the residuals from the fitted regression model and then obtains the variance-covariance matrix of that ARMA model.
the sum of the autocovariances of the random errors of the regression model. If NULL (the default), the function tries to estimate it: it fits an ARMA model (selected according to an information criterion) to the residuals from the fitted regression model and then obtains the sum of the autocovariances of that ARMA model.
if Var.Cov.eps=NULL and/or Tau.eps=NULL, b0 contains the pilot bandwidth for the estimator of \(\beta\) used for obtaining the residuals to construct the default for Var.Cov.eps and/or Tau.eps. If NULL (the default) but h0 is not NULL, it takes b0=h0. If both b0 and h0 are NULL, a quarter of the range of \(\{t_i\}\) is considered.
if Var.Cov.eps=NULL and/or Tau.eps=NULL, (b0, h0) contains the pair of pilot bandwidths for the estimator of \(m\) used for obtaining the residuals to construct the default for Var.Cov.eps and/or Tau.eps. If NULL (the default) but b0 is not NULL, it takes h0=b0. If both b0 and h0 are NULL, a quarter of the range of \(\{t_i\}\) is considered for both b0 and h0.
if Tau.eps=NULL, lag.max contains the maximum delay used to construct the default for Tau.eps. The default is 50.
if Var.Cov.eps=NULL and/or Tau.eps=NULL, the ARMA model is selected among the models ARMA(p,q) with 0<=p<=p.max and 0<=q<=q.max. The default for p.max is 3.
if Var.Cov.eps=NULL and/or Tau.eps=NULL, the ARMA model is selected among the models ARMA(p,q) with 0<=p<=p.max and 0<=q<=q.max. The default for q.max is 3.
if Var.Cov.eps=NULL and/or Tau.eps=NULL, ic contains the information criterion used to suggest the ARMA model. The options are "AIC", "AICC" or "BIC" (the default).
if Var.Cov.eps=NULL and/or Tau.eps=NULL, the suitability of the selected ARMA model is checked with the Ljung-Box test and the t-test. Up to num.lb delays are used in the Ljung-Box test. The default is 10.
if Var.Cov.eps=NULL and/or Tau.eps=NULL, alpha contains the significance level at which the ARMA model is checked. The default is 0.05.
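The ARMA-related arguments above (p.max, q.max, ic, num.lb and alpha) describe a select-then-check procedure. The following is a minimal sketch of that idea using only base R functions from the stats package; it is not the code used inside plrm.gof, the series eps.hat is a simulated stand-in for the regression residuals, and the exact lags and tests used by the routine may differ.

```r
# Sketch only: base-R 'stats' functions, not the PLRModels internals.
set.seed(1)
# Stand-in for the regression residuals (assumption made for illustration)
eps.hat <- arima.sim(list(ar = 0.7), n = 200)

p.max <- 3; q.max <- 3       # same roles as the arguments p.max and q.max
num.lb <- 10; alpha <- 0.05  # same roles as the arguments num.lb and alpha

# Fit every ARMA(p, q) on the grid and record its BIC (NA if a fit fails)
grid <- expand.grid(p = 0:p.max, q = 0:q.max)
grid$bic <- NA_real_
for (i in seq_len(nrow(grid))) {
  fit.i <- tryCatch(
    suppressWarnings(arima(eps.hat, order = c(grid$p[i], 0, grid$q[i]),
                           include.mean = FALSE)),
    error = function(e) NULL)
  if (!is.null(fit.i)) grid$bic[i] <- BIC(fit.i)
}

# Keep the model with the smallest BIC and refit it
best <- grid[which.min(grid$bic), ]
fit <- suppressWarnings(arima(eps.hat, order = c(best$p, 0, best$q),
                              include.mean = FALSE))
r <- as.numeric(residuals(fit))

# Ljung-Box p-values for lags above the number of ARMA coefficients;
# fitdf removes the degrees of freedom used by those coefficients
k <- best$p + best$q
lags <- (k + 1):num.lb
pv.lb <- sapply(lags, function(l)
  Box.test(r, lag = l, type = "Ljung-Box", fitdf = k)$p.value)

# t-test of zero mean for the ARMA residuals
pv.t <- t.test(r)$p.value

# The model is deemed adequate if no test rejects at level alpha
adequate <- all(pv.lb > alpha) && pv.t > alpha
```

Replacing BIC(fit.i) with AIC(fit.i) gives the analogous selection by AIC.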
German Aneiros Perez ganeiros@udc.es
Ana Lopez Cheda ana.lopez.cheda@udc.es
A weight function (specifically, the indicator function \(1_{[w[1], w[2]]}\)) is introduced in the test statistic for testing \(H_0: m = m_0\) to allow elimination (or at least significant reduction) of boundary effects from the estimate of \(m(t_i)\).
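To illustrate the role of the weight function, here is a minimal base-R sketch of a weighted squared distance between a Nadaraya-Watson estimate and a hypothesised \(m_0\). It assumes a model without linear part and independent errors, and the helper names epan, nw and wdist are hypothetical; it is not the statistic computed by plrm.gof, whose normalisation and bandwidth handling are not reproduced here.

```r
# Sketch only: a weighted L2-type distance, not the PLRModels statistic.
set.seed(1)
n <- 100
t <- ((1:n) - 0.5) / n                 # fixed, equally spaced design
m <- function(u) 0.25 * u * (1 - u)    # smooth function used to simulate
y <- m(t) + rnorm(n, sd = 0.01)        # no linear part, for simplicity

# Epanechnikov ("quadratic") kernel
epan <- function(u) 0.75 * (1 - u^2) * (abs(u) <= 1)

# Nadaraya-Watson estimate of m at the points t0 with bandwidth h
nw <- function(t0, t, y, h) {
  sapply(t0, function(s) {
    k <- epan((t - s) / h)
    sum(k * y) / sum(k)
  })
}

m.hat <- nw(t, t, y, h = 0.1)

# Default support of the weight function: the 0.1 and 0.9 quantiles of t
w <- quantile(t, c(0.1, 0.9))
weight <- as.numeric(t >= w[1] & t <= w[2])

# Weighted squared distance to a hypothesised m0 (Riemann-sum approximation);
# design points outside [w[1], w[2]] do not contribute
wdist <- function(m0) mean((m.hat - m0(t))^2 * weight)

d.true <- wdist(m)                     # m0 equal to the simulating function
d.zero <- wdist(function(u) 0 * u)     # m0 equal to the zero function
```

In this sketch d.true is much smaller than d.zero, which is the behaviour a distance-based test relies on.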
If Var.Cov.eps=NULL and the routine is not able to suggest an approximation for Var.Cov.eps, it warns the user with a message saying that the model might not be appropriate and then shows the results. To construct Var.Cov.eps, the procedure suggested in Aneiros-Perez and Vieu (2013) can be followed.
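When the error model is known, as in a simulation, the matrix can also be written down directly. The sketch below is not the procedure of Aneiros-Perez and Vieu (2013); it simply assumes stationary AR(1) errors with coefficient phi and innovation standard deviation sigma (names chosen here for illustration) and builds the corresponding n x n variance-covariance matrix.

```r
# Sketch only: variance-covariance matrix of a stationary AR(1) process.
n <- 100
phi <- 0.7        # AR(1) coefficient (assumed known)
sigma <- 0.01     # standard deviation of the innovations (assumed known)

# Variance of the process: gamma(0) = sigma^2 / (1 - phi^2)
gamma0 <- sigma^2 / (1 - phi^2)

# Autocorrelations at lags 0, ..., n-1 (equal to phi^k for an AR(1))
rho <- as.numeric(ARMAacf(ar = phi, lag.max = n - 1))

# Entry (i, j) is gamma(|i - j|), so the matrix is symmetric Toeplitz
Var.Cov.eps <- toeplitz(gamma0 * rho)
```

A matrix built this way has the n x n shape that the Var.Cov.eps argument expects.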
If Tau.eps=NULL and the routine is not able to suggest an approximation for Tau.eps, it warns the user with a message saying that the model might not be appropriate and then shows the results. To construct Tau.eps, the procedures suggested in Aneiros-Perez (2008) can be followed.
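For a known error model the sum of autocovariances can be computed directly. The sketch below is not the procedure of Aneiros-Perez (2008). It assumes stationary AR(1) errors and assumes that "sum of autocovariances" means the two-sided sum over all lags, which this page does not state explicitly; it compares the sum truncated at lag.max with the closed-form value.

```r
# Sketch only: truncated sum of autocovariances for a stationary AR(1).
phi <- 0.7        # AR(1) coefficient (assumed known)
sigma <- 0.01     # standard deviation of the innovations (assumed known)
lag.max <- 50     # truncation lag, same role as the argument lag.max

# Autocovariances gamma(0), ..., gamma(lag.max)
gamma0 <- sigma^2 / (1 - phi^2)
gam <- gamma0 * as.numeric(ARMAacf(ar = phi, lag.max = lag.max))

# Two-sided sum: gamma(0) + 2 * (gamma(1) + ... + gamma(lag.max))
Tau.trunc <- gam[1] + 2 * sum(gam[-1])

# Closed form of the infinite two-sided sum for an AR(1)
Tau.exact <- sigma^2 / (1 - phi)^2

rel.err <- abs(Tau.trunc - Tau.exact) / Tau.exact
```

With phi = 0.7 the autocovariances decay quickly, so truncating at lag 50 leaves a negligible relative error; a coefficient closer to 1 would need a larger lag.max.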
The implemented procedures generalize those in expressions (9) and (10) in Gonzalez-Manteiga and Aneiros-Perez (2003) by allowing some dependence condition in \(\{(X_{i1}, ..., X_{ip}): i=1,...,n\}\) and including a weight function (see above), respectively.
Aneiros-Perez, G. (2008) Semi-parametric analysis of covariance under dependence conditions within each group. Aust. N. Z. J. Stat. 50, 97-123.
Aneiros-Perez, G., Gonzalez-Manteiga, W. and Vieu, P. (2004) Estimation and testing in a partial linear regression under long-memory dependence. Bernoulli 10, 49-78.
Aneiros-Perez, G. and Vieu, P. (2013) Testing linearity in semi-parametric functional data analysis. Comput. Stat. 28, 413-434.
Gao, J. (1997) Adaptive parametric test in a semiparametric regression model. Comm. Statist. Theory Methods 26, 787-800.
Gonzalez-Manteiga, W. and Aneiros-Perez, G. (2003) Testing in partial linear regression models with dependent errors. J. Nonparametr. Statist. 15, 93-111.
Other related functions are plrm.est, par.gof and np.gof.
# EXAMPLE 1: REAL DATA
library(PLRModels)
data(barnacles1)
data <- as.matrix(barnacles1)
data <- diff(data, 12)
data <- cbind(data,1:nrow(data))
plrm.gof(data)
plrm.gof(data, beta0=c(-0.1, 0.35))
# EXAMPLE 2: SIMULATED DATA
## Example 2a: dependent data
set.seed(1234)
# We generate the data
n <- 100
t <- ((1:n)-0.5)/n
beta <- c(0.05, 0.01)
m <- function(t) {0.25*t*(1-t)}
f <- m(t)
f.function <- function(u) {0.25*u*(1-u)}
x <- matrix(rnorm(2*n,0,1), nrow=n)
# Linear part of the model (named 'lin' to avoid masking base::sum)
lin <- x%*%beta
epsilon <- arima.sim(list(order = c(1,0,0), ar=0.7), sd = 0.01, n = n)
y <- lin + f + epsilon
data <- cbind(y,x,t)
## Example 2a.1: true null hypotheses
plrm.gof(data, beta0=c(0.05, 0.01), m0=f.function, time.series=TRUE)
## Example 2a.2: false null hypotheses
plrm.gof(data, time.series=TRUE)