plrm.gcv: Generalized cross-validation bandwidth selection in PLR models

Description

From a sample $\{(Y_i, X_{i1}, ..., X_{ip}, t_i): i=1,...,n\}$, this routine computes an optimal pair of bandwidths for estimating the regression function of the model $$Y_i = X_{i1}\beta_1 + ... + X_{ip}\beta_p + m(t_i) + \epsilon_i,$$ where $\beta = (\beta_1,...,\beta_p)$ is an unknown vector parameter and $m(\cdot)$ is a smooth but unknown function. The optimal pair of bandwidths, (b.opt, h.opt), is selected by means of the generalized cross-validation procedure. The bandwidth b.opt is used in the estimate of $\beta$, while the pair of bandwidths (b.opt, h.opt) is considered in the estimate of $m$. Kernel smoothing, combined with ordinary least squares estimation, is used.

Usage

plrm.gcv(data = data, b.equal.h = TRUE, b.seq=NULL, h.seq=NULL, 
num.b = NULL, num.h = NULL, estimator = "NW", kernel = "quadratic")

Arguments

data

data[, 1] contains the values of the response variable, $Y$; data[, 2:(p+1)] contains the values of the "linear" explanatory variables, $X_1, ..., X_p$; data[, p+2] contains the values of the "nonparametric" explanatory variable, $t$.
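
The column layout described above can be illustrated with a small sketch. The variable names (y, x, t, p) and the simulated values are made up for illustration; only the column order follows the description of the data argument.

```r
# Build a matrix in the layout described for the data argument:
# column 1 = response, columns 2..(p+1) = "linear" covariates,
# column p+2 = "nonparametric" covariate.
set.seed(1)
n <- 50
y <- rnorm(n)                          # response Y (illustrative values)
x <- matrix(rnorm(2 * n), nrow = n)    # p = 2 "linear" covariates
t <- ((1:n) - 0.5) / n                 # "nonparametric" covariate

data <- cbind(y, x, t)

# p can be recovered from the number of columns
p <- ncol(data) - 2
dim(data)
```

The same layout is what the examples further down build with cbind() or matrix().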

b.equal.h

if TRUE (the default), the same bandwidth is used for estimating both $\beta$ and $m$.

b.seq

sequence of considered bandwidths, b, in the GCV function for estimating $\beta$.

h.seq

sequence of considered bandwidths, h, in the pair of bandwidths (b, h) used in the GCV function for estimating $m$.
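
As a hedged illustration of how b.seq and h.seq might be supplied together, the sketch below builds two user-defined grids and passes them with b.equal.h = FALSE. The grid limits and lengths are arbitrary choices, the data are simulated, and the call assumes that plrm.gcv is provided by the PLRModels package; it is skipped if that package is not installed.

```r
# Simulated data in the layout expected by the data argument
set.seed(1234)
n <- 100
t <- ((1:n) - 0.5) / n
x <- matrix(rnorm(2 * n), nrow = n)
y <- x %*% c(0.05, 0.01) + 0.25 * t * (1 - t) + rnorm(n, 0, 0.01)
data_sim <- cbind(y, x, t)

# User-defined bandwidth grids (arbitrary illustrative values)
b_grid <- seq(0.05, 0.25, length.out = 5)   # candidates for b
h_grid <- seq(0.05, 0.25, length.out = 8)   # candidates for h

# Assumption: plrm.gcv lives in the PLRModels package
if (requireNamespace("PLRModels", quietly = TRUE)) {
  fit <- PLRModels::plrm.gcv(data_sim, b.equal.h = FALSE,
                             b.seq = b_grid, h.seq = h_grid)
  print(fit$bh.opt)   # selected pair (b, h)
}
```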

num.b

number of values used to build the sequence of considered bandwidths for estimating $\beta$. If b.seq is not NULL, num.b=length(b.seq). Otherwise, if both num.b and num.h are NULL

num.h

pairs of bandwidths (b, h) are used for estimating $m$, num.h being the number of values considered for h. If h.seq is not NULL, num.h=length(h.seq). Otherwise, if both nu

estimator

estimator to use: "NW" (Nadaraya-Watson) or "LLP" (Local Linear Polynomial). The default is "NW".

kernel

kernel function to use: "gaussian", "quadratic" (Epanechnikov kernel), "triweight" or "uniform". The default is "quadratic".
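
For reference, the textbook forms of these four kernels can be sketched as follows. These are the standard definitions, each normalized to integrate to one; the scaling used internally by the routine is not stated here and may differ, so the list below (named kernels, a hypothetical helper) is illustrative only.

```r
# Standard kernel functions (textbook definitions, not the package's code)
kernels <- list(
  gaussian  = function(u) dnorm(u),
  quadratic = function(u) 0.75 * (1 - u^2) * (abs(u) <= 1),          # Epanechnikov
  triweight = function(u) (35 / 32) * (1 - u^2)^3 * (abs(u) <= 1),
  uniform   = function(u) 0.5 * (abs(u) <= 1)
)

# Support of each kernel, used as integration limits
support <- list(
  gaussian  = c(-Inf, Inf),
  quadratic = c(-1, 1),
  triweight = c(-1, 1),
  uniform   = c(-1, 1)
)

# Check numerically that each kernel integrates to one
areas <- sapply(names(kernels), function(k) {
  integrate(kernels[[k]], support[[k]][1], support[[k]][2])$value
})
areas
```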

Value

bh.opt

selected value for (b,h).

GCV.opt

minimum value of the GCV function.

GCV

matrix containing the values of the GCV function for each pair of bandwidths considered.

b.seq

sequence of considered bandwidths, b, in the GCV function for estimating $\beta$. If b.seq was not input by the user, it is composed of num.b equidistant values between zero and a quarter of the range of $\{t_i\}$.

h.seq

sequence of considered bandwidths, h, in the pair of bandwidths (b, h) used in the GCV function for estimating $m$. If h.seq was not input by the user, it is composed of num.h equidistant values between zero and a quarter of the range of $\{t_i\}$.
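
The relation between the GCV matrix, the two bandwidth sequences and the selected pair can be sketched on a toy grid. The numbers below are made up, and the orientation (rows indexed by b, columns by h) is an assumption for illustration; the orientation of the matrix actually returned by plrm.gcv is not stated here.

```r
# Toy bandwidth grids and a toy GCV surface (made-up values)
b_seq <- c(0.05, 0.10, 0.15)
h_seq <- c(0.05, 0.10, 0.15, 0.20)

# Assumed orientation: rows correspond to b, columns to h
GCV <- outer(b_seq, h_seq, function(b, h) (b - 0.10)^2 + (h - 0.15)^2 + 1)

# Locate the minimum and map it back to a pair (b, h)
idx <- which(GCV == min(GCV), arr.ind = TRUE)[1, ]
bh_opt  <- c(b = b_seq[idx["row"]], h = h_seq[idx["col"]])
gcv_opt <- min(GCV)

bh_opt
gcv_opt
```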

Details

The implemented procedure generalizes the one on page 423 of Speckman (1988) by allowing two smoothing parameters instead of only one (see Aneiros-Perez et al., 2004).
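
To make the idea concrete, here is a minimal sketch of a single-bandwidth GCV criterion for a partial linear model, in the spirit of Speckman (1988). It is not the package's implementation: it uses one bandwidth (b = h), a Nadaraya-Watson smoother with the Epanechnikov kernel, and the generic GCV form, mean squared residual divided by the squared mean of the diagonal of I - A, where A is the hat matrix of the fit. The helper names nw_matrix and gcv_plr are hypothetical.

```r
# Nadaraya-Watson smoother matrix with an Epanechnikov kernel
nw_matrix <- function(t, h) {
  u <- outer(t, t, "-") / h
  K <- 0.75 * pmax(1 - u^2, 0)
  K / rowSums(K)                 # each row sums to one
}

# GCV value for one bandwidth h in the partial linear model
gcv_plr <- function(y, X, t, h) {
  n  <- length(y)
  W  <- nw_matrix(t, h)
  I  <- diag(n)
  Xt <- (I - W) %*% X                          # covariates with the t-trend removed
  P  <- Xt %*% solve(crossprod(Xt), t(Xt))     # projection onto the columns of Xt
  A  <- W + P %*% (I - W)                      # hat matrix: fitted values = A %*% y
  res <- (I - A) %*% y
  mean(res^2) / (sum(diag(I - A)) / n)^2
}

# Simulated data (same design as in the simulated example of this page)
set.seed(1234)
n <- 100
t <- ((1:n) - 0.5) / n
X <- matrix(rnorm(2 * n), nrow = n)
y <- as.numeric(X %*% c(0.05, 0.01) + 0.25 * t * (1 - t) + rnorm(n, 0, 0.01))

# Evaluate the criterion on an arbitrary grid and pick the minimizer
h_grid   <- seq(0.05, 0.25, length.out = 10)
gcv_vals <- sapply(h_grid, function(h) gcv_plr(y, X, t, h))
h_opt    <- h_grid[which.min(gcv_vals)]
h_opt
```

The two-bandwidth version described above would use one smoother matrix (bandwidth b) when estimating $\beta$ and another (bandwidth h) when estimating $m$, and evaluate the criterion over a grid of pairs (b, h).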

References

Aneiros-Perez, G., Gonzalez-Manteiga, W. and Vieu, P. (2004) Estimation and testing in a partial linear regression under long-memory dependence. Bernoulli 10, 49-78.

Green, P. (1985) Linear models for field trials, smoothing and cross-validation. Biometrika 72, 527-537.

Speckman, P. (1988) Kernel smoothing in partial linear models. J. R. Statist. Soc. B 50, 413-436.

Examples

# EXAMPLE 1: REAL DATA

data(barnacles1)
data <- as.matrix(barnacles1)
data <- diff(data, 12)
data <- cbind(data,1:nrow(data))

aux <- plrm.gcv(data)
aux$bh.opt
plot(aux$b.seq, aux$GCV, xlab="h", ylab="GCV", type="l")



# EXAMPLE 2: SIMULATED DATA
## Example 2a: independent data

set.seed(1234)

# We generate the data
n <- 100
t <- ((1:n)-0.5)/n
beta <- c(0.05, 0.01)
m <- function(t) {0.25*t*(1-t)}
f <- m(t)

x <- matrix(rnorm(2*n,0,1), nrow=n)
lin.part <- x%*%beta
epsilon <- rnorm(n, 0, 0.01)
y <- lin.part + f + epsilon
data_ind <- matrix(c(y,x,t),nrow=n)

# We obtain the optimal bandwidths
a <-plrm.gcv(data_ind)
a$GCV.opt

GCV <- a$GCV
h <- a$h.seq
plot(h, GCV,type="l")
