PLRModels (version 1.1)

plrm.gcv: Generalized cross-validation bandwidth selection in PLR models

Description

From a sample $\{(Y_i, X_{i1}, ..., X_{ip}, t_i): i=1,...,n\}$, this routine computes an optimal pair of bandwidths for estimating the regression function of the model $$Y_i = X_{i1}\beta_1 + ... + X_{ip}\beta_p + m(t_i) + \epsilon_i,$$ where $\beta = (\beta_1,...,\beta_p)$ is an unknown vector parameter and $m(.)$ is a smooth but unknown function. The optimal pair of bandwidths, (b.opt, h.opt), is selected by generalized cross-validation (GCV). The bandwidth b.opt is used to estimate $\beta$, while the pair (b.opt, h.opt) is used to estimate $m$. Kernel smoothing, combined with ordinary least squares estimation, is used.
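To make the roles of the two bandwidths concrete, here is a minimal base-R sketch of a partial-residual estimator of the kind described above. It is not the package's code: the helper names `nw_matrix` and `plr_fit_sketch` are hypothetical, the Nadaraya-Watson smoother with the Epanechnikov kernel is written out by hand, and the bandwidths are fixed rather than selected by GCV.

```r
# Nadaraya-Watson smoother matrix with the Epanechnikov ("quadratic") kernel.
# Row i holds the weights used to smooth at t[i] with bandwidth bw.
nw_matrix <- function(t, bw) {
  u <- outer(t, t, "-") / bw
  K <- ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)
  K / rowSums(K)                      # normalize each row to sum to one
}

# Hypothetical helper: estimate beta and m for a fixed pair (b, h).
plr_fit_sketch <- function(y, X, t, b, h) {
  Wb <- nw_matrix(t, b)
  Wh <- nw_matrix(t, h)
  X_res <- X - Wb %*% X               # remove the smooth trend in t from X
  y_res <- y - Wb %*% y               # and from Y
  # Ordinary least squares on the partial residuals (uses b only).
  beta_hat <- solve(crossprod(X_res), crossprod(X_res, y_res))
  # Smooth what is left after removing the linear part (uses b and h).
  m_hat <- Wh %*% (y - X %*% beta_hat)
  list(beta = drop(beta_hat), m = drop(m_hat))
}

# Simulated data with the same structure as the model above.
set.seed(1)
n <- 200
t <- ((1:n) - 0.5) / n
X <- matrix(rnorm(2 * n), nrow = n)
beta <- c(0.05, 0.01)
y <- drop(X %*% beta) + 0.25 * t * (1 - t) + rnorm(n, 0, 0.01)

fit <- plr_fit_sketch(y, X, t, b = 0.1, h = 0.1)
fit$beta
```

In this sketch b only enters through the residuals used for the least squares step, while the estimate of $m$ depends on both b (through the estimate of $\beta$) and h.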

Usage

plrm.gcv(data = data, b.equal.h = TRUE, b.seq=NULL, h.seq=NULL, 
num.b = NULL, num.h = NULL, estimator = "NW", kernel = "quadratic")

Arguments

data
data[,1] contains the values of the response variable, $Y$; data[, 2:(p+1)] contains the values of the "linear" explanatory variables, $X_1, ..., X_p$; data[, p+2] contains the values of the "nonparametric"
b.equal.h
if TRUE (the default), the same bandwidth is used for estimating both $\beta$ and $m$.
b.seq
sequence of considered bandwidths, b, in the GCV function for estimating $\beta$.
h.seq
sequence of considered bandwidths, h, in the pair of bandwidths (b, h) used in the GCV function for estimating $m$.
num.b
number of values used to build the sequence of considered bandwidths for estimating $\beta$. If b.seq is not NULL, num.b=length(b.seq). Otherwise, if both num.b and num.h are NULL
num.h
pairs of bandwidths (b, h) are used for estimating $m$, num.h being the number of values considered for h. If h.seq is not NULL, num.h=length(h.seq). Otherwise, if both nu
estimator
selects the smoother: NW (Nadaraya-Watson) or LLP (Local Linear Polynomial). The default is NW.
kernel
selects the kernel: gaussian, quadratic (Epanechnikov kernel), triweight or uniform. The default is quadratic.
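As an illustration of the four kernel options, the sketch below writes each one as an R function using the usual textbook formulas. These formulas are an assumption: the package may scale or parameterize its kernels differently, and the list name `kernels` is introduced here only for illustration.

```r
# Kernel functions named after the options of the `kernel` argument.
# Textbook definitions; the package's internal scaling may differ.
kernels <- list(
  gaussian  = function(u) dnorm(u),
  quadratic = function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0),
  triweight = function(u) ifelse(abs(u) <= 1, (35 / 32) * (1 - u^2)^3, 0),
  uniform   = function(u) ifelse(abs(u) <= 1, 0.5, 0)
)

# Numerical check on a fine grid: each kernel should integrate to about one.
u <- seq(-6, 6, length.out = 120001)
du <- u[2] - u[1]
areas <- sapply(kernels, function(K) sum(K(u)) * du)
round(areas, 3)
```

The gaussian kernel has unbounded support, whereas the other three give zero weight to observations farther than one bandwidth away.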

Value

  • bh.opt: selected value for (b, h).
  • GCV.opt: minimum value of the GCV function.
  • GCV: matrix containing the values of the GCV function for each pair of bandwidths considered.
  • b.seq: sequence of considered bandwidths, b, in the GCV function for estimating $\beta$. If b.seq was not supplied by the user, it consists of num.b equidistant values between zero and a quarter of the range of $\{t_i\}$.
  • h.seq: sequence of considered bandwidths, h, in the pair of bandwidths (b, h) used in the GCV function for estimating $m$. If h.seq was not supplied by the user, it consists of num.h equidistant values between zero and a quarter of the range of $\{t_i\}$.
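The relation between the GCV matrix and the selected pair can be sketched in base R as follows. The grid values and the toy GCV surface are made up for illustration, and the layout (rows indexed by b, columns by h) is an assumption rather than something stated on this page.

```r
# Hypothetical bandwidth grids and a toy GCV surface.
# Assumed layout: rows index b, columns index h.
b_seq <- seq(0.05, 0.25, length.out = 5)
h_seq <- seq(0.05, 0.25, length.out = 5)
GCV <- outer(b_seq, h_seq, function(b, h) (b - 0.10)^2 + (h - 0.20)^2 + 1)

# Locate the smallest entry and map it back to a pair of bandwidths.
idx <- arrayInd(which.min(GCV), dim(GCV))
bh_opt <- c(b = b_seq[idx[1]], h = h_seq[idx[2]])
GCV_opt <- min(GCV)

bh_opt
GCV_opt
```

When the same bandwidth is used for both estimates, the search runs over a single sequence, so only one index needs to be located.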

Details

The implemented procedure generalizes the one on page 423 of Speckman (1988) by allowing two smoothing parameters instead of only one (see Aneiros-Perez et al., 2004).
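For orientation, a two-bandwidth GCV criterion of the usual form can be written as below. This is a sketch under assumptions: the symbols $W_b$ and $W_h$ (kernel smoother matrices for bandwidths b and h), $\tilde{X}$, $\tilde{Y}$ and $A_{b,h}$ are introduced here for illustration, and the exact normalization used in the package or in the cited papers may differ.

$$\tilde{X} = (I - W_b)X, \qquad \tilde{Y} = (I - W_b)Y, \qquad \hat{\beta}_b = (\tilde{X}^T \tilde{X})^{-1} \tilde{X}^T \tilde{Y},$$

$$\hat{m}_{b,h} = W_h (Y - X\hat{\beta}_b), \qquad \hat{Y} = X\hat{\beta}_b + \hat{m}_{b,h} = A_{b,h} Y,$$

$$GCV(b, h) = \frac{n^{-1} \| (I - A_{b,h}) Y \|^2}{\left(1 - n^{-1} \mathrm{tr}(A_{b,h})\right)^2}.$$

Setting b = h recovers a criterion with a single smoothing parameter.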

References

Aneiros-Perez, G., Gonzalez-Manteiga, W. and Vieu, P. (2004) Estimation and testing in a partial linear regression under long-memory dependence. Bernoulli 10, 49-78.

Green, P. (1985) Linear models for field trials, smoothing and cross-validation. Biometrika 72, 527-537.

Speckman, P. (1988) Kernel smoothing in partial linear models. J. R. Statist. Soc. B 50, 413-436.

See Also

Other related functions are: plrm.beta, plrm.est, plrm.cv, np.est, np.gcv and np.cv.

Examples

# EXAMPLE 1: REAL DATA

data(barnacles1)
data <- as.matrix(barnacles1)
data <- diff(data, 12)
data <- cbind(data,1:nrow(data))

aux <- plrm.gcv(data)
aux$bh.opt
plot(aux$b.seq, aux$GCV, xlab="h", ylab="GCV", type="l")



# EXAMPLE 2: SIMULATED DATA
## Example 2a: independent data

set.seed(1234)

# We generate the data
n <- 100
t <- ((1:n)-0.5)/n
beta <- c(0.05, 0.01)
m <- function(t) {0.25*t*(1-t)}
f <- m(t)

x <- matrix(rnorm(2*n,0,1), nrow=n)
sum <- x%*%beta
epsilon <- rnorm(n, 0, 0.01)
y <-  sum + f + epsilon
data_ind <- matrix(c(y,x,t),nrow=n)

# We obtain the optimal bandwidths
a <-plrm.gcv(data_ind)
a$GCV.opt

GCV <- a$GCV
h <- a$h.seq
plot(h, GCV,type="l")
