np.gcv: Generalized cross-validation bandwidth selection in nonparametric regression models

Description

From a sample $\{(Y_i, t_i): i = 1, \dots, n\}$, this routine computes an optimal bandwidth for estimating $m$ in the regression model $$Y_i = m(t_i) + \epsilon_i.$$ The regression function $m$ is smooth but unknown. The optimal bandwidth is selected by the generalized cross-validation (GCV) procedure, and $m$ is estimated by kernel smoothing.

Usage

np.gcv(data = data, h.seq = NULL, num.h = 50, estimator = "NW",
       kernel = "quadratic")

Arguments

data

data[, 1] contains the values of the response variable, $Y$; data[, 2] contains the values of the explanatory variable, $t$.

h.seq

sequence of bandwidths at which the GCV function is evaluated. If NULL (the default), num.h equidistant values between zero and a quarter of the range of the $t_i$ are used.
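The default grid can be sketched as follows. `default_h_seq()` is a hypothetical helper, not part of the package, and dropping the endpoint zero is an assumption, since a zero bandwidth gives a degenerate kernel estimate.

```r
# Hypothetical helper mimicking the documented default for h.seq:
# num.h equally spaced bandwidths in (0, range(t)/4].
# ASSUMPTION: the value h = 0 is excluded from the grid.
default_h_seq <- function(t, num.h = 50) {
  h.max <- diff(range(t)) / 4          # a quarter of the range of t
  seq(0, h.max, length.out = num.h + 1)[-1]
}

t <- ((1:100) - 0.5) / 100
h <- default_h_seq(t)
length(h)   # 50
max(h)      # 0.2475
```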

num.h

number of bandwidths in the sequence. If h.seq is not NULL, num.h = length(h.seq); otherwise, the default is 50.

estimator

selects the estimator: "NW" (Nadaraya-Watson) or "LLP" (local linear polynomial). The default is "NW".
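As a rough illustration of the two choices, the sketch below evaluates each estimator at a single point `t0` with the quadratic kernel. `epan()`, `nw_fit()` and `ll_fit()` are hypothetical names, and the code is not the package's internal implementation.

```r
# Quadratic (Epanechnikov) kernel, assumed normalized form.
epan <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)

# Nadaraya-Watson (hypothetical helper): locally constant fit,
# i.e. a kernel-weighted mean of the responses near t0.
nw_fit <- function(t0, t, y, h) {
  w <- epan((t - t0) / h)
  sum(w * y) / sum(w)
}

# Local linear (hypothetical helper): kernel-weighted least-squares line
# around t0; its intercept, written via equivalent-kernel weights, estimates m(t0).
ll_fit <- function(t0, t, y, h) {
  w <- epan((t - t0) / h)
  d <- t - t0
  s1 <- sum(w * d)
  s2 <- sum(w * d^2)
  l <- w * (s2 - d * s1)
  sum(l * y) / sum(l)
}

t <- seq(0, 1, length.out = 101)
y <- 2 * t + 1
ll_fit(0, t, y, h = 0.1)   # 1
nw_fit(0, t, y, h = 0.1)   # about 1.07
```

The local linear fit reproduces a straight line exactly, even at the boundary point t0 = 0, whereas the Nadaraya-Watson fit is pulled upward there, which illustrates the well-known boundary bias of the locally constant estimator.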

kernel

selects the kernel: "gaussian", "quadratic" (Epanechnikov), "triweight" or "uniform". The default is "quadratic".
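The four kernels can be written as below, in their usual normalized forms on $[-1, 1]$ (the Gaussian on the whole real line). These exact scalings are an assumption; for both the NW and the local linear estimator a constant factor in the kernel cancels, so it does not affect the fitted values.

```r
# Usual textbook forms of the four kernels (ASSUMED scalings).
kernels <- list(
  gaussian  = function(u) dnorm(u),
  quadratic = function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0),  # Epanechnikov
  triweight = function(u) ifelse(abs(u) <= 1, 35 / 32 * (1 - u^2)^3, 0),
  uniform   = function(u) ifelse(abs(u) <= 1, 0.5, 0)
)

# Integration range for each kernel: its support.
support <- list(gaussian = c(-Inf, Inf), quadratic = c(-1, 1),
                triweight = c(-1, 1), uniform = c(-1, 1))

# Each kernel is a symmetric probability density, so its area is 1.
areas <- sapply(names(kernels), function(k)
  integrate(kernels[[k]], support[[k]][1], support[[k]][2])$value)
round(areas, 6)
```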

Value

h.opt

selected value for the bandwidth.

GCV.opt

minimum value of the GCV function.

GCV

vector containing the values of the GCV function for each considered bandwidth.

h.seq

sequence of considered bandwidths in the GCV function.

Details

h.opt is the bandwidth in h.seq at which the GCV function attains its minimum value, GCV.opt. See Craven and Wahba (1979) and Rice (1984).
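For a linear smoother $\hat{\mathbf{m}}_h = W_h \mathbf{Y}$, the usual GCV criterion of Craven and Wahba (1979) is $$\mathrm{GCV}(h) = \frac{n^{-1} \sum_{i=1}^{n} \{Y_i - \hat{m}_h(t_i)\}^2}{\{1 - n^{-1} \operatorname{tr}(W_h)\}^2},$$ and the selected bandwidth minimizes it over the grid. The sketch below assumes this standard form and covers only the Nadaraya-Watson estimator with the quadratic kernel; `epan()`, `nw_smoother()` and `gcv_nw()` are hypothetical helpers, and the numbers need not match those returned by np.gcv.

```r
# ASSUMPTION: standard Craven-Wahba GCV form; NW estimator, quadratic kernel.
epan <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)

# Smoother matrix W_h (hypothetical helper): row i holds the NW weights
# used to estimate m(t_i), so the fitted values are W_h %*% y.
nw_smoother <- function(t, h) {
  K <- epan(outer(t, t, "-") / h)
  K / rowSums(K)
}

# GCV function evaluated on a grid of bandwidths (hypothetical helper).
gcv_nw <- function(y, t, h.seq) {
  n <- length(y)
  sapply(h.seq, function(h) {
    W <- nw_smoother(t, h)
    edf <- sum(diag(W))                 # tr(W_h)
    if (edf >= n - 1e-8) return(Inf)    # W_h = I: fit interpolates, GCV undefined
    rss <- mean((y - W %*% y)^2)
    rss / (1 - edf / n)^2
  })
}

set.seed(1234)
n <- 100
t <- ((1:n) - 0.5) / n
y <- 0.25 * t * (1 - t) + rnorm(n, 0, 0.01)
h.seq <- seq(0, diff(range(t)) / 4, length.out = 51)[-1]
gcv <- gcv_nw(y, t, h.seq)
h.opt <- h.seq[which.min(gcv)]
```

Bandwidths smaller than the spacing of the $t_i$ give $\operatorname{tr}(W_h) = n$, so the denominator vanishes; the sketch returns Inf for them so they are never selected.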

References

Craven, P. and Wahba, G. (1979) Smoothing noisy data with spline functions. Numer. Math. 31, 377-403.

Rice, J. (1984) Bandwidth choice for nonparametric regression. Ann. Statist. 12, 1215-1230.

Examples


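# EXAMPLE 1: REAL DATA
library(PLRModels)  # assumed: package providing np.gcv() and the barnacles1 data
data <- matrix(10, 120, 2)
data(barnacles1)
barnacles1 <- as.matrix(barnacles1)
data[, 1] <- barnacles1[, 1]
# Lag-12 (seasonal) differencing of both columns; the result has 108 rows
data <- diff(data, 12)
# Use the time index 1, ..., nrow(data) as the explanatory variable t
data[, 2] <- 1:nrow(data)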

aux <- np.gcv(data)
aux$h.opt
plot(aux$h.seq, aux$GCV, xlab="h", ylab="GCV", type="l")



# EXAMPLE 2: SIMULATED DATA
## Example 2a: independent data

set.seed(1234)
# We generate the data
n <- 100
t <- ((1:n)-0.5)/n
m <- function(t) {0.25*t*(1-t)}
f <- m(t)

epsilon <- rnorm(n, 0, 0.01)
y <- f + epsilon
data_ind <- matrix(c(y, t), nrow = n)  # column 1: y, column 2: t

# We apply the function
a <- np.gcv(data_ind)
a$GCV.opt

GCV <- a$GCV
h <- a$h.seq
plot(h, GCV, type="l")
