cparlwrgrid: Conditionally parametric LWR regression bandwidth or window selection

Description

Searches a user-provided vector of window or bandwidth values for the one that gives the lowest cv or gcv for a CPAR (conditionally parametric) model. Calls cparlwr and returns its full output for the chosen window or bandwidth.
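
The selection loop can be sketched in base R under several assumptions. The helpers smoother_matrix(), gcv_score() and grid_search() are hypothetical and are not part of McSpatial. The sketch replaces the full CPAR fit with a local linear smoother in a single z, treats the window as the share of observations used in each local fit, uses tri-cube weights, and scores each candidate with the gcv formula listed under Value.

```r
# Hypothetical helper (not McSpatial): smoother matrix L of a local linear fit
# in one variable z, with tri-cube weights and a nearest-neighbour window
smoother_matrix <- function(z, window) {
  n <- length(z)
  k <- max(3, ceiling(window * n))           # neighbours that define the local bandwidth
  L <- matrix(0, n, n)
  for (i in seq_len(n)) {
    d <- abs(z - z[i])
    u <- d / sort(d)[k]                      # scale by the distance to the k-th neighbour
    w <- ifelse(u < 1, (1 - u^3)^3, 0)       # tri-cube weights
    X <- cbind(1, z - z[i])                  # local linear design centred at z[i]
    XtW <- t(X * w)                          # X'W (row i of X scaled by w[i])
    L[i, ] <- solve(XtW %*% X, XtW)[1, ]     # row of L giving the fit at z[i]
  }
  L
}

# gcv as written in the Value section: n*(n*sig2)/((n-nreg)^2)
gcv_score <- function(y, L) {
  n <- length(y)
  yhat <- drop(L %*% y)
  df1 <- sum(diag(L))                        # tr(L)
  df2 <- sum(L^2)                            # tr(L'L)
  nreg <- 2 * df1 - df2
  sig2 <- sum((y - yhat)^2) / (n - nreg)
  n * (n * sig2) / (n - nreg)^2
}

# Evaluate every candidate window and keep the one with the lowest gcv
grid_search <- function(y, z, windows, verbose = TRUE) {
  crit <- sapply(windows, function(w) gcv_score(y, smoother_matrix(z, w)))
  if (verbose) print(data.frame(window = windows, gcv = crit))
  list(window = windows[which.min(crit)], gcv = crit)
}

set.seed(1)
z <- sort(runif(200, 0, 2 * pi))
y <- sin(z) + rnorm(200, 0, 0.3)
best <- grid_search(y, z, windows = seq(0.10, 0.40, 0.10))
best$window
```

The same loop covers method="cv" if gcv_score() is swapped for the cv formula from the Value section.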

Usage

cparlwrgrid(form,nonpar,window=0,bandwidth=0,kern="tcub",method="gcv",
  print=TRUE,distance="Mahal",alldata=FALSE,data=NULL)

Arguments

form

Model formula

nonpar

One-sided formula listing either one or two variables for z. Formats: cparlwrgrid(y~xlist, nonpar=~z1, ...) or cparlwrgrid(y~xlist, nonpar=~z1+z2, ...). Important: note the "~" before the first z variable.
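
The "~" matters because nonpar is a formula object rather than a vector of values. The hypothetical helper nonpar_vars() below (not part of McSpatial) shows one way such an argument can be validated and its variable names extracted with base R's all.vars().

```r
# Hypothetical validator (not McSpatial's code): accept ~z1 or ~z1+z2
nonpar_vars <- function(nonpar) {
  # a one-sided formula has length 2: the `~` and its right-hand side
  if (!inherits(nonpar, "formula") || length(nonpar) != 2) {
    stop("nonpar must be a one-sided formula such as ~z1 or ~z1+z2")
  }
  vars <- all.vars(nonpar)
  if (!length(vars) %in% 1:2) stop("nonpar must name one or two variables")
  vars
}

nonpar_vars(~z1 + z2)   # "z1" "z2"
```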

window

Vector of window sizes to search over. Default: not used.

bandwidth

Vector of bandwidths to search over. Default: not used.
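
One common way to turn these two options into a local bandwidth h at a target point is sketched below; it is an assumption, not a description of McSpatial's internals. A window w uses the distance to the ceiling(w*n)-th nearest observation (as in loess-style nearest-neighbour smoothing), while a bandwidth is the same fixed h at every target. The helper local_bandwidth() is hypothetical, and letting window take precedence when both are positive is also an assumption.

```r
# Hypothetical helper: local bandwidth at one target point, given the
# distances `dist` from that target to every observation
local_bandwidth <- function(dist, window = 0, bandwidth = 0) {
  if (window > 0) {
    k <- ceiling(window * length(dist))   # number of observations inside the window
    sort(dist)[k]                          # distance to the k-th nearest observation
  } else if (bandwidth > 0) {
    bandwidth                              # the same fixed h at every target
  } else {
    stop("set either window > 0 or bandwidth > 0")
  }
}

dist <- abs(0:9 - 0)                       # distances from a target at 0 to z = 0, ..., 9
local_bandwidth(dist, window = 0.5)        # 4: the 5th nearest observation
local_bandwidth(dist, bandwidth = 2.5)     # 2.5
```

With a window, h adapts to the data: it widens where observations are sparse and narrows where they are dense.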

kern

Kernel weighting function. Default is the tri-cube ("tcub"). Options include "rect", "tria", "epan", "bisq", "tcub", "trwt", and "gauss".
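
The kernel options can be written as functions of the scaled distance u = d/h. The forms below are the standard textbook ones with the usual normalizing constants; that McSpatial uses exactly these constants, and how it scales h for the unbounded Gaussian kernel, are assumptions. The constants do not change the estimates, because multiplying every weight in a weighted least squares fit by the same positive number leaves the coefficients unchanged.

```r
# Standard kernel forms (assumed, not copied from McSpatial); u = d/h
kernels <- list(
  rect  = function(u) ifelse(abs(u) < 1, 1 / 2, 0),
  tria  = function(u) ifelse(abs(u) < 1, 1 - abs(u), 0),
  epan  = function(u) ifelse(abs(u) < 1, 3 / 4 * (1 - u^2), 0),
  bisq  = function(u) ifelse(abs(u) < 1, 15 / 16 * (1 - u^2)^2, 0),
  tcub  = function(u) ifelse(abs(u) < 1, 70 / 81 * (1 - abs(u)^3)^3, 0),
  trwt  = function(u) ifelse(abs(u) < 1, 35 / 32 * (1 - u^2)^3, 0),
  gauss = function(u) dnorm(u)
)

# Each compact kernel integrates to one over [-1, 1]; the Gaussian over the real line
sapply(kernels[names(kernels) != "gauss"], function(k) integrate(k, -1, 1)$value)
integrate(kernels$gauss, -Inf, Inf)$value

# Tri-cube weights for observations at scaled distances 0, 0.5 and 1
kernels$tcub(c(0, 0.5, 1))
```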

method

Specifies "gcv" or "cv" criterion function. Default: method="gcv".

print

If TRUE, prints the gcv or cv value for each window or bandwidth. Default: print=TRUE.

distance

Options: "Euclid", "Mahal", or "Latlong" for Euclidean, Mahalanobis, or great-circle geographic distance. May be abbreviated to the first letter but must be capitalized. Note: cparlwr looks for the first two letters of the variable names to determine which variable is latitude and which is longitude. Default: distance="Mahal".
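
The three distance options can be sketched as follows. The helper names are hypothetical. The haversine formula and the default Earth radius of 3958.8 miles are assumptions, since the units and exact great-circle formula used by the package are not stated here, and scaling the Mahalanobis distance by the sample covariance of z is likewise an assumption. Note that stats::mahalanobis() returns squared distances.

```r
# Euclidean distance from a target point to each row of the coordinate matrix Z
euclid_dist <- function(Z, target) sqrt(rowSums(sweep(Z, 2, target)^2))

# Mahalanobis distance; mahalanobis() returns the squared distance, hence sqrt()
mahal_dist <- function(Z, target, S = cov(Z)) {
  sqrt(mahalanobis(Z, center = target, cov = S))
}

# Great-circle distance via the haversine formula; inputs in decimal degrees,
# result in the units of `radius` (3958.8 miles is an assumed default)
greatcircle_dist <- function(lat, lon, lat0, lon0, radius = 3958.8) {
  rad <- pi / 180
  dlat <- (lat - lat0) * rad
  dlon <- (lon - lon0) * rad
  a <- sin(dlat / 2)^2 + cos(lat0 * rad) * cos(lat * rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))      # pmin guards against rounding above 1
}

Z <- cbind(z1 = c(0, 3, 1), z2 = c(0, 4, 1))
euclid_dist(Z, c(0, 0))                    # 0, 5, sqrt(2)
mahal_dist(Z, c(0, 0))
greatcircle_dist(0, 90, 0, 0, radius = 1)  # a quarter of a unit great circle
```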

alldata

If alldata=TRUE, each observation is used as a target value for z. When alldata=FALSE, the function is estimated at a set of points chosen by the locfit program using an adaptive decision tree approach, and the akima program is used to interpolate to the remaining values of z. Default: alldata=FALSE.
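
For a single z, the idea behind the faster option can be illustrated by estimating the function at a modest number of target points and interpolating to the remaining observations. This is only an analogy: the sketch spaces the targets evenly instead of using locfit's adaptive decision tree, and it uses base R's approx() for linear interpolation in place of the akima routines. The helper fit_at_targets(), its tri-cube weights and its nearest-neighbour window are assumptions.

```r
# Hypothetical helper: local linear fit (tri-cube weights, nearest-neighbour
# window) evaluated only at the supplied target points
fit_at_targets <- function(y, z, targets, window = 0.3) {
  k <- ceiling(window * length(z))
  sapply(targets, function(t0) {
    d <- abs(z - t0)
    u <- d / sort(d)[k]
    w <- ifelse(u < 1, (1 - u^3)^3, 0)       # tri-cube weights
    zc <- z - t0                             # centre z at the target
    coef(lm(y ~ zc, weights = w))[[1]]       # intercept = fitted value at t0
  })
}

set.seed(2)
z <- runif(500, 0, 2 * pi)
y <- sin(z) + rnorm(500, 0, 0.3)
targets <- seq(min(z), max(z), length.out = 25)          # evenly spaced targets
ytarget <- fit_at_targets(y, z, targets)                  # fits at the targets only
yhat <- approx(targets, ytarget, xout = z, rule = 2)$y    # interpolate to every point
```

Only 25 local regressions are run instead of 500, which is the trade-off the alldata option controls.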

data

A data frame containing the data. Default: use the variables in the current R workspace.

Value

target

The target points for the original estimation of the function.

ytarget

The predicted values of y at the target values of z.

xcoef.target

Estimated coefficients, B(z), at the target values of z.

ytarget.se

Standard errors for the predicted values of y at the target values of z.

xcoef.target.se

Standard errors for B(z) at the target values of z.

yhat

Predicted values of y at the original data points.

xcoef

Estimated coefficients, B(z), at the original data points.

yhat.se

Standard errors for the predicted values of y, full data set.

xcoef.se

Standard errors for B(z) with z evaluated at all points in the data set.

df1

tr(L), where L is the smoothing matrix with yhat = Ly; a measure of the degrees of freedom used in estimation.

df2

tr(L'L), an alternative measure of the degrees of freedom used in estimation.

sig2

Estimated residual variance, sig2 = rss/(n-2*df1+df2).

cv

Cross-validation measure. cv = mean(((y-yhat)/(1-infl))^2), where yhat is the vector of predicted values for y and infl is the vector of diagonal terms for L.

gcv

Generalized cross-validation measure. gcv = n*(n*sig2)/((n-nreg)^2), where sig2 is the estimated residual variance and nreg = 2*df1 - df2.

infl

A vector containing the diagonal elements of L.
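
The degrees-of-freedom and fit diagnostics listed here can be computed directly from a smoother matrix L. The helper lwr_diagnostics() is hypothetical; it simply transcribes the formulas above. As a check, when L is the ordinary least squares hat matrix (symmetric and idempotent), df1 and df2 both equal the number of regressors, sig2 reduces to the usual OLS residual variance, and infl matches hatvalues().

```r
# Hypothetical helper transcribing the Value-section formulas for a smoother matrix L
lwr_diagnostics <- function(y, L) {
  n <- length(y)
  yhat <- drop(L %*% y)
  infl <- diag(L)                              # diagonal elements of L
  df1 <- sum(infl)                             # tr(L)
  df2 <- sum(L^2)                              # tr(L'L) = sum of squared entries of L
  nreg <- 2 * df1 - df2
  sig2 <- sum((y - yhat)^2) / (n - 2 * df1 + df2)
  cv <- mean(((y - yhat) / (1 - infl))^2)
  gcv <- n * (n * sig2) / ((n - nreg)^2)
  list(df1 = df1, df2 = df2, sig2 = sig2, cv = cv, gcv = gcv, infl = infl)
}

set.seed(3)
x <- runif(50)
y <- 1 + 2 * x + rnorm(50)
X <- cbind(1, x)
H <- X %*% solve(crossprod(X), t(X))           # OLS hat matrix: a global "smoother"
d <- lwr_diagnostics(y, H)
c(d$df1, d$df2)                                # both equal 2 (up to rounding)
```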

Examples

library(McSpatial)
set.seed(1)  # make the simulated data reproducible
par(ask=TRUE)
n <- 1000
z1 <- runif(n,0,2*pi)
z1 <- sort(z1)
z2 <- runif(n,0,2*pi)
o1 <- order(z1)
o2 <- order(z2)
ybase1 <-  z1 - .1*(z1^2) + sin(z1) - cos(z1) - .5*sin(2*z1) + .5*cos(2*z1) 
ybase2 <- -z2 + .1*(z2^2) - sin(z2) + cos(z2) + .5*sin(2*z2) - .5*cos(2*z2)
ybase <- ybase1+ybase2
sig = sd(ybase)/2
y <- ybase + rnorm(n,0,sig)
summary(lm(y~ybase))

# Single variable estimation
fit1 <- cparlwrgrid(y~z1,nonpar=~z1,window=seq(.10,.40,.10))
c(fit1$df1,fit1$df2,2*fit1$df1-fit1$df2)
# Make predicted and actual values have the same means (before setting ylim)
fit1$yhat <- fit1$yhat - mean(fit1$yhat) + mean(ybase1)
plot(z1[o1],ybase1[o1],type="l",ylim=c(min(ybase1,fit1$yhat),max(ybase1,fit1$yhat)),
  xlab="z1",ylab="y")
lines(z1[o1],fit1$yhat[o1], col="red")
legend("topright", c("Base", "LWR"), col=c("black","red"),lwd=1)
fit2 <- cparlwrgrid(y~z2,nonpar=~z2,window=seq(.10,.40,.10))
fit2$yhat <- fit2$yhat - mean(fit2$yhat) + mean(ybase2)
c(fit2$df1,fit2$df2,2*fit2$df1-fit2$df2)
plot(z2[o2],ybase2[o2],type="l",ylim=c(min(ybase2,fit2$yhat),max(ybase2,fit2$yhat)),
    xlab="z2",ylab="y")
lines(z2[o2],fit2$yhat[o2], col="red")
legend("topright", c("Base", "LWR"), col=c("black","red"),lwd=1)

#both variables
fit3 <- cparlwrgrid(y~z1+z2,nonpar=~z1+z2,window=seq(.10,.20,.05))
yhat1 <- fit3$yhat - mean(fit3$yhat) + mean(ybase1)
plot(z1[o1],yhat1[o1], xlab="z1",ylab="y")
lines(z1[o1],ybase1[o1],col="red")
yhat2 <- fit3$yhat - mean(fit3$yhat) + mean(ybase2)
plot(z2[o2],yhat2[o2], xlab="z2",ylab="y")
lines(z2[o2],ybase2[o2],col="red")