McSpatial (version 1.1.1)

qreglwr: Locally Weighted Quantile Regression

Description

Estimates a model of the form y = f(x) using locally weighted quantile regression for a set of user-provided quantiles. x can include either one or two variables. Returns estimated values, derivatives, and standard errors for both f(x) and df(x)/dx.

Usage

qreglwr(form,taumat=c(.10,.25,.50,.75,.90), window=.25,bandwidth=0,
  kern="tcub", distance="Mahal",alldata=FALSE,data=NULL)

Arguments

form
Model formula
taumat
Vector of target quantiles. Default: taumat=c(.10,.25,.50,.75,.90)
window
Window size. Default: 0.25.
bandwidth
Bandwidth. Default: not used.
kern
Kernel weighting functions. Default is the tri-cube. Options include "rect", "tria", "epan", "bisq", "tcub", "trwt", and "gauss".
distance
Options: "Euclid", "Mahal", or "Latlong" for Euclidean, Mahalanobis, or great-circle geographic distance. May be abbreviated to the first letter but must be capitalized. Note: when distance = "Latlong", qreglwr looks at the first two letters of each variable name to determine which variable is latitude and which is longitude.
alldata
If alldata=T, each observation is used as a target value for x. When alldata=F, the function is estimated at a set of points chosen by the locfit program using an adaptive decision tree approach, and the akima program is used to interpolate the estimates to the full set of observations. Default: alldata=FALSE.
data
A data frame containing the data. Default: use data in the current working directory.

Value

  • target: The target points for the original estimation of the function.
  • ytarget: The matrix of predicted values of y at the target points, by quantile. Rows represent targets; columns are quantiles.
  • dtarget1: The matrix of estimated derivatives dy/dx1 at the target points, by quantile. Rows represent targets; columns are quantiles.
  • dtarget2: The matrix of estimated derivatives dy/dx2 at the target points, by quantile. Rows represent targets; columns are quantiles. All zeros if the model has only one explanatory variable.
  • ytarget.se: The matrix of standard errors for the predicted values of y at the target points, by quantile. Rows represent targets; columns are quantiles.
  • dtarget1.se: The matrix of standard errors for the derivatives dy/dx1 at the target points, by quantile. Rows represent targets; columns are quantiles.
  • dtarget2.se: The matrix of standard errors for the derivatives dy/dx2 at the target points, by quantile. Rows represent targets; columns are quantiles. All zeros if the model has only one explanatory variable.
  • yhat: The matrix of predicted values of y for the full data set, by quantile. Dimension = n x length(taumat).
  • dhat1: The matrix of estimated derivatives dy/dx1 for the full data set, by quantile. Dimension = n x length(taumat).
  • dhat2: The matrix of estimated derivatives dy/dx2 for the full data set, by quantile. Dimension = n x length(taumat). All zeros if the model has only one explanatory variable.
  • yhat.se: The matrix of standard errors for the predicted values of y for the full data set, by quantile. Dimension = n x length(taumat).
  • dhat1.se: The matrix of standard errors for the estimated derivatives dy/dx1 for the full data set, by quantile. Dimension = n x length(taumat).
  • dhat2.se: The matrix of standard errors for the estimated derivatives dy/dx2 for the full data set, by quantile. Dimension = n x length(taumat). All zeros if the model has only one explanatory variable.
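For orientation, every returned matrix indexes targets (or observations) by row and the entries of taumat by column. The sketch below uses a mock object with placeholder numbers to illustrate the layout only; a real object comes from a qreglwr() call.

```r
# Mock object illustrating the matrix layout only (not real output)
n <- 5                                    # hypothetical number of observations
taumat <- c(.10, .50, .90)
fit <- list(yhat = matrix(rnorm(n * length(taumat)), n, length(taumat)))

dim(fit$yhat)                             # n x length(taumat)
med <- fit$yhat[, 2]                      # column 2 = predictions at tau = .50
spread <- fit$yhat[, 3] - fit$yhat[, 1]   # interquantile spread, tau = .90 - .10
```

The same row/column convention applies to the derivative and standard-error matrices.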

Details

Serves as an interface to the quantreg package. Uses a kernel weight function in quantreg's "weights" option to estimate quantile regressions at a series of target values of x. x may include either one or two variables. The target values are found using locfit's adaptive decision tree approach, and the predictions are then interpolated to the full set of x values using the akima package. If alldata=T, the procedure is applied to every value of x rather than to a set of target points.

The weights at a target value $x_0$ are given by $K(\psi/h)$, where $\psi$ is a measure of the distance between x and $x_0$ and h is the bandwidth or window. When x includes a single variable, $\psi = x - x_0$. When x includes two variables, the definition of $\psi$ depends on the distance option. If distance="Mahal" or distance="Euclid", the ith row of the matrix X = (x1, x2) is transformed such that $x_i = \sqrt{x_i V x_i'}$. Under the "Mahal" option, V is the inverse of cov(X); under the "Euclid" option, V is the inverse of diag(cov(X)). By reducing x from two dimensions to one, this transformation leads again to the simple kernel weighting function $K((x - x_0)/(sd(x) h))$, with h specified by the bandwidth or window option.

The great circle formula is used to define distance when distance="Latlong". In this case, the explanatory variable list must be specified as ~latitude+longitude (or ~lo+la, ~lat+long, etc.), with the longitude and latitude variables expressed in degrees (e.g., -87.627800 and 41.881998 for one observation of longitude and latitude, respectively). The order in which latitude and longitude are listed does not matter; the function looks only at the first two letters of each variable name to determine which is latitude and which is longitude. Note that the great circle distance is left in miles rather than being standardized. Thus, when distance="Latlong", the window option should be specified or the bandwidth adjusted to account for the scale. The kernel weighting function becomes K(distance/h) under this option.

Since qreglwr estimates weighted quantile regressions of the dependent variable, y, on $x - x_0$, the intercept provides an estimate of y at $x_0$ and $\beta$ provides an estimate of the slope of the quantile line, dy/dx, at $x_0$. quantreg's standard error for the intercept is stored in ytarget.se (target points) and yhat.se (all observations); the standard errors for the slopes are stored in dtarget1.se, dtarget2.se, dhat1.se, and dhat2.se.

When alldata=T, each data point in turn is used as the target point $x_0$. Fixed bandwidths may prove too small if there are regions where x is sparse, so a nearest neighbor approach is generally preferable (e.g., window=.50). Estimation can be very slow when alldata=T. When alldata=F, the locfit package is used to find a good set of target points at which to evaluate the function; see Loader (1999, section 12.2) for a description of the algorithm used to determine the target points. The akima package is then used to interpolate the coefficient estimates and standard errors.

Available kernel weighting functions include the following:

  Kernel         Call abbreviation   Kernel function K(z)
  Rectangular    "rect"              $\frac{1}{2} I(|z| < 1)$
  Triangular     "tria"              $(1 - |z|) I(|z| < 1)$
  Epanechnikov   "epan"              $\frac{3}{4} (1 - z^2) I(|z| < 1)$
  Bi-square      "bisq"              $\frac{15}{16} (1 - z^2)^2 I(|z| < 1)$
  Tri-cube       "tcub"              $\frac{70}{81} (1 - |z|^3)^3 I(|z| < 1)$
  Tri-weight     "trwt"              $\frac{35}{32} (1 - z^2)^3 I(|z| < 1)$
  Gaussian       "gauss"             $(2\pi)^{-1/2} e^{-z^2/2}$
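The distance transformation and kernel weighting described above can be sketched in a few lines of base R. This is illustrative only, not the package's internal code; a Mahalanobis distance, a tri-cube kernel, and a 25% nearest-neighbor window are assumed.

```r
# Sketch of the "Mahal" distance and tri-cube weights at one target point x0
set.seed(1)
x  <- cbind(x1 = rnorm(100), x2 = rnorm(100))   # two explanatory variables
x0 <- colMeans(x)                               # a target point

# Mahalanobis distance from each row of x to x0;
# for "Euclid", use V <- solve(diag(diag(cov(x)))) instead
V    <- solve(cov(x))
diff <- sweep(x, 2, x0)
d    <- sqrt(rowSums((diff %*% V) * diff))

# Nearest-neighbor window: h = distance to the 25th-percentile neighbor
h <- quantile(d, .25)
z <- d / h

# Tri-cube kernel weights K(z); zero for observations outside the window
w <- ifelse(z < 1, (70/81) * (1 - z^3)^3, 0)
```

Observations beyond the window receive zero weight, so each local quantile regression uses only the nearest 25% of the data.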

References

Cleveland, William S. and Susan J. Devlin, "Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting," Journal of the American Statistical Association 83 (1988), 596-610. Koenker, Roger. Quantile Regression. New York: Cambridge University Press, 2005. Chapter 7 and Appendix A.9. Loader, Clive. Local Regression and Likelihood. New York: Springer, 1999.

See Also

lwr

Examples

data(cookdata)
cookdata <- cookdata[cookdata$CHICAGO==1,]
cookdata$obs <- seq_len(nrow(cookdata))
cookdata <- cookdata[cookdata$POPULATION>0,]
par(ask=TRUE)

# lndens = f(dcbd)
fit <- lwr(LNDENS~DCBD,window=.20,data=cookdata)
fit1 <- qreglwr(LNDENS~DCBD,taumat=c(.10,.50,.90),window=.30,kern="rect",data=cookdata)
o <- order(cookdata$DCBD)
ymin = min(fit1$yhat)
ymax = max(fit1$yhat)
plot(cookdata$DCBD[o], fit$yhat[o], type="l", ylim=c(ymin,ymax),
  xlab="Distance to CBD", ylab="Log of Population Density")
lines(cookdata$DCBD[o], fit1$yhat[o,1], col="red", lty="dashed")
lines(cookdata$DCBD[o], fit1$yhat[o,2], col="red")
lines(cookdata$DCBD[o], fit1$yhat[o,3], col="red", lty="dashed")
legend("topright", c("LWR", "tau = .50", "tau = .10, .90"), col=c("black","red","red"),
  lwd=1, lty=c("solid","solid","dashed"))

library(maptools)
library(RColorBrewer)
cmap <- readShapePoly(system.file("maps/CookCensusTracts.shp",
  package="McSpatial"))
cmap <- cmap[cmap$CHICAGO==1,]
# lndens = f(longitude, latitude), weights are function of straight-line distance
fit <- qreglwr(LNDENS~LONGITUDE+LATITUDE,taumat=c(.10,.50,.90),window=.20,data=cookdata)
cmap$lwr10[cookdata$obs] <- fit$yhat[,1]
cmap$lwr50[cookdata$obs] <- fit$yhat[,2]
cmap$lwr90[cookdata$obs] <- fit$yhat[,3]
cmap$lwr1090[cookdata$obs] <- fit$yhat[,3] - fit$yhat[,1]
brks <- seq(min(cmap$lwr10,na.rm=TRUE),max(cmap$lwr10,na.rm=TRUE),length=9)
spplot(cmap,"lwr10",at=brks,col.regions=rev(brewer.pal(9,"RdBu")),
   main="Log Density Estimates, tau = .10")
brks <- seq(min(cmap$lwr50,na.rm=TRUE),max(cmap$lwr50,na.rm=TRUE),length=9)
spplot(cmap,"lwr50",at=brks,col.regions=rev(brewer.pal(9,"RdBu")),
   main="Log Density Estimates, tau = .50")
brks <- seq(min(cmap$lwr90,na.rm=TRUE),max(cmap$lwr90,na.rm=TRUE),length=9)
spplot(cmap,"lwr90",at=brks,col.regions=rev(brewer.pal(9,"RdBu")),
   main="Log Density Estimates, tau = .90")
brks <- seq(min(cmap$lwr1090,na.rm=TRUE),max(cmap$lwr1090,na.rm=TRUE),length=9)
spplot(cmap,"lwr1090",at=brks,col.regions=rev(brewer.pal(9,"RdBu")),
   main="Difference in Log Density, tau = .90 - .10")
