kdensity: K-density functions for distances between geographic coordinates

Description

Calculates K-density functions for lat-long coordinates. The function calculates the distance, d, between every pair of observations and plots the density, $f(d_0)$, at a set of target distances, $d_0$. The kernel densities are calculated using R's density function.
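The pairwise-distance step can be illustrated with a short base R sketch. The helper name pair_dist, the haversine formula, and the Earth radius values are assumptions made for illustration; the distance formula that kdensity uses internally is not stated here and may differ.

```r
# Hypothetical helper (not part of the package): great-circle distance
# between every unique pair of points, using the haversine formula.
pair_dist <- function(longitude, latitude, kilometer = FALSE) {
  radius <- if (kilometer) 6371.0 else 3958.8  # assumed mean Earth radius
  lon <- longitude * pi / 180                  # degrees to radians
  lat <- latitude * pi / 180
  idx <- utils::combn(length(lon), 2)          # 2 x n(n-1)/2 index pairs
  i <- idx[1, ]
  j <- idx[2, ]
  a <- sin((lat[j] - lat[i]) / 2)^2 +
    cos(lat[i]) * cos(lat[j]) * sin((lon[j] - lon[i]) / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Three made-up coordinates give 3 * 2 / 2 = 3 pairwise distances.
lon <- c(-87.63, -87.90, -88.24)
lat <- c(41.88, 43.04, 40.11)
dvect <- pair_dist(lon, lat, kilometer = TRUE)
```

With n observations the result has n(n-1)/2 elements, which is why sampling a subset of points can save considerable time on large data sets.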

Usage

kdensity(longitude,latitude,kilometer=FALSE,noplot=FALSE,
dmin=0,dmax=0,dlength=512,h=0,kern="gaussian",nsamp=0,
confint=TRUE,pval=.05)

Arguments

longitude

Longitude variable, in degrees.

latitude

Latitude variable, in degrees.

kilometer

If kilometer = TRUE, distances are measured in kilometers rather than miles. Default: kilometer = FALSE.

noplot

If noplot = TRUE, does not show the graph of the K-density function. Default: noplot = FALSE.

dmin

Minimum value for target distances. Default: dmin=0.

dmax

Maximum value for target distances. Default: dmax = max(distance), specified by setting dmax=0.

dlength

Number of target values for density calculations. Default: dlength = 512.

h

Bandwidth. Default: (.9*(quantile(distance,.75)-quantile(distance,.25))/1.34)*(n^(-.20)), where n = 2*length(dvect), specified by setting h=0.

kern

Kernel. Default: "gaussian". Other options from the density function are also available, including "epanechnikov", "rectangular", "triangular", "biweight", and "optcosine". The "cosine" kernel is translated to "optcosine".

nsamp

If nsamp>0, draws a random sample of lat-long pairs for calculations rather than the full data set. Can be much faster for large samples. Default: use full sample.

confint

If TRUE, adds local confidence intervals to the graph. Default: confint=TRUE.

pval

Significance level (p-value) for confidence intervals. Default: pval=.05.
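The default bandwidth given under h can be reproduced by hand. The sketch below uses simulated distances and assumes that "distance" in the formula refers to the vector of pairwise distances (dvect); that reading is an assumption, not something the argument description confirms.

```r
# Simulated pairwise distances stand in for real data (illustrative only).
set.seed(1)
dvect <- runif(200, 0, 50)

# Pseudo-sample size: the reflection method doubles the observations.
n <- 2 * length(dvect)

# Interquartile-range rule from the argument description.
iqr <- quantile(dvect, .75) - quantile(dvect, .25)
h <- unname(.9 * (iqr / 1.34) * n^(-.20))
```

Passing a positive h to kdensity overrides this rule.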

Value

distance: The vector of target distances.
dhat: The vector of densities for the target distances.
dvect: The full vector of distances between observation pairs. Length is n(n-1)/2.
h: The bandwidth.
se: The vector of standard errors.

Details

The kdensity function uses Silverman's (1986) reflection method to impose zero densities at negative distances. This method involves supplementing each distance observation with its negative value to form a pseudo data set with twice the original number of observations. The following commands are the core of the function:

dfit1 <- density(dvect,from=dmin,to=dmax,n=dlength,kernel=kern,bw=h)
dfit2 <- density(-dvect,from=dmin,to=dmax,n=dlength,kernel=kern,bw=h)
distance <- dfit1$x
dhat <- dfit1$y + dfit2$y
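The effect of the reflection can be checked on simulated data. The following sketch runs the same commands on made-up, exponentially distributed distances with an arbitrary bandwidth; the data, the bandwidth, and the trapezoid-rule check are illustrative assumptions rather than part of the package.

```r
# Simulated non-negative "distances" with many values near zero.
set.seed(42)
dvect <- rexp(500, rate = 0.2)

dmin <- 0
dmax <- max(dvect)
dlength <- 512
kern <- "gaussian"
h <- 0.5  # arbitrary bandwidth chosen for the illustration

# Density of the data plus density of the mirrored data, both on [dmin, dmax].
dfit1 <- density(dvect, from = dmin, to = dmax, n = dlength, kernel = kern, bw = h)
dfit2 <- density(-dvect, from = dmin, to = dmax, n = dlength, kernel = kern, bw = h)
distance <- dfit1$x
dhat <- dfit1$y + dfit2$y

# Trapezoid-rule area under the reflected estimate; it should be close to 1
# because the mass that leaked below zero has been folded back.
area <- sum(diff(distance) * (head(dhat, -1) + tail(dhat, -1)) / 2)
```

Without the reflection term, the estimate near d = 0 is biased downward because part of each kernel's mass falls on negative distances; adding dfit2$y restores it.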

Local standard errors are calculated using the following asymptotic formula:

$(nh)^{-.5} (f(x) \int K^2(\psi)d \psi )^{.5} $
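As a rough illustration of this formula, the sketch below evaluates it for a Gaussian kernel, for which $\int K^2(\psi)d\psi = 1/(2\sqrt{\pi})$. The simulated data, the choice n = 2*length(dvect), and the normal-quantile bands are assumptions made for the example; the package's exact implementation may differ.

```r
# Simulated distances and an arbitrary bandwidth (illustrative only).
set.seed(7)
dvect <- rexp(500, rate = 0.2)
h <- 0.5

# Reflected density estimate on [0, max(dvect)].
dfit1 <- density(dvect, from = 0, to = max(dvect), n = 512, bw = h)
dfit2 <- density(-dvect, from = 0, to = max(dvect), n = 512, bw = h)
dhat <- dfit1$y + dfit2$y

# Integral of the squared Gaussian kernel, checked numerically.
k2 <- 1 / (2 * sqrt(pi))
k2_numeric <- integrate(function(u) dnorm(u)^2, -Inf, Inf)$value

# Asymptotic standard error: (nh)^(-.5) * (f(x) * int K^2)^(.5).
n <- 2 * length(dvect)  # assumed: size of the pseudo data set
se <- sqrt(dhat * k2 / (n * h))

# Assumed construction of pointwise bands for pval = .05.
pval <- .05
z <- qnorm(1 - pval / 2)
lower <- dhat - z * se
upper <- dhat + z * se
```

For kernels other than the Gaussian, k2 would need to be replaced by the corresponding integral of the squared kernel.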

References

Duranton, Gilles and Henry G. Overman, "Testing for Localisation using Microgeographic Data", Review of Economic Studies 72 (2005), 1077-1106.

Klier, Thomas and Daniel P. McMillen, "Evolving Agglomeration in the U.S. Auto Industry," Journal of Regional Science 48 (2008), 245-267.

Silverman, B. W., Density Estimation for Statistics and Data Analysis, Chapman and Hall, New York (1986).

Examples

library(McSpatial)
data(matchdata)
lmat <- cbind(matchdata$longitude,matchdata$latitude)
# Smaller sample to reduce computation time for example
set.seed(18493)
obs <- sample(seq(1,nrow(lmat)),400)
lmat <- lmat[obs,]
fit95 <- kdensity(lmat[,1],lmat[,2],noplot=FALSE)