smooth.map: Smooth out aggregated data

Description

Increases the resolution of data aggregated over map regions, by either smoothing or interpolation. Also fills in missing values.

Usage

smooth.map(m, z, res = 50, span = 1/10, averages = FALSE, type = c("smooth",
"interp"), merge = FALSE)

Arguments

m

a map object

z

a named vector

res

a vector of length two, specifying the resolution of the sampling grid in each dimension. If a single number is given, it is taken as the vertical resolution, and twice that number is taken as the horizontal resolution.

span

kernel parameter (larger = smoother). span = Inf is a special case which invokes the cubic spline kernel. span is automatically scaled by the map size, and is independent of res.

averages

If TRUE, the values in z are interpreted as averages over the regions. Otherwise they are interpreted as totals.

type

Either "smooth" or "interp"; see Details.

merge

If TRUE, a region named in z includes all matching regions in the map (according to match.map). If FALSE, a region named in z is assumed

Value

A data frame with columns x, y, and z giving the smoothed value z for locations (x, y). Currently the (x, y) values form a grid, but this is not guaranteed in the future.

Details

For type = "smooth", the region totals are first converted into point measurements on the sampling grid, by dividing the total for a region among all sample points inside it. Then it is a regular kernel smoothing problem. Note that the region totals are not preserved. The prediction $z_o$ for location $x_o$ (a vector) is the average of z for nearby sample points:

$$z_o = \frac{\sum_x k(x, x_o) z(x)}{\sum_x k(x, x_o)}$$

$$k(x, x_o) = \exp(-\lambda ||x - x_o||^2)$$

$\lambda$ is determined from span. Note that $x_o$ is over the same sampling grid as $x$, but $z_o$ is not necessarily the same as $z(x_o)$.

For type = "interp", the region totals are preserved by the higher-resolution function. The function is assumed to come from a Gaussian process with kernel $k$. The measurement z[r] is assumed to be the sum of the function over the discrete sample points inside region r. This leads to a simple formula for the covariance matrix of z and the cross-covariance between $z_o$ and z. The prediction is the cross-covariance times the inverse covariance times z. Unlike Tobler's method, the predictions are not constrained to lie within the original data range, so there tend to be "ringing" effects.
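The smoothing steps described above can be illustrated on a toy one-dimensional grid. This is only a sketch of the formulas, not the code used by smooth.map: the grid x, the region labels, the totals, and the hand-picked value of lambda are all hypothetical (smooth.map derives $\lambda$ from span).

```r
# Toy 1-D illustration of type = "smooth"; not the maps implementation.
x      <- 1:6                                   # sample grid locations
region <- c("a", "a", "a", "b", "b", "b")       # region containing each point
totals <- c(a = 6, b = 12)                      # region totals (the vector z)

# Step 1: divide each region total among the sample points inside the region.
n_inside <- ave(x, region, FUN = length)        # number of points per region
z_pt     <- unname(totals[region]) / n_inside   # point measurements z(x)

# Step 2: Gaussian kernel k(x, x_o) = exp(-lambda * ||x - x_o||^2).
lambda <- 0.5                                   # assumed value, set by hand
K <- exp(-lambda * outer(x, x, "-")^2)

# Step 3: kernel-weighted average of the point values around each location.
z_o <- as.numeric(K %*% z_pt) / rowSums(K)

# Region totals are generally not preserved by the smoother.
tapply(z_o, region, sum)
```

In this toy example the smoothed values near the boundary between the two regions are pulled toward each other, which is why the per-region sums of z_o differ from the original totals.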

See the references for more details.
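The type = "interp" prediction described under Details (cross-covariance times inverse covariance times z) can likewise be sketched on a toy one-dimensional grid. Again, this is a hedged illustration rather than the code used by smooth.map: the grid, regions, totals, and lambda are hypothetical, and a zero-mean Gaussian process is assumed.

```r
# Toy 1-D illustration of type = "interp"; not the maps implementation.
x      <- 1:6                                   # sample grid locations
region <- c("a", "a", "a", "b", "b", "b")       # region containing each point
totals <- c(a = 6, b = 12)                      # region totals (the vector z)

lambda <- 0.5                                   # assumed value, set by hand
K <- exp(-lambda * outer(x, x, "-")^2)          # kernel matrix on the grid

# A[r, i] = 1 if grid point i lies in region r, so that z = A %*% f.
A <- t(sapply(names(totals), function(r) as.numeric(region == r)))

cov_z <- A %*% K %*% t(A)                       # covariance matrix of z
cross <- K %*% t(A)                             # cross-covariance of f(x_o), z

# Prediction: cross-covariance times inverse covariance times z.
f_hat <- as.numeric(cross %*% solve(cov_z, totals))

# The region sums of the prediction reproduce the original totals.
A %*% f_hat
```

Because the prediction is a linear solve rather than a weighted average, individual values of f_hat are not bounded by the input data, which is the source of the "ringing" mentioned above.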

References

W. F. Eddy and A. Mockus. An example of the estimation and display of a smoothly varying function of time and space - the incidence of mumps disease. Journal of the American Society for Information Science, 45(9):686-693, 1994. http://www.research.avayalabs.com/user/audris/papers/jasis.pdf

W. R. Tobler. Smooth pycnophylactic interpolation for geographical regions. Journal of the American Statistical Association, 74:519-530, 1979.

Examples

# compare to the example for match.map
library(maps)
# the state data sets live in package "datasets" in current versions of R
data(state, package = "datasets")
data(votes.repub)
z = votes.repub[, "1900"]
m = map("state", fill = TRUE, plot = FALSE)
# use a small span to fill in, but not smooth, the data
# increase the resolution to get better results
fit = smooth.map(m, z, span = 1/100, merge = TRUE, averages = TRUE)
mat = tapply(fit$z, fit[1:2], mean)
gray.colors <- function(n) gray(rev(0:(n - 1))/n)
par(bg = "blue")
filled.contour(mat, color.palette = gray.colors, nlev = 32, asp = 1)
# another way to visualize:
image(mat, col = gray.colors(100))

# for a higher degree of smoothing:
# fit = smooth.map(m, z, merge = TRUE, averages = TRUE)
# interpolation, state averages are preserved:
# fit = smooth.map(m, z, merge = TRUE, averages = TRUE, type = "interp")