
Combine geographic areas into primary sampling units to limit travel distances
GeoDistPSU(lat, long, dist.sw, max.dist, Input.ID = NULL)
A list with two components:
A data frame with the same number of rows as the input file. Column names are Input.file.ID and psuID. The psuID column contains the PSU number assigned to each geographic unit in the input file; multiple rows of the input file will typically be assigned to the same PSU.
A data frame with the number of rows equal to the number of PSUs that are created. Column names are Num.SSUs, number of SSUs assigned to each PSU; PSU.Mean.Latitude, mean of the latitudes of the units assigned to a PSU; PSU.Mean.Longitude, mean of the longitudes of the units assigned to a PSU; PSU.Max.Dist, maximum distance among the SSUs in a PSU.
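As a rough illustration of how the count and mean columns of the second data frame relate to the assignments in the first, the following base R sketch builds both by hand. The IDs, coordinates, and PSU assignments are hypothetical values made up for this example, and the code is not taken from the package itself; PSU.Max.Dist is left out because it needs a distance calculation.

```r
# Hypothetical per-unit assignments, mimicking the first component
ids <- data.frame(Input.file.ID = 1:5,
                  psuID = c(1, 1, 1, 2, 2))

# Hypothetical centroids (decimal degrees) for the same five units
lat  <- c(40.71, 40.73, 40.65, 34.05, 34.10)
long <- c(-74.01, -73.99, -73.95, -118.24, -118.30)

# One row per PSU: number of units and mean centroid of those units
info <- data.frame(
  Num.SSUs           = as.vector(table(ids$psuID)),
  PSU.Mean.Latitude  = as.vector(tapply(lat,  ids$psuID, mean)),
  PSU.Mean.Longitude = as.vector(tapply(long, ids$psuID, mean))
)
info
```

Both table() and tapply() order their results by PSU number, so the rows of info line up with PSUs 1 and 2.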
lat: latitude variable in an input file. Must be in decimal format.
long: longitude variable in an input file. Must be in decimal format.
dist.sw: units for distance; either "miles" or "kms" (for kilometers)
max.dist: maximum distance allowed within a PSU between centroids of geographic units
Input.ID: ID field in the input file if present
George Zipf, Richard Valliant
GeoDistPSU combines geographic secondary sampling units (SSUs), like cities or census block groups, into primary sampling units (PSUs) given a maximum distance allowed between the centroids of the SSUs within each grouped PSU. The input file must have one row for each geographic unit. If the input file does not have an ID field, the function will create a sequential ID that is appended to the output. The latitude and longitude input vectors define the centroid of each input SSU. The complete linkage method for clustering is used. GeoDistPSU calls the functions distm and distHaversine from the geosphere package to calculate the distances between centroids.
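The general idea of distance-limited complete linkage clustering can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: haversine_matrix is a hypothetical helper standing in for distm with distHaversine, the Earth radius of 3958.8 miles is an assumed constant, the five centroids are made-up values, and whether GeoDistPSU itself uses hclust and cutree is not stated in this page.

```r
# Hypothetical helper: Haversine great-circle distance matrix in base R.
# Inputs are decimal degrees; the result is in the units of `radius`.
haversine_matrix <- function(lat, long, radius = 3958.8) {  # assumed miles
  phi <- lat * pi / 180
  lam <- long * pi / 180
  dphi <- outer(phi, phi, "-")
  dlam <- outer(lam, lam, "-")
  a <- sin(dphi / 2)^2 + outer(cos(phi), cos(phi)) * sin(dlam / 2)^2
  a[a > 1] <- 1  # guard against rounding slightly above 1
  2 * radius * asin(sqrt(a))
}

# Hypothetical centroids: three near New York, two near Los Angeles
lat  <- c(40.71, 40.73, 40.65, 34.05, 34.10)
long <- c(-74.01, -73.99, -73.95, -118.24, -118.30)

d <- haversine_matrix(lat, long)

# With complete linkage, the merge height is the largest pairwise distance
# in the merged cluster, so cutting the tree at max.dist keeps every
# within-cluster distance at or below max.dist.
max.dist <- 100
hc <- hclust(as.dist(d), method = "complete")
psuID <- cutree(hc, h = max.dist)
psuID
```

Complete linkage suits a travel-distance limit because the cut height bounds the farthest pair of units in each cluster, whereas single linkage would bound only the nearest-neighbor links.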
GeoDistMOS, GeoMinMOS
data(Test_Data_US)
g <- GeoDistPSU(Test_Data_US$lat,
                Test_Data_US$long,
                "miles", 100,
                Input.ID = Test_Data_US$ID)
# Plot GeoDistPSU output
plot(g$PSU.Info$PSU.Mean.Longitude,
     g$PSU.Info$PSU.Mean.Latitude,
     col = 1:nrow(g$PSU.Info),
     pch = 19,
     main = "Plot of PSU Centers",
     xlab = "Longitude",
     ylab = "Latitude")
grid(col = "grey40")
# Plot GeoDistPSU output with map
if (FALSE) {
  # install package sf to run usmap_transform
  library(ggplot2)
  library(sp)
  library(usmap)
  # Transform PSUs into usmap projection
  g.map <- cbind(long = g$PSU.Info$PSU.Mean.Longitude,
                 lat = g$PSU.Info$PSU.Mean.Latitude)
  g.map <- as.data.frame(g.map)
  g.proj <- usmap::usmap_transform(g.map,
                                   input_names = c("long", "lat"),
                                   output_names = c("Long", "Lat"))
  usmap::plot_usmap(color = "gray") +
    geom_point(data = g.proj,
               aes(x = Long,
                   y = Lat))
  # Create histogram of maximum distance
  hist(g$PSU.Info$PSU.Max.Dist,
       main = "Histogram of Maximum Within-PSU Distance",
       xlab = "Distance",
       ylab = "Frequency")
}