Calculate the Network Kernel Density Estimate based on a network of lines, sampling points, and events
nkde(
lines,
events,
w,
samples,
kernel_name,
bw,
adaptive = FALSE,
trim_bw = NULL,
method,
div = "bw",
diggle_correction = FALSE,
study_area = NULL,
max_depth = 15,
digits = 5,
tol = 0.1,
agg = NULL,
sparse = TRUE,
grid_shape = c(1, 1),
verbose = TRUE,
check = TRUE
)
A SpatialLinesDataFrame representing the network. The geometries must be valid (the function may crash if some geometries are invalid)
A SpatialPointsDataFrame representing the events on the network. The points will be snapped onto the network.
A vector representing the weight of each event
A SpatialPointsDataFrame representing the locations for which the densities will be estimated.
The name of the kernel to use. Must be one of triangle, gaussian, scaled gaussian, tricube, cosine, triweight, quartic, epanechnikov or uniform.
The kernel bandwidth (in meters)
A Boolean, indicating if an adaptive bandwidth must be used
A float, indicating the maximum value for the adaptive bandwidth
The method to use when calculating the NKDE, must be one of simple / discontinuous / continuous (see details for more information)
The divisor to use for the kernel. Must be "n" (the number of events within the radius around each sampling point), "bw" (the bandwidth) or "none" (the simple sum).
A Boolean indicating if the correction factor for edge effect must be used.
A SpatialPolygonsDataFrame or a SpatialPolygons object representing the limits of the study area.
When using the continuous and discontinuous methods, the calculation time and memory use can increase sharply if the network has many small edges (areas with many intersections and many events). To avoid this, a maximum recursion depth can be set here. Considering that the kernel is divided at intersections, a value of 10 should yield good estimates in most cases. A larger value can be used without problem for the discontinuous method; for the continuous method, a larger value will strongly increase the calculation time.
The number of digits to retain in the spatial coordinates. It ensures a sound topology when building the network. Default is 5
When adding the events and the sampling points to the network, the minimum distance between these points and the lines' extremities. When points are closer, they are added at the extremity of the lines.
A double giving the distance within which events are aggregated. If NULL, the events are only aggregated by rounding the coordinates.
A Boolean indicating whether sparse or regular matrices should be used by the Rcpp functions. Regular matrices are faster but require more memory and can lead to errors, in particular with multiprocessing. Sparse matrices are slower but require much less memory.
A vector of two values indicating how the study area must be split when performing the calculation (see details). Default is c(1,1)
A Boolean indicating if the function should print messages about the process.
A Boolean indicating if the geometry checks must be run before calculating the densities
A vector of values: the density estimates at the sampling points
**The three NKDE methods** Estimating the density of a point process is commonly done with an ordinary two-dimensional kernel density function. However, there are numerous cases in which the events do not occur in a two-dimensional space but on a network (car crashes, outdoor crimes, leaks in pipelines, etc.). New methods have been developed to adapt the methodology to networks; three of them are available in this package.
method="simple"This first method was presented by Xie et al. (2008) and proposes an intuitive solution. The distances between events and sampling points are replaced by network distances, and the formula of the kernel is adapted to calculate the density over a linear unit instead of an areal unit.
method="discontinuous"The previous method has been criticized by Okabe et al (2008), arguing that the estimator proposed is biased, leading to an overestimation of density in events hot-spots. More specifically, the simple method does not conserve mass and the induced kernel is not a probability density along the network. They thus proposed a discontinuous version of the kernel function on network, which equally "divides" the mass density of an event at intersections
method="continuous"If the discontinuous method is unbiased, it leads to a discontinuous kernel function which is a bit counter-intuitive. Okabe et al (2008) proposed another version of the kernel, that divide the mass of the density at intersection but adjusts the density before the intersection to make the function continuous.
All three methods are available because, even though the simple method is less precise statistically speaking, it might be more intuitive. From a purely geographical point of view, it can be seen as a sort of distance decay function, as used in Geographically Weighted Regression.
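As an illustrative sketch (not taken from the package documentation), the three estimators can be compared by simply varying the method argument. This assumes the mtl_network, bike_accidents and samples objects prepared as in the example at the bottom of this page.
# hypothetical sketch: compare the three NKDE methods on the same data,
# assuming mtl_network, bike_accidents and samples exist (see example below)
methods <- c("simple", "discontinuous", "continuous")
results <- lapply(methods, function(m){
  nkde(mtl_network,
       events = bike_accidents,
       w = rep(1, nrow(bike_accidents)),
       samples = samples,
       kernel_name = "quartic",
       bw = 300, div = "bw",
       method = m, digits = 1, tol = 1, agg = 15,
       grid_shape = c(1, 1), verbose = FALSE)
})
names(results) <- methods
# each element is a vector of densities at the sampling points
sapply(results, summary)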
**adaptive bandwidth** It is possible to use an adaptive bandwidth instead of a fixed one. Adaptive bandwidths are calculated using Abramson's smoothing regimen. To do so, an original fixed bandwidth must be specified (bw parameter); it is used to estimate prior densities at the event locations. These densities are then used to calculate the local bandwidths. The maximum size of the local bandwidths can be limited with the trim_bw parameter. For more details, see the vignettes.
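A minimal sketch of an adaptive-bandwidth call, under the same assumptions as above: bw serves as the pilot bandwidth and trim_bw (an arbitrary value chosen here for illustration) caps the local bandwidths.
# hypothetical sketch: adaptive bandwidths derived from a 300 m pilot
# bandwidth and capped at 600 m (arbitrary illustrative value)
densities_adaptive <- nkde(mtl_network,
                           events = bike_accidents,
                           w = rep(1, nrow(bike_accidents)),
                           samples = samples,
                           kernel_name = "quartic",
                           bw = 300, adaptive = TRUE, trim_bw = 600,
                           div = "bw", method = "discontinuous",
                           digits = 1, tol = 1, agg = 15,
                           grid_shape = c(1, 1), verbose = FALSE)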
**Optimization parameters** The grid_shape parameter allows the NKDE calculation to be split according to a grid dividing the study area. This might be necessary for big datasets in order to reduce memory use. If grid_shape is c(1,1), a full network is built for the whole area. If grid_shape is c(2,2), the area is split into 4 rectangles. For each rectangle, the sampling points falling inside it are used, along with the events and lines within a radius of one bandwidth. The results are combined at the end and reordered to match the original order of the samples.
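For instance, the call from the example below could be split over a 2 x 2 grid for a larger dataset (again a sketch, assuming the same objects):
# hypothetical sketch: split the study area into 4 rectangles to limit
# memory use; the returned densities keep the original order of the samples
densities_split <- nkde(mtl_network,
                        events = bike_accidents,
                        w = rep(1, nrow(bike_accidents)),
                        samples = samples,
                        kernel_name = "quartic",
                        bw = 300, div = "bw",
                        method = "discontinuous",
                        digits = 1, tol = 1, agg = 15,
                        grid_shape = c(2, 2), verbose = FALSE)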
The geographical coordinates of the start and end of the lines are used to build the network. To avoid issues with floating-point precision, the coordinates are truncated according to the digits parameter. A minimal loss of precision is expected, but it allows a fast construction of the network.
To calculate the distances on the network, all the events are added as vertices. To reduce the size of the network, it is possible to reduce the number of vertices by adding the events at the extremity of the lines if they are close to them. This is controlled by the parameter tol.
In the same way, it is possible to limit the number of vertices by aggregating events that are close to each other. In that case, the weights of the aggregated events are summed. Given an aggregation distance, a buffer is drawn around the first event, and every other event falling in that buffer is aggregated with it, forming a new event whose coordinates are the mean of the original events' coordinates. This procedure is repeated until no more events can be aggregated. The aggregation distance is set with the agg parameter.
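The aggregation rule can be illustrated with a small toy calculation (purely illustrative, not spNetwork code): with agg = 15, events within 15 metres of the first event are merged into a single event located at their mean coordinates, with their weights summed.
# toy illustration of the aggregation rule described above
xy <- matrix(c(100.2, 200.1,   # first event
               104.8, 198.7,   # within 15 m of the first event
               101.5, 203.3),  # within 15 m of the first event
             ncol = 2, byrow = TRUE)
w  <- c(1, 1, 2)
new_event_xy <- colMeans(xy)  # coordinates of the aggregated event
new_event_w  <- sum(w)        # summed weight of the aggregated event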
When using the continuous or discontinuous kernel, the density is reduced at each intersection crossed. In the discontinuous case, after 5 intersections with four directions each, the density value has been divided by 243, leading to very small values. In the same situation with the continuous NKDE, the density value is divided by approximately 7.6. The max_depth parameter allows the user to control the maximum depth of these two NKDEs. The default value is 15, but a value of 10 would yield very close estimates. A lower value can greatly reduce calculation time when the bandwidth is large.
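The figure quoted above can be checked with a quick calculation: at a four-way intersection the discontinuous kernel splits the density among the three outgoing edges, so after five such intersections it has been divided by 3^5.
# quick check of the division factor quoted above for the discontinuous NKDE
3^5  # 243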
When using the continuous or discontinuous kernel, the connections between graph nodes are stored in a matrix. This matrix is typically sparse, so a sparse matrix object is used by default to limit memory use. If the network is small (typically when the grid used to split the data has small rectangles), a classical matrix can be used instead. This significantly increases speed but can lead to memory issues.
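The memory trade-off can be illustrated outside of spNetwork with the Matrix package (an illustrative sketch; spNetwork handles this internally through the sparse argument).
# illustration: a mostly empty 2000 x 2000 adjacency matrix stored as a
# regular matrix of doubles vs as a sparse Matrix object
library(Matrix)
n <- 2000
dense_adj <- matrix(0, n, n)
dense_adj[cbind(1:n, c(2:n, 1))] <- 1   # a simple ring graph
sparse_adj <- Matrix(dense_adj, sparse = TRUE)
object.size(dense_adj)   # about 32 MB
object.size(sparse_adj)  # a few tens of kilobytes
The example below, taken from the package, puts the whole workflow together.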
# NOT RUN {
networkgpkg <- system.file("extdata", "networks.gpkg", package = "spNetwork", mustWork = TRUE)
eventsgpkg <- system.file("extdata", "events.gpkg", package = "spNetwork", mustWork = TRUE)
# read the road network and the bike accident events shipped with the package
mtl_network <- rgdal::readOGR(networkgpkg, layer = "mtl_network", verbose = FALSE)
bike_accidents <- rgdal::readOGR(eventsgpkg, layer = "bike_accidents", verbose = FALSE)
# cut the lines into lixels of 200 m (minimum length 50 m) and use their
# centres as sampling points
lixels <- lixelize_lines(mtl_network, 200, mindist = 50)
samples <- lines_center(lixels)
# estimate the densities at the sampling points with a discontinuous NKDE
densities <- nkde(mtl_network,
                  events = bike_accidents,
                  w = rep(1, nrow(bike_accidents)),
                  samples = samples,
                  kernel_name = "quartic",
                  bw = 300, div = "bw",
                  adaptive = FALSE,
                  method = "discontinuous", digits = 1, tol = 1,
                  agg = 15,
                  grid_shape = c(1, 1),
                  verbose = FALSE)
# }