
spNetwork (version 0.2.1)

bw_cvl_calc.mc: Bandwidth selection by Cronie and Van Lieshout's Criterion (multicore version)

Description

Calculate the Cronie and Van Lieshout's Criterion for multiple bandwidths in order to select an appropriate bandwidth with a data-driven approach. A plan from the package future can be used to split the work across several cores. The cells generated according to the argument grid_shape are the units of parallelization, so if only one cell is generated (grid_shape = c(1,1)), the function uses only one core. The progress bar displays the progression over the cells.
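As a rough illustration of the relation between grid_shape and parallelism described above, the sketch below computes how many workers can be busy at the same time when each cell is one unit of work. The helper max_busy_workers is hypothetical and is not part of spNetwork; in practice the number of workers comes from the active future plan.

```r
# Hypothetical helper (not part of spNetwork): each grid cell is one unit of
# work, so no more workers than cells can be busy at the same time.
max_busy_workers <- function(grid_shape, n_workers) {
  n_cells <- prod(grid_shape)  # e.g. c(3, 3) gives 9 cells
  min(n_cells, n_workers)
}

max_busy_workers(c(1, 1), n_workers = 4)  # returns 1: a single cell, a single core
max_busy_workers(c(3, 3), n_workers = 4)  # returns 4: 9 cells shared by 4 workers
```

A finer grid therefore gives the plan more units to distribute, at the cost of more cells to process.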

Usage

bw_cvl_calc.mc(
  bw_range,
  bw_step,
  lines,
  events,
  w,
  kernel_name,
  method,
  diggle_correction = FALSE,
  study_area = NULL,
  max_depth = 15,
  digits = 5,
  tol = 0.1,
  agg = NULL,
  sparse = TRUE,
  grid_shape = c(1, 1),
  sub_sample = 1,
  verbose = TRUE,
  check = TRUE
)

Arguments

bw_range

The range of the bandwidths to consider, given as a numeric vector of two values: c(bandwidth_min, bandwidth_max)

bw_step

The step between two consecutive bandwidths to evaluate, given as a float.

lines

A SpatialLinesDataFrame representing the underlying network. The geometries must be simple lines, without MultiLineString; the function may crash if some geometries are invalid.

events

A SpatialPointsDataFrame representing the events on the network. The points will be snapped to their closest line on the network.

w

A vector representing the weight of each event

kernel_name

The name of the kernel to use. Must be one of triangle, gaussian, tricube, cosine, triweight, quartic, epanechnikov or uniform.

method

The method to use when calculating the NKDE. Must be one of simple, discontinuous or continuous (see the details of nkde for more information).

diggle_correction

A Boolean indicating if the correction factor for edge effects must be used.

study_area

A SpatialPolygonsDataFrame or a SpatialPolygons object representing the limits of the study area.

max_depth

When using the continuous and discontinuous methods, the calculation time and memory use can grow very large if the network has many small edges (areas with many intersections and many events). To avoid this, a maximum depth can be set here. Considering that the kernel is divided at intersections, a value of 10 should yield good estimates in most cases. A larger value can be used without a problem for the discontinuous method. For the continuous method, a larger value will strongly increase the calculation time.

digits

The number of digits to retain from the spatial coordinates. It ensures that the topology is correct when building the network. Default is 5. Too high a precision (a high number of digits) might break some connections.

tol

A float indicating the minimum distance between the events and the lines' extremities when adding the events to the network. Events closer than this distance are added at the extremity of the line.

agg

A double giving the distance within which events are aggregated. If NULL, the events are aggregated only by rounding the coordinates.

sparse

A Boolean indicating if sparse or regular matrices should be used by the Rcpp functions. These matrices store the edge indices between pairs of nodes in the graph. Regular matrices are faster but require more memory, in particular with multiprocessing. Sparse matrices are slightly slower but require much less memory.

grid_shape

A vector of two values indicating how the study area must be split when performing the calculation. Default is c(1,1) (no split). A finer grid could reduce memory usage and increase speed when a large dataset is used. When using multiprocessing, the cells of the grid are dispatched between the workers.

sub_sample

A float between 0 and 1 indicating the proportion of the grid cells (quadrats) to keep in the calculation. For large datasets, evaluating the bandwidths on a subset of the cells can reduce the calculation time.

verbose

A Boolean indicating if the function should print messages about the process.

check

A Boolean indicating if the geometry checks must be run before the operation. This might take some time, but it ensures that the CRS of the provided objects are valid and identical, and that the geometries are valid.
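With bw_range = c(200, 400) and bw_step = 50, as in the example below, the candidate bandwidths can be pictured as a regular sequence. This sketch assumes that both bounds of bw_range are included; that is an assumption about the implementation, not documented behaviour.

```r
# Assumption: the candidate bandwidths form a regular sequence from
# bw_range[1] to bw_range[2], both included, with a step of bw_step.
bw_range <- c(200, 400)
bw_step <- 50
bws <- seq(from = bw_range[1], to = bw_range[2], by = bw_step)
print(bws)
# [1] 200 250 300 350 400
length(bws)  # one criterion value is expected per bandwidth
```

Halving bw_step roughly doubles the number of bandwidths, and so the amount of work to split across the cores.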

Value

A data.frame with two columns: the first for the bandwidths and the second for the corresponding values of the Cronie and Van Lieshout's Criterion.
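Assuming that lower values of the criterion indicate a better bandwidth (the criterion measures a discrepancy to be minimized), the selected bandwidth can be read from the returned data.frame as sketched below. The data.frame here is a toy stand-in with placeholder column names and arbitrary numbers; it is not output from spNetwork.

```r
# Toy stand-in for the returned data.frame: column names are placeholders
# and the values are arbitrary, NOT results from spNetwork.
cv_scores <- data.frame(bw = c(200, 250, 300, 350, 400),
                        crit = c(12.4, 5.1, 0.8, 3.9, 9.7))

# Access the columns by position (1 = bandwidth, 2 = criterion) so the code
# does not depend on the exact column names used by the package.
best_bw <- cv_scores[[1]][which.min(cv_scores[[2]])]
print(best_bw)
# [1] 300
```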

Details

For more details, see help(bw_cvl_calc)
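For orientation only: Cronie and Van Lieshout (2018) select the bandwidth for which the sum of the inverse estimated intensities at the events is closest to the size of the observation window. Written for a network, and assuming that the size of the window is the total length of the lines (help(bw_cvl_calc) gives the exact definition used by the package), the idea reads:

$$
\mathrm{CvL}(h) = \left( \sum_{i=1}^{n} \frac{1}{\hat{\lambda}_h(x_i)} - |L| \right)^2,
\qquad |L| = \sum_{e \in E} \ell(e),
\qquad \hat{h} = \arg\min_{h} \mathrm{CvL}(h)
$$

where $x_1, \dots, x_n$ are the events, $\hat{\lambda}_h(x_i)$ is the NKDE estimated at $x_i$ with bandwidth $h$, $E$ is the set of lines of the network and $\ell(e)$ is the length of line $e$.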

Examples

library(spNetwork)
networkgpkg <- system.file("extdata", "networks.gpkg", package = "spNetwork", mustWork = TRUE)
eventsgpkg <- system.file("extdata", "events.gpkg", package = "spNetwork", mustWork = TRUE)
mtl_network <- rgdal::readOGR(networkgpkg, layer = "mtl_network", verbose = FALSE)
bike_accidents <- rgdal::readOGR(eventsgpkg, layer = "bike_accidents", verbose = FALSE)
## split the work between two background R sessions
future::plan(future::multisession, workers = 2)
cv_scores <- bw_cvl_calc.mc(c(200, 400), 50,
                            mtl_network, bike_accidents,
                            rep(1, nrow(bike_accidents)),
                            "quartic", "discontinuous",
                            diggle_correction = FALSE, study_area = NULL,
                            max_depth = 8,
                            digits = 2, tol = 0.1, agg = 5,
                            sparse = TRUE, grid_shape = c(1, 1),
                            sub_sample = 1, verbose = TRUE, check = TRUE)
## make sure any open connections are closed afterward
if (!inherits(future::plan(), "sequential")) future::plan(future::sequential)
