clustering_partition: Obtain a partition of the spatial domain using the density-based spatial clustering (DBSC) algorithm described in Santafé et al. (2021)

Description

The function takes an object of class SpatialPolygonsDataFrame or sf and defines a spatial partition using the DBSC algorithm described in Santafé et al. (2021).

Usage

clustering_partition(
  carto,
  ID.area = NULL,
  var = NULL,
  n.cluster = 10,
  min.size = NULL,
  W = NULL,
  l = 1,
  Wk = NULL,
  distance = "euclidean",
  verbose = TRUE
)

Value

An sf object with the original data and a grouping variable named 'ID.group' that identifies the cluster each area is assigned to.

Arguments

carto: object of class SpatialPolygonsDataFrame or sf.
ID.area: character; name of the variable that contains the IDs of spatial areal units.
var: character; name of the variable that contains the data of interest to compute spatial clusters, usually the vector of log-SMR.
n.cluster: numeric; number of cluster centers in the DBSC algorithm. Defaults to 10.
min.size: numeric (default NULL); minimum number of areas in each spatial partition.
W: optional argument with the binary adjacency matrix of the spatial areal units. If NULL (default), this object is computed from the carto argument (two areas are considered as neighbours if they share a common border).
l: numeric value with the neighbourhood order used to assign areas to each cluster. If l=1 (default), only areas that share a common border are considered.
Wk: previously computed binary adjacency matrix of l-order neighbours. If this argument is supplied (default NULL), the parameter l is ignored.
distance: the distance measure to be used (default "euclidean"). See the method argument of the dist function for other options.
verbose: logical value (default TRUE); indicates if the function runs in verbose mode.
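When Wk is not supplied, the matrix of l-order neighbours has to be derived from W. The sketch below shows one way to do this in base R with a hypothetical helper, neighbours_order_l(), which is not part of bigDM. It assumes that "l-order neighbours" means all areas reachable in at most l steps through shared borders and that W is a dense binary matrix; the construction used internally by clustering_partition may differ.

```r
# Hypothetical helper (not part of bigDM). Assumption: "l-order neighbours"
# are all areas reachable from an area in at most l steps over shared borders.
neighbours_order_l <- function(W, l = 1) {
  W <- (as.matrix(W) > 0) * 1   # coerce to a dense binary matrix
  diag(W) <- 0
  reach <- W                    # areas reachable by a walk of exactly 1 step
  Wk <- W                       # accumulated neighbours of order 1..l
  if (l > 1) {
    for (step in 2:l) {
      reach <- ((reach %*% W) > 0) * 1  # walks of exactly `step` steps
      Wk <- ((Wk + reach) > 0) * 1      # add them to the accumulated matrix
    }
  }
  diag(Wk) <- 0                 # an area is not its own neighbour
  Wk
}

# Toy example: five areas in a row, 1-2-3-4-5
W <- matrix(0, nrow = 5, ncol = 5)
for (i in 1:4) W[i, i + 1] <- W[i + 1, i] <- 1
Wk <- neighbours_order_l(W, l = 2)
Wk[1, ]  # 0 1 1 0 0: area 1 now also neighbours area 3
```

Accumulating walks of length 1 to l gives exactly the areas within l steps, because any area at shortest-path distance d <= l is reached by some walk of length d.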

Details

The DBSC algorithm implemented in this function is a new spatial clustering algorithm based on the density clustering algorithm introduced by Rodriguez and Laio (2014) and the subsequent modification presented by Wang and Song (2016). It obtains a single clustering partition of the data by automatically detecting cluster centers and assigning each area to its nearest cluster centroid. The algorithm rests on the assumption that cluster centers are points with a high local density and a relatively large distance to any other point with a higher local density. See Santafé et al. (2021) for more details.
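The two quantities behind this assumption (local density and distance to higher-density points) can be illustrated with a small sketch. The helper density_peaks() below is hypothetical and is not the bigDM implementation: it uses a Gaussian-kernel local density with a cutoff dc chosen by an assumed heuristic (the 2% quantile of pairwise distances), takes as centers the n.cluster points with the largest product rho * delta, and assigns each value to its nearest center in attribute space. Unlike DBSC, it ignores the spatial neighbourhood (W, l, Wk) and the min.size constraint.

```r
# Hypothetical sketch of the Rodriguez and Laio (2014) density-peaks quantities.
# Not the bigDM implementation: no spatial constraint and no min.size handling.
density_peaks <- function(x, n.cluster = 2, distance = "euclidean", dc = NULL) {
  D <- as.matrix(dist(x, method = distance))
  n <- nrow(D)
  # Assumed cutoff heuristic: 2% quantile of the pairwise distances
  if (is.null(dc)) dc <- quantile(D[upper.tri(D)], 0.02, names = FALSE)
  if (dc <= 0) stop("cutoff distance dc must be positive")
  # Local density with a Gaussian kernel; "- 1" removes each point's self-term
  rho <- rowSums(exp(-(D / dc)^2)) - 1
  # delta: distance to the nearest point with a higher local density
  # (for the densest point, its largest distance to any point)
  delta <- numeric(n)
  for (i in seq_len(n)) {
    higher <- which(rho > rho[i])
    delta[i] <- if (length(higher) > 0) min(D[i, higher]) else max(D[i, ])
  }
  # Centers: the n.cluster points with the largest gamma = rho * delta
  centers <- order(rho * delta, decreasing = TRUE)[seq_len(n.cluster)]
  # Assign every point to its nearest center (index into `centers`)
  cluster <- apply(D[, centers, drop = FALSE], 1, which.min)
  list(rho = rho, delta = delta, centers = centers, cluster = cluster)
}

# Toy values (e.g. log-SMRs) forming two well-separated groups
x <- c(0, 0.1, 0.2, 0.15, 5, 5.1, 5.2, 5.05)
res <- density_peaks(x, n.cluster = 2)
res$centers  # the densest point of each group
res$cluster  # points 1-4 and points 5-8 fall in different clusters
```

Points inside a dense group have a nearby higher-density point, so their delta is small; only the densest point of each group combines a high rho with a large delta, which is why the product rho * delta separates centers from the rest.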

References

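Rodriguez, A. and Laio, A. (2014). Clustering by fast search and find of density peaks. Science, 344(6191), 1492-1496.

Santafé, G., Adin, A., Lee, D., and Ugarte, M.D. (2021). Dealing with risk discontinuities to estimate cancer mortality risks when the number of small areas is large. Statistical Methods in Medical Research, 30(1), 6-21.

Wang, G. and Song, Q. (2016). Automatic clustering via outward statistical testing on density metrics. IEEE Transactions on Knowledge and Data Engineering, 28(8), 1971-1985.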

Examples

if (FALSE) {
library(sf)
library(tmap)

## Load the Spain colorectal cancer mortality data ##
data(Carto_SpainMUN)

## Compute the log-SMR (a small constant is added to avoid log(0)) ##
Carto_SpainMUN$logSMR <- log(Carto_SpainMUN$obs/Carto_SpainMUN$exp+0.0001)

## Define a spatial partition using the DBSC algorithm ##
carto.new <- clustering_partition(carto=Carto_SpainMUN, ID.area="ID", var="logSMR",
                                  n.cluster=20, l=2, min.size=100, verbose=TRUE)
table(carto.new$ID.group)  # number of areas in each cluster

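## Dissolve the areas of each group into a single polygon (used to draw the group borders) ##
carto.data <- st_set_geometry(carto.new, NULL)
carto.partition <- aggregate(carto.new[,"geometry"], list(ID.group=carto.data[,"ID.group"]), head)

## Plot of the grouping variable 'ID.group' (tmap >= 3.99, i.e. tmap v4, uses a different syntax) ##
tmap4 <- packageVersion("tmap") >= "3.99"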

if(tmap4){
        tm_shape(carto.new) +
                tm_polygons(fill="ID.group", fill.scale=tm_scale(values="brewer.set3")) +
                tm_shape(carto.partition) +
                tm_borders(col="black", lwd=2) +
                tm_layout(legend.outside=TRUE, legend.frame=FALSE)
}else{
        tm_shape(carto.new) +
                tm_polygons(col="ID.group") +
                tm_shape(carto.partition) +
                tm_borders(col="black", lwd=2) +
                tm_layout(legend.outside=TRUE)
}
}