flexurba (version 0.2.3)

DoU_classify_grid: Create the DEGURBA grid cell classification

Description

The function reconstructs the grid cell classification of the Degree of Urbanisation. The arguments of the function allow the standard specifications of the Degree of Urbanisation to be adapted in order to construct an alternative version (see section "Custom specifications" below).

For more information about the Degree of Urbanisation methodology, see the methodological manual, GHSL Data Package 2022 and GHSL Data Package 2023.

Usage

DoU_classify_grid(
  data,
  level1 = TRUE,
  parameters = NULL,
  values = NULL,
  regions = FALSE,
  filename = NULL
)

Value

SpatRaster with the grid cell classification

Arguments

data

path to the directory with the data, or named list with the data as returned by function DoU_preprocess_grid()

level1

logical. Whether to classify the grid according to first hierarchical level (TRUE) or the second hierarchical level (FALSE). For more details, see section "Classification rules" below.

parameters

named list with the parameters to adapt the standard specifications in the Degree of Urbanisation classification. For more details, see section "Custom specifications" below.

values

vector with the values assigned to the different classes in the resulting classification:

  • If level1=TRUE: the vector should contain the values for (1) urban centres, (2) urban clusters, (3) rural grid cells and (4) water cells.

  • If level1=FALSE: the vector should contain the values for (1) urban centres, (2) dense urban clusters, (3) semi-dense urban clusters, (4) suburban or peri-urban cells, (5) rural clusters, (6) low density rural cells, (7) very low density rural cells and (8) water cells.

regions

logical. Whether to execute the classification separately in the pre-defined regions, which is more memory-efficient for global data. Note that this still requires a large amount of memory. For more details, see section "Regions" below.

filename

character. Output filename (with extension .tif). The grid classification together with a metadata file (in JSON format) will be saved if filename is not NULL.
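As an illustration of the values and filename arguments, the sketch below relabels the level 1 classes and writes the result to a temporary file. The class codes and the output path are arbitrary choices made for this example, not package defaults.

```r
library(flexurba)

# example data shipped with the package (also used in the Examples section)
data_belgium <- DoU_load_grid_data_belgium()

# hypothetical output path in the temporary directory of the session
out_file <- file.path(tempdir(), "classification_belgium.tif")

# illustrative class codes, in the documented order for level1 = TRUE:
# urban centres, urban clusters, rural grid cells, water cells
classification <- DoU_classify_grid(
  data = data_belgium,
  values = c(30, 20, 10, 0),
  filename = out_file
)
```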

Classification rules

The Degree of Urbanisation consists of two hierarchical levels. In level 1, the cells of a 1 km² grid are classified into urban centres, urban clusters and rural cells (and water cells). In level 2, urban clusters are further divided into dense urban clusters, semi-dense urban clusters and suburban or peri-urban cells. Rural cells are further divided into rural clusters, low density rural cells and very low density rural cells.

The detailed classification rules are as follows:

LEVEL 1:

  • Urban centres are identified as clusters of continuous grid cells (based on rook contiguity) with a minimum density of 1500 inhabitants per km² (or with a minimum built-up area; see section "Built-up area criterium" below), and a minimum total population of 50 000 inhabitants. Gaps smaller than 15 km² in the urban centres are filled and edges are smoothed by a 3x3-majority rule (see section "Edge smoothing" below).

  • Urban clusters are identified as clusters of continuous grid cells (based on queen contiguity) with a minimum density of 300 inhabitants per km², and a minimum total population of 5000 inhabitants.

  • Water cells contain no built-up area, no population, and less than 50% permanent land. All other cells not belonging to an urban centre or urban cluster are considered rural cells.
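To make the contiguity and size rules concrete, here is a minimal base R sketch on a made-up 4 x 4 population grid. It is not the package's implementation: label_clusters() and the toy values are hypothetical, and permanent land, gap filling and edge smoothing are ignored.

```r
# Toy 1 km² population grid (inhabitants per cell); values are made up.
pop <- matrix(
  c(2000, 1800,    0,    0,
       0,    0, 1600,    0,
       0,    0,    0,    0,
       0,    0,    0, 1700),
  nrow = 4, byrow = TRUE
)

# Hypothetical helper: label clusters of cells with a density of at least
# `threshold`, using rook (4) or queen (8) contiguity and a simple flood fill.
label_clusters <- function(x, threshold, directions = 4) {
  offsets <- rbind(c(-1, 0), c(1, 0), c(0, -1), c(0, 1))
  if (directions == 8) {
    offsets <- rbind(offsets, c(-1, -1), c(-1, 1), c(1, -1), c(1, 1))
  }
  labels <- matrix(0L, nrow(x), ncol(x))
  current <- 0L
  for (start in which(x >= threshold)) {
    if (labels[start] != 0L) next
    current <- current + 1L
    labels[start] <- current
    queue <- start
    while (length(queue) > 0) {
      cell <- queue[1]
      queue <- queue[-1]
      # convert the column-major linear index to row and column
      row_i <- (cell - 1) %% nrow(x) + 1
      col_i <- (cell - 1) %/% nrow(x) + 1
      for (k in seq_len(nrow(offsets))) {
        rr <- row_i + offsets[k, 1]
        cc <- col_i + offsets[k, 2]
        if (rr < 1 || rr > nrow(x) || cc < 1 || cc > ncol(x)) next
        if (x[rr, cc] >= threshold && labels[rr, cc] == 0L) {
          labels[rr, cc] <- current
          queue <- c(queue, (cc - 1) * nrow(x) + rr)
        }
      }
    }
  }
  labels
}

# density threshold of the urban cluster rule (300 inhabitants per km²)
labels_rook  <- label_clusters(pop, threshold = 300, directions = 4)
labels_queen <- label_clusters(pop, threshold = 300, directions = 8)

# total population per cluster; the urban cluster rule keeps totals >= 5000
pop_rook  <- tapply(pop[labels_rook > 0], labels_rook[labels_rook > 0], sum)
pop_queen <- tapply(pop[labels_queen > 0], labels_queen[labels_queen > 0], sum)
```

With queen contiguity the diagonal neighbour joins the first cluster, whose total of 5400 inhabitants passes the 5000 threshold. With rook contiguity the same cells form three smaller clusters and none of them qualifies.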

LEVEL 2:

  • Urban centres are identified as clusters of continuous grid cells (based on rook contiguity) with a minimum density of 1500 inhabitants per km² (or with a minimum built-up area; see section "Built-up area criterium" below), and a minimum total population of 50 000 inhabitants. Gaps smaller than 15 km² in the urban centres are filled and edges are smoothed by a 3x3-majority rule (see section "Edge smoothing" below).

  • Dense urban clusters are identified as clusters of continuous grid cells (based on rook contiguity) with a minimum density of 1500 inhabitants per km² (or with a minimum built-up area; see section "Built-up area criterium" below), and a minimum total population of 5000 inhabitants.

  • Semi-dense urban clusters are identified as clusters of continuous grid cells (based on rook contiguity) with a minimum density of 900 inhabitants per km², and a minimum total population of 2500 inhabitants, that are not within 2 km of urban centres and dense urban clusters. Clusters that are within 2 km are classified as suburban or peri-urban cells.

  • Rural clusters are clusters of continuous grid cells (based on queen contiguity) with a minimum density of 300 inhabitants per km², and a minimum total population of 500 inhabitants.

  • Low density rural cells are remaining cells with a population density of at least 50 inhabitants per km².

  • Water cells contain no built-up area, no population, and less than 50% permanent land. All cells not belonging to another class are considered very low density rural cells.
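A level 2 classification is requested by setting level1 = FALSE. The short sketch below assumes that the Belgian example data shipped with the package can be used for level 2 as well as for level 1.

```r
library(flexurba)

# example data shipped with the package
data_belgium <- DoU_load_grid_data_belgium()

# second hierarchical level, with the standard specifications
classification_level2 <- DoU_classify_grid(
  data = data_belgium,
  level1 = FALSE
)
```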

For more information about the Degree of Urbanisation methodology, see the methodological manual, GHSL Data Package 2022 and GHSL Data Package 2023.

Custom specifications

The function allows the standard specifications of the Degree of Urbanisation to be changed in order to construct an alternative version of the grid classification. Custom specifications can be passed as a named list via the argument parameters. The supported parameters with their default values are returned by the function DoU_get_grid_parameters() and are as follows:

LEVEL 1

  • UC_density_threshold numeric (default: 1500).

    Minimum population density per permanent land of a cell required to belong to an urban centre

  • UC_size_threshold numeric (default: 50000).

    Minimum total population size required for an urban centre

  • UC_contiguity_rule integer (default: 4).

    Which cells are considered adjacent in urban centres: 4 for rooks case (horizontal and vertical neighbours) or 8 for queens case (horizontal, vertical and diagonal neighbours)

  • UC_built_criterium logical (default: TRUE).

    Whether to use the additional built-up area criterium (see section "Built-up area criterium" below). If TRUE, not only cells that meet the population density requirement will be considered when delineating urban centres, but also cells with a built-up area per permanent land above the UC_built_threshold

  • UC_built_threshold numeric or character (default: 0.2).

    Additional built-up area threshold. Can be a value between 0 and 1, representing the minimum built-up area per permanent land, or "optimal" (see section "Built-up area criterium" below). Ignored when UC_built_criterium is FALSE.

  • built_optimal_data character / list (default: NULL).

    Path to the directory with the data, or named list with the data as returned by function DoU_preprocess_grid() used to determine the optimal built threshold (see section "Built-up area criterium" below). Ignored when UC_built_criterium is FALSE or when UC_built_threshold is not "optimal".

  • UC_smooth_pop logical (default: FALSE).

    Whether to smooth the population grid before delineating urban centres. If TRUE, the population grid will be smoothed with a moving average of window size UC_smooth_pop_window.

  • UC_smooth_pop_window integer (default: 5).

    Size of the moving window used to smooth the population grid before delineating urban centres. Ignored when UC_smooth_pop is FALSE.

  • UC_gap_fill logical (default: TRUE).

    Whether to perform gap filling. If TRUE, gaps in urban centres smaller than UC_max_gap are filled.

  • UC_max_gap integer (default: 15).

    Gaps with an area smaller than this threshold in urban centres will be filled (unit is km²). Ignored when UC_gap_fill is FALSE.

  • UC_smooth_edge logical (default: TRUE).

    Whether to perform edge smoothing. If TRUE, edges of urban centres are smoothed with the function UC_smooth_edge_fun.

  • UC_smooth_edge_fun character / function (default: "majority_rule_R2023A").

    Function used to smooth the edges of urban centres. Ignored when UC_smooth_edge is FALSE. Possible values are:

    • "majority_rule_R2022A" to use the edge smoothing algorithm in GHSL Data Package 2022 (see section "Edge smoothing" below)

    • "majority_rule_R2023A" to use the edge smoothing algorithm in GHSL Data Package 2023 (see section "Edge smoothing" below)

    • a custom function with a signature similar to that of apply_majority_rule().

  • UCL_density_threshold numeric (default: 300).

    Minimum population density per permanent land of a cell required to belong to an urban cluster

  • UCL_size_threshold numeric (default: 5000).

    Minimum total population size required for an urban cluster

  • UCL_contiguity_rule integer (default: 8).

    Which cells are considered adjacent in urban clusters: 4 for rooks case (horizontal and vertical neighbours) or 8 for queens case (horizontal, vertical and diagonal neighbours)

  • UCL_smooth_pop logical (default: FALSE).

    Whether to smooth the population grid before delineating urban clusters. If TRUE, the population grid will be smoothed with a moving average of window size UCL_smooth_pop_window.

  • UCL_smooth_pop_window integer (default: 5).

    Size of the moving window used to smooth the population grid before delineating urban clusters. Ignored when UCL_smooth_pop is FALSE.

  • water_land_threshold numeric (default: 0.5).

    Maximum proportion of permanent land allowed in a water cell

  • water_pop_threshold numeric (default: 0).

    Maximum population size allowed in a water cell

  • water_built_threshold numeric (default: 0).

    Maximum built-up area allowed in a water cell

LEVEL 2

  • UC_density_threshold numeric (default: 1500).

    Minimum population density per permanent land of a cell required to belong to an urban centre

  • UC_size_threshold numeric (default: 50000).

    Minimum total population size required for an urban centre

  • UC_contiguity_rule integer (default: 4).

    Which cells are considered adjacent in urban centres: 4 for rooks case (horizontal and vertical neighbours) or 8 for queens case (horizontal, vertical and diagonal neighbours)

  • UC_built_criterium logical (default: TRUE).

    Whether to use the additional built-up area criterium (see section "Built-up area criterium" below). If TRUE, not only cells that meet the population density requirement will be considered when delineating urban centres, but also cells with a built-up area per permanent land above the UC_built_threshold

  • UC_built_threshold numeric or character (default: 0.2).

    Additional built-up area threshold. Can be a value between 0 and 1, representing the minimum built-up area per permanent land, or "optimal" (see section "Built-up area criterium" below). Ignored when UC_built_criterium is FALSE.

  • built_optimal_data character / list (default: NULL).

    Path to the directory with the data, or named list with the data as returned by function DoU_preprocess_grid() used to determine the optimal built threshold (see section "Built-up area criterium" below). Ignored when UC_built_criterium is FALSE or when UC_built_threshold is not "optimal".

  • UC_smooth_pop logical (default: FALSE).

    Whether to smooth the population grid before delineating urban centres. If TRUE, the population grid will be smoothed with a moving average of window size UC_smooth_pop_window.

  • UC_smooth_pop_window integer (default: 5).

    Size of the moving window used to smooth the population grid before delineating urban centres. Ignored when UC_smooth_pop is FALSE.

  • UC_gap_fill logical (default: TRUE).

    Whether to perform gap filling. If TRUE, gaps in urban centres smaller than UC_max_gap are filled.

  • UC_max_gap integer (default: 15).

    Gaps with an area smaller than this threshold in urban centres will be filled (unit is km²). Ignored when UC_gap_fill is FALSE.

  • UC_smooth_edge logical (default: TRUE).

    Whether to perform edge smoothing. If TRUE, edges of urban centres are smoothed with the function UC_smooth_edge_fun.

  • UC_smooth_edge_fun character / function (default: "majority_rule_R2023A").

    Function used to smooth the edges of urban centres. Ignored when UC_smooth_edge is FALSE. Possible values are:

    • "majority_rule_R2022A" to use the edge smoothing algorithm in GHSL Data Package 2022 (see section "Edge smoothing" below)

    • "majority_rule_R2023A" to use the edge smoothing algorithm in GHSL Data Package 2023 (see section "Edge smoothing" below)

    • a custom function with a signature similar to that of apply_majority_rule().

  • DUC_density_threshold numeric (default: 1500).

    Minimum population density required for a dense urban cluster

  • DUC_size_threshold numeric (default: 5000).

    Minimum total population size required for a dense urban cluster

  • DUC_built_criterium logical (default: TRUE).

    Whether to use the additional built-up area criterium (see section "Built-up area criterium" below). If TRUE, not only cells that meet the population density requirement will be considered when delineating dense urban clusters, but also cells with a built-up area per permanent land above the DUC_built_threshold

  • DUC_built_threshold numeric or character (default: 0.2).

    Additional built-up area threshold. Can be a value between 0 and 1, representing the minimum built-up area per permanent land, or "optimal" (see section "Built-up area criterium" below). Ignored when DUC_built_criterium is FALSE.

  • DUC_contiguity_rule integer (default: 4).

    Which cells are considered adjacent in dense urban clusters: 4 for rooks case (horizontal and vertical neighbours) or 8 for queens case (horizontal, vertical and diagonal neighbours)

  • SDUC_density_threshold numeric (default: 900).

    Minimum population density per permanent land of a cell required to belong to a semi-dense urban cluster

  • SDUC_size_threshold numeric (default: 2500).

    Minimum total population size required for a semi-dense urban cluster

  • SDUC_contiguity_rule integer (default: 4).

    Which cells are considered adjacent in semi-dense urban clusters: 4 for rooks case (horizontal and vertical neighbours) or 8 for queens case (horizontal, vertical and diagonal neighbours)

  • SDUC_buffer_size integer (default: 2).

    Minimum distance to urban centres and dense urban clusters required for a semi-dense urban cluster (2 km in the standard classification rules)

  • SUrb_density_threshold numeric (default: 300).

    Minimum population density per permanent land of a cell required to belong to a suburban or peri-urban area

  • SUrb_size_threshold numeric (default: 5000).

    Minimum total population size required for a suburban or peri-urban area

  • SUrb_contiguity_rule integer (default: 8).

    Which cells are considered adjacent in suburban or peri-urban areas: 4 for rooks case (horizontal and vertical neighbours) or 8 for queens case (horizontal, vertical and diagonal neighbours)

  • RC_density_threshold numeric (default: 300).

    Minimum population density per permanent land of a cell required to belong to a rural cluster

  • RC_size_threshold numeric (default: 500).

    Minimum total population size required for a rural cluster

  • RC_contiguity_rule integer (default: 8).

    Which cells are considered adjacent in rural clusters: 4 for rooks case (horizontal and vertical neighbours) or 8 for queens case (horizontal, vertical and diagonal neighbours)

  • LDR_density_threshold numeric (default: 50).

    Minimum population density per permanent land of a low density rural grid cell

  • water_land_threshold numeric (default: 0.5).

    Maximum proportion of permanent land allowed in a water cell

  • water_pop_threshold numeric (default: 0).

    Maximum population size allowed in a water cell

  • water_built_threshold numeric (default: 0).

    Maximum built-up area allowed in a water cell
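One possible workflow is to start from the default list and change a single specification. The sketch below assumes that DoU_get_grid_parameters() can be called without arguments, as written above, and that the complete list it returns can be passed back through the parameters argument. The value 1250 is an arbitrary illustration.

```r
library(flexurba)

# retrieve the supported parameters with their default values
parameters <- DoU_get_grid_parameters()
str(parameters)

# change one specification and keep all other defaults
parameters$UC_density_threshold <- 1250

data_belgium <- DoU_load_grid_data_belgium()
classification_custom <- DoU_classify_grid(
  data = data_belgium,
  parameters = parameters
)
```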

Built-up area criterium

In Data Package 2022, the Degree of Urbanisation includes an optional built-up area criterium to account for the presence of office parks, shopping malls, factories and transport infrastructure. When the setting is enabled, urban centres (and dense urban clusters) are created using both cells with a population density of at least 1500 inhabitants per km² and cells that have at least 50% built-up area on permanent land. For more information: see GHSL Data Package 2022, footnote 25. The parameter settings UC_built_criterium=TRUE and UC_built_threshold=0.5 (level 1 & 2) and DUC_built_criterium=TRUE and DUC_built_threshold=0.5 (level 2) reproduce this built-up area criterium in urban centres and dense urban clusters respectively.

In Data Package 2023, the built-up area criterium is slightly adapted and renamed to the "Reduce Fragmentation Option". Instead of using a fixed threshold of 50% built-up area per permanent land, an "optimal" threshold is employed. The optimal threshold is dynamically identified as the global average built-up area proportion in clusters with a density of at least 1500 inhabitants per km² of permanent land and a minimum population of 5000 people. We determined empirically that this optimal threshold is 20% for the data of 2020. For more information: see GHSL Data Package 2023, footnote 30. The "Reduce Fragmentation Option" can be reproduced with the parameter settings UC_built_criterium=TRUE and UC_built_threshold="optimal" (level 1 & 2) and DUC_built_criterium=TRUE and DUC_built_threshold="optimal" (level 2). In addition, the parameter built_optimal_data must contain the path to the directory with the (global) data used to compute the optimal built-up area threshold.
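As a sketch, the fixed 50% criterium of Data Package 2022 could be requested for a level 1 classification as follows, using the Belgian example data. The "optimal" variant is not shown because it additionally needs built_optimal_data to point to global data.

```r
library(flexurba)

# example data shipped with the package
data_belgium <- DoU_load_grid_data_belgium()

# fixed built-up area threshold of 50%, as described for Data Package 2022
classification_built_2022 <- DoU_classify_grid(
  data = data_belgium,
  parameters = list(
    UC_built_criterium = TRUE,
    UC_built_threshold = 0.5
  )
)
```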

Edge smoothing

In Data Package 2022, edges of urban centres are smoothed by an iterative majority rule. The majority rule works as follows: if at least five of the eight cells surrounding a cell belong to a unique urban centre, then the cell is added to that urban centre. The process is repeated until no more cells are added. The parameter setting UC_smooth_edge=TRUE and UC_smooth_edge_fun="majority_rule_R2022A" reproduces this edge smoothing rule.

In Data Package 2023, the majority rule is slightly adapted. A cell is added to an urban centre if the majority of the surrounding cells belong to a unique urban centre, with the majority computed only among populated or land cells (proportion of permanent land > 0.5). In addition, cells with permanent water are never added to urban centres. The process is repeated until no more cells are added. For more information: see GHSL Data Package 2023, footnote 29. The parameter setting UC_smooth_edge=TRUE and UC_smooth_edge_fun="majority_rule_R2023A" reproduces this edge smoothing rule.
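The 2022 rule can be illustrated with a small base R sketch. It is a simplification rather than the package's algorithm: smooth_edges() is a hypothetical helper, the grid holds a single urban centre stored as a logical mask, and the land and water conditions of the 2023 variant are left out.

```r
# TRUE marks cells of one urban centre; cell (2, 3) is a notch in its edge
uc <- matrix(
  c(1, 1, 1, 0,
    1, 1, 0, 0,
    1, 1, 1, 0,
    0, 0, 0, 0),
  nrow = 4, byrow = TRUE
) == 1

# Hypothetical helper: add cells with at least five of their eight
# neighbours in the urban centre, and repeat until nothing changes.
smooth_edges <- function(mask) {
  rows <- 2:(nrow(mask) + 1)
  cols <- 2:(ncol(mask) + 1)
  repeat {
    # pad with FALSE so that border cells also have eight neighbours
    padded <- matrix(FALSE, nrow(mask) + 2, ncol(mask) + 2)
    padded[rows, cols] <- mask
    neighbours <- matrix(0L, nrow(mask), ncol(mask))
    for (dr in -1:1) {
      for (dc in -1:1) {
        if (dr == 0 && dc == 0) next
        neighbours <- neighbours + padded[rows + dr, cols + dc]
      }
    }
    added <- !mask & neighbours >= 5
    if (!any(added)) break
    mask <- mask | added
  }
  mask
}

uc_smoothed <- smooth_edges(uc)
```

In this toy grid only the notch at row 2, column 3 has five urban centre neighbours, so it is the single cell that is added.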

Regions

Because of the large amount of data at a global scale, the grid classification procedure is quite memory-consuming. To optimise the procedure, we divided the world into 9 pre-defined regions. These regions are the smallest groupings of GHSL tiles that ensure no continuous land mass is split into two different regions (for more information, see the figure below and GHSL_tiles_per_region).

If regions=TRUE, a global grid classification is created by (1) executing the grid classification procedure separately in the 9 pre-defined regions, and (2) afterwards merging these classifications together. The argument data should contain the path to a directory with the data of all pre-defined regions (for example as created by download_GHSLdata(... extent="regions")). Note that although the grid classification is optimised, it still takes approx. 145 minutes and requires 116 GB RAM to execute the grid classification with the standard parameters (performed on a Kubernetes server with 32 cores and 256 GB RAM). For a concrete example of how to construct the grid classification on a global scale, see vignette("vig3-DoU-global-scale").

[Figure: GHSL tiles]

Examples

# load the data
data_belgium <- DoU_load_grid_data_belgium()

# classify with standard parameters:
classification1 <- DoU_classify_grid(data = data_belgium)

# classify with custom parameters:
classification2 <- DoU_classify_grid(
  data = data_belgium,
  parameters = list(
    UC_density_threshold = 3000,
    UC_size_threshold = 75000,
    UC_gap_fill = FALSE,
    UC_smooth_edge = FALSE,
    UCL_contiguity_rule = 4
  )
)