geodimension

The geographic dimension plays a fundamental role in multidimensional systems. To define a geographic dimension in a star schema, we need a table with attributes corresponding to the levels of the dimension. We also need one or more geographic layers to represent the data using this dimension.

We can obtain this data from available vector layers of geographic information. In simple cases one layer is enough, but we often need several related layers. The relationships can be defined by common attribute values or inferred from the respective geographic information.
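
As a rough illustration of a relationship defined by common attribute values, the following base R sketch links a lower level to an upper level through a shared code. The tables and the name state_division are hypothetical and only stand in for the attribute tables of two layers; this is not the package's own code.

```r
# Hypothetical lower-level table: each state row carries its division code.
state <- data.frame(
  state_name = c("Maine", "Ohio", "Texas"),
  division_code = c("1", "3", "7"),
  stringsAsFactors = FALSE
)

# Hypothetical upper-level table with one row per division.
division <- data.frame(
  division_code = c("1", "3", "7"),
  division_name = c("New England", "East North Central", "West South Central"),
  stringsAsFactors = FALSE
)

# The shared attribute value links each state to exactly one division.
state_division <- merge(state, division, by = "division_code", all.x = TRUE)
state_division[, c("state_name", "division_name")]
```

A missing division_name after the merge would indicate a state whose code has no match in the upper level.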

The goal of geodimension is to support the definition of geographic dimensions, from layers of geographic information, that can be used in multidimensional systems, in particular through the rolap and geomultistar packages.

Installation

You can install the released version of geodimension from CRAN with:

install.packages("geodimension")

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("josesamos/geodimension")

Example

This is a basic example which shows you how to generate a geodimension from tables and vector layers of geographic information. It also shows how to use it.

Suppose that, for the US, we want to define a geographic dimension at the state level but also include the information at the predefined higher organization levels: division, region and country, available in the package in the us_division variable, shown below.

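| division_code | division_name      | region_code | region_name | country |
|---------------|--------------------|-------------|-------------|---------|
| 1             | New England        | 1           | Northeast   | USA     |
| 2             | Middle Atlantic    | 1           | Northeast   | USA     |
| 3             | East North Central | 2           | Midwest     | USA     |
| 4             | West North Central | 2           | Midwest     | USA     |
| 5             | South Atlantic     | 3           | South       | USA     |
| 6             | East South Central | 3           | South       | USA     |
| 7             | West South Central | 3           | South       | USA     |
| 8             | Mountain           | 4           | West        | USA     |
| 9             | Pacific            | 4           | West        | USA     |
| 0             | Puerto Rico        | 9           | Puerto Rico | USA     |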

The United States Census Bureau publishes layers at various levels of detail, including state. We need a geographic layer at the state level (layer_us_state). For this example we obtain it from the package itself, although we could read it from a GeoPackage or any other format using the sf package.

library(geodimension)

layer_us_state <- gd_us |>
  get_level_layer("state")

plot(sf::st_shift_longitude(sf::st_geometry(layer_us_state)))
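
If the layer came from a file instead, reading it with sf might look like the sketch below. To stay self-contained it first writes a sample layer shipped with sf to a temporary GeoPackage; the file, the layer name "county" and the variable names are hypothetical stand-ins for a downloaded Census layer.

```r
library(sf)

# Sample layer shipped with sf, standing in for a downloaded Census layer.
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Write it to a temporary GeoPackage (hypothetical file and layer names).
gpkg_file <- tempfile(fileext = ".gpkg")
st_write(nc, gpkg_file, layer = "county", quiet = TRUE)

# List the layers stored in the file, then read the one we need.
st_layers(gpkg_file)
layer_from_file <- st_read(gpkg_file, layer = "county", quiet = TRUE)
```

The resulting sf object can then be passed wherever the example uses layer_us_state, provided it contains the attributes used as keys.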

From this layer and the us_division table we can define all the levels. Each level is defined as a geolevel.

state <-
  geolevel(name = "state",
           layer = layer_us_state,
           key = "statefp")

division <-
  geolevel(
    name = "division",
    layer = us_division,
    attributes = c("country", "region_code", "division_name"),
    key = "division_code"
  ) |>
  add_geometry(layer = layer_us_state,
               layer_key = "division")

region <-
  geolevel(
    name = "region",
    layer = us_division,
    attributes = c("country", "region_name"),
    key = "region_code"
  ) |>
  add_geometry(layer = layer_us_state,
               layer_key = "region")

country <-
  geolevel(
    name = "country",
    layer = get_level_layer(region),
    attributes = "country",
    key = "country"
  )
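
Conceptually, defining a level from a table amounts to keeping its key and attributes with one row per instance, and the key must identify each instance uniquely. The base R sketch below illustrates that idea on a hypothetical extract of the division table; it is an assumption about the concept, not a description of how geolevel is implemented.

```r
# Hypothetical extract of the division table (four of its rows).
divisions <- data.frame(
  division_code = c("1", "2", "3", "4"),
  division_name = c("New England", "Middle Atlantic",
                    "East North Central", "West North Central"),
  region_code = c("1", "1", "2", "2"),
  region_name = c("Northeast", "Northeast", "Midwest", "Midwest"),
  country = "USA",
  stringsAsFactors = FALSE
)

# Keep only the key and attributes of the region level, one row per instance.
region_table <- unique(divisions[, c("region_code", "country", "region_name")])
rownames(region_table) <- NULL

# A valid key has no duplicated values.
key_is_unique <- !anyDuplicated(region_table$region_code)
```

Here the four division rows collapse to two region rows, and region_code identifies each of them.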

We define a geodimension that includes all the levels in which we are interested.

gd <-
  geodimension(name = "gd_us",
               level = state,
               snake_case = TRUE) |>
  add_level(level = division) |>
  add_level(level = region) |>
  add_level(level = country)

Next, we define the relationships that exist between the levels: some based on common attributes, others on geographic relationships between their instances.

gd <- gd |>
  relate_levels(
    lower_level_name = "state",
    lower_level_attributes = "division",
    upper_level_name = "division"
  ) |>
  relate_levels(
    lower_level_name = "division",
    upper_level_name = "region",
    by_geography = TRUE
  ) |>
  relate_levels(
    lower_level_name = "region",
    lower_level_attributes = "country",
    upper_level_name = "country"
  )
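
To give an idea of what a relationship inferred from geography can mean, the sketch below relates toy polygons with sf: each lower-level polygon is represented by a point on its surface and assigned to the upper-level polygon that contains that point. The helper rect, the geometries and all names are hypothetical, and this is only one possible approach, not necessarily the one used by relate_levels.

```r
library(sf)

# Hypothetical helper building an axis-aligned rectangle polygon.
rect <- function(xmin, ymin, xmax, ymax) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmax, ymin), c(xmax, ymax), c(xmin, ymax), c(xmin, ymin)
  )))
}

# Upper level: two regions side by side.
region <- st_sf(
  region_code = c("A", "B"),
  geometry = st_sfc(rect(0, 0, 2, 2), rect(2, 0, 4, 2))
)

# Lower level: three divisions, each lying inside one region.
division <- st_sf(
  division_code = c("d1", "d2", "d3"),
  geometry = st_sfc(rect(0, 0, 1, 2), rect(1, 0, 2, 2), rect(2, 0, 4, 1))
)

# Replace each division polygon by a point lying on its surface.
division_points <- st_set_geometry(
  division,
  st_point_on_surface(st_geometry(division))
)

# Find, for each point, the region polygon that contains it.
related <- st_join(division_points, region, join = st_within)
st_drop_geometry(related)
```

Using a point on the surface rather than the full polygon avoids ambiguous matches when lower-level boundaries touch several upper-level polygons.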

There are no restrictions on the relationships we define, as long as the relationship can be established.

With these operations we have defined a geodimension. From it we can obtain a data table to define a dimension in a star schema, or the layer or layers associated with that table, at the level we need. We can also get a table with latitude and longitude defined as fields.

ld <- gd |>
  get_level_data(level_name = "division")
names(ld)
#> [1] "division_code"  "country"        "region_code"    "division_name" 
#> [5] "fk_region_code"

ld <- gd |>
  get_level_data(level_name = "division",
                 inherited = TRUE)
names(ld)
#> [1] "division_code"           "division_country"       
#> [3] "division_region_code"    "division_name"          
#> [5] "division_fk_region_code" "region_country"         
#> [7] "region_name"

ll <- gd |>
  get_level_layer(level_name = "division",
                  inherited = TRUE)
names(ll)
#> [1] "division_code"           "division_country"       
#> [3] "division_region_code"    "division_name"          
#> [5] "division_fk_region_code" "region_country"         
#> [7] "region_name"             "geom"

lg <- gd |>
  get_level_data_geo(level_name = "division",
                     inherited = TRUE)
names(lg)
#> [1] "division_code"           "division_country"       
#> [3] "division_region_code"    "division_name"          
#> [5] "division_fk_region_code" "region_country"         
#> [7] "region_name"             "intptlon"               
#> [9] "intptlat"
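
In the output above, inherited = TRUE appends the attributes of the upper levels to those of the requested level, with names prefixed by level. The base R sketch below mimics that flattening on hypothetical tables; the helper add_level_prefix is invented for the illustration and is not the package's add_prefix function.

```r
# Hypothetical division table holding a foreign key to its region.
division <- data.frame(
  division_code = c("1", "2", "3"),
  division_name = c("New England", "Middle Atlantic", "East North Central"),
  fk_region_code = c("1", "1", "2"),
  stringsAsFactors = FALSE
)

# Hypothetical region table.
region <- data.frame(
  region_code = c("1", "2"),
  region_name = c("Northeast", "Midwest"),
  country = "USA",
  stringsAsFactors = FALSE
)

# Hypothetical helper: prefix names with the level name unless already present.
add_level_prefix <- function(names, level) {
  prefix <- paste0(level, "_")
  ifelse(startsWith(names, prefix), names, paste0(prefix, names))
}
names(division) <- add_level_prefix(names(division), "division")
names(region) <- add_level_prefix(names(region), "region")

# Follow the foreign key to pull the region attributes into each division row.
flat <- merge(division, region,
              by.x = "division_fk_region_code", by.y = "region_code")
names(flat)
```

The prefixes keep attributes such as country distinguishable when the same name appears at several levels.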

If we need the data at another level of detail, we can obtain it in a similar way.

ld <- gd |>
  get_level_data(level_name = "region",
                 inherited = TRUE)
names(ld)
#> [1] "region_code"    "region_country" "region_name"

ll <- gd |>
  get_level_layer(level_name = "region",
                  only_key = TRUE)

plot(sf::st_shift_longitude(ll))
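
The plots in this example wrap the layer in sf::st_shift_longitude, presumably because part of the US territory crosses the 180° meridian and would otherwise be drawn at both edges of the map. The sketch below shows the effect on two illustrative points; the coordinates and the CRS (EPSG:4326) are assumptions chosen for the demonstration.

```r
library(sf)

# Two points on either side of the antimeridian (illustrative coordinates).
pts <- st_sfc(st_point(c(-170, 52)), st_point(c(175, 52)), crs = 4326)

# Shift longitudes from the [-180, 180] range to the [0, 360] range.
shifted <- st_shift_longitude(pts)
st_coordinates(shifted)[, "X"]
```

After the shift both points fall on the same side of the map, so geometries spanning the antimeridian are drawn together.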

In addition to these functions, the package offers other support functions to aid in the definition of levels (for example, to determine the key attributes of a layer), to relate instances of levels whose relationship is not immediately established, or to configure the geodimension to obtain a customized output.

For example, we can obtain a table with level data and geographic data represented in the form of points, with longitude and latitude, to be included in other tools that use this format.

ld_geo <- gd |>
  get_level_data_geo(level_name = "region")

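pander::pandoc.table(ld_geo, split.table = Inf)

| region_code | country | region_name | intptlon | intptlat |
|-------------|---------|-------------|----------|----------|
| 1           | USA     | Northeast   | -74.79   | 43.27    |
| 2           | USA     | Midwest     | -93.19   | 43.21    |
| 3           | USA     | South       | -91.29   | 32.9     |
| 4           | USA     | West        | -113.2   | 40.77    |
| 9           | USA     | Puerto Rico | -66.28   | 18.21    |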
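
A tool that expects geometries rather than coordinate columns can rebuild points from such a table. The sketch below does so with sf, using the values printed above; the CRS (EPSG:4326) is an assumption, since the table itself does not state one, and region_points is a hypothetical name.

```r
library(sf)

# Values copied from the table above.
ld_geo <- data.frame(
  region_code = c("1", "2", "3", "4", "9"),
  country = "USA",
  region_name = c("Northeast", "Midwest", "South", "West", "Puerto Rico"),
  intptlon = c(-74.79, -93.19, -91.29, -113.2, -66.28),
  intptlat = c(43.27, 43.21, 32.9, 40.77, 18.21),
  stringsAsFactors = FALSE
)

# Build point geometries from the coordinate columns (the CRS is assumed).
region_points <- st_as_sf(ld_geo,
                          coords = c("intptlon", "intptlat"),
                          crs = 4326)
```

By default st_as_sf drops the coordinate columns once they are turned into geometry; pass remove = FALSE to keep them alongside the points.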

Monthly Downloads: 248
Version: 2.0.0
License: MIT + file LICENSE
Maintainer: Jose Samos
Last Published: January 9th, 2024

Functions in geodimension (2.0.0)

| Function | Description |
|----------|-------------|
| get_higher_level_names | Get higher level names |
| geolevel | geolevel S3 class |
| get_empty_geometry_instances | Get empty geometry instances |
| get_level_geometries | Get level geometries |
| get_level_data_geo | Get level data with latitude and longitude |
| gd_us | gd_us |
| get_geometry | Get geometry |
| get_level_data | Get level data |
| geodimension | geodimension S3 class |
| get_level_keys | Get level keys |
| get_unrelated_instances | Get unrelated instances |
| set_level_data | Set level data |
| select_levels | Select levels |
| transform_crs | Transform CRS |
| get_level_names | Get level names |
| get_level_layer | Get level layer |
| sort_by_number_of_instances | Sort by number of instances |
| my_to_snake_case | To snake case |
| relate_levels | Relate levels in a dimension |
| snake_case_geolevel | Snake case geolevel |
| validate_names | Validate names |
| us_division | us_division |
| add_geometry | Add geometry to a level |
| add_prefix | Add prefix |
| coordinates_to_geometry | Transform coordinates to point geometry |
| check_key | Check key |
| complete_point_geometry | Complete point geometry |
| add_level | Add a level to a dimension |
| complete_relation_by_geography | Complete relation by geography |
| define_relationship | Define relationship |
| gd_es | gd_es |
| all_attributes_character | All attributes are character |