
Note: a newer version (1.2.2) of this package is available.

geomultistar: Multidimensional Queries Enriched with Geographic Data

Multidimensional systems make complex queries easy to carry out. The geographic dimension, together with the temporal dimension, plays a fundamental role in these systems. With the geomultistar package, vector layers can be associated with the attributes of geographic dimensions, so that the results of multidimensional queries can be obtained directly as vector layers. In other words, this package enriches multidimensional queries with geographic data.

The multidimensional structures on which the queries are defined can be created from flat tables with the starschemar package or imported directly using functions from the geomultistar package.

Installation

You can install the released version of geomultistar from CRAN with:

install.packages("geomultistar")

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("josesamos/geomultistar")

Example

If we start from a flat table, we can generate a star schema using the starschemar package, as described in its examples.
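To make the idea of deriving a star schema from a flat table concrete, here is a minimal base R sketch. It is only an illustration of the concept, not how starschemar works internally; the table contents and the names flat, where_dim, fact, where_pk and where_fk are assumptions chosen for the example.

```r
# Hypothetical flat table: dimension attributes (city, state) are repeated
# on every row alongside the measure.
flat <- data.frame(
  city = c("Boston", "Boston", "Albany", "Albany"),
  state = c("MA", "MA", "NY", "NY"),
  year = c("1962", "1963", "1962", "1963"),
  n_deaths = c(10, 12, 4, 5),
  stringsAsFactors = FALSE
)

# Dimension table: the distinct attribute combinations, each with a
# surrogate primary key.
where_dim <- unique(flat[, c("city", "state")])
where_dim$where_pk <- seq_len(nrow(where_dim))
rownames(where_dim) <- NULL

# Fact table: the measure plus a foreign key that points at the
# matching dimension row.
joined <- merge(flat, where_dim, by = c("city", "state"))
fact <- data.frame(
  where_fk = joined$where_pk,
  year = joined$year,
  n_deaths = joined$n_deaths
)
```

The attributes now live once in where_dim, and fact only carries keys and measures, which is the shape the functions shown later expect.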

If we have a star schema in another tool, we need to import the fact and dimension tables into R as tibbles (mrs_fact_age, mrs_fact_cause, mrs_where, mrs_when and mrs_who in the example). Once we have them in this format, we build a multistar structure from them. This structure can contain multiple fact and dimension tables, so facts can share dimensions. The definition for the tables obtained from the case detailed in starschemar is included below: it defines the measures of the facts and establishes the relationships between facts and dimensions.

library(tidyr) # provides the %>% pipe used below
library(geomultistar)

ms <- multistar() %>%
  add_facts(
    fact_name = "mrs_age",
    fact_table = mrs_fact_age,
    measures = "n_deaths",
    nrow_agg = "count"
  ) %>%
  add_facts(
    fact_name = "mrs_cause",
    fact_table = mrs_fact_cause,
    measures = c("pneumonia_and_influenza_deaths", "other_deaths"),
    nrow_agg = "nrow_agg"
  ) %>%
  add_dimension(
    dimension_name = "where",
    dimension_table = mrs_where,
    dimension_key = "where_pk",
    fact_name = "mrs_age",
    fact_key = "where_fk"
  ) %>%
  add_dimension(
    dimension_name = "when",
    dimension_table = mrs_when,
    dimension_key = "when_pk",
    fact_name = "mrs_age",
    fact_key = "when_fk",
    key_as_data = TRUE
  ) %>%
  add_dimension(
    dimension_name = "who",
    dimension_table = mrs_who,
    dimension_key = "who_pk",
    fact_name = "mrs_age",
    fact_key = "who_fk"
  ) %>%
  relate_dimension(dimension_name = "where",
                   fact_name = "mrs_cause",
                   fact_key = "where_fk") %>%
  relate_dimension(dimension_name = "when",
                   fact_name = "mrs_cause",
                   fact_key = "when_fk")
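Because several fact tables share one dimension through their foreign keys, it can help to check that every key actually resolves before building the structure. The sketch below uses small hypothetical tables and a hypothetical helper, keys_resolve(), which is not part of geomultistar; whether the package performs a similar check itself is not stated here.

```r
# Hypothetical miniature versions of a shared dimension and two fact tables.
where_dim <- data.frame(where_pk = 1:2, state = c("MA", "NY"))
fact_age <- data.frame(where_fk = c(1L, 1L, 2L), n_deaths = c(10, 12, 4))
fact_cause <- data.frame(where_fk = c(2L, 1L), other_deaths = c(3, 7))

# Hypothetical helper: TRUE when every foreign key value in the fact table
# matches a primary key value in the dimension table.
keys_resolve <- function(fact, fk, dimension, pk) {
  all(fact[[fk]] %in% dimension[[pk]])
}

age_ok <- keys_resolve(fact_age, "where_fk", where_dim, "where_pk")
cause_ok <- keys_resolve(fact_cause, "where_fk", where_dim, "where_pk")

# A fact row pointing at a missing dimension row is detected.
broken <- data.frame(where_fk = c(1L, 9L), n_deaths = c(1, 1))
broken_ok <- keys_resolve(broken, "where_fk", where_dim, "where_pk")
```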

Once we have a multistar structure, we associate vector layers with the attributes of the geographic dimensions. We can use existing layers or generate them from previous definitions. The result is a geomultistar structure.

library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1

gms <-
  geomultistar(ms, geodimension = "where") %>%
  define_geoattribute(
    attribute = "city",
    from_layer = usa_cities,
    by = c("city" = "city", "state" = "state")
  ) %>%
  define_geoattribute(
    attribute = "county",
    from_layer = usa_counties,
    by = c("county" = "county", "state" = "state")
  ) %>%
  define_geoattribute(
    attribute = c("state", "state_name"),
    from_layer = usa_states,
    by = c("state" = "state")
  ) %>%
  define_geoattribute(from_attribute = "state")

In the last definition, because no geographic attribute is specified, the rest of the dimension’s attributes are automatically defined from the layer associated with the attribute given in from_attribute (here, state).
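Deriving a coarser layer from a finer one can be pictured as dissolving geometries by a grouping attribute. The following sketch does this with sf's aggregate() method on hypothetical unit squares; it illustrates the general technique only and makes no claim about how define_geoattribute() is implemented. The helper unit_square() and all data values are assumptions.

```r
library(sf)

# Hypothetical helper: a unit square whose lower-left corner is (x0, 0).
unit_square <- function(x0) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
  )))
}

# Hypothetical "state" layer: A and B are adjacent and belong to D1.
states <- st_sf(
  state = c("A", "B", "C"),
  division = c("D1", "D1", "D2"),
  n_states = c(1, 1, 1),
  geometry = st_sfc(unit_square(0), unit_square(1), unit_square(3))
)

# Group by division: attributes are summed and geometries are unioned
# (do_union = TRUE is the default of sf's aggregate method).
divisions <- aggregate(
  states["n_states"],
  by = list(division = states$division),
  FUN = sum
)
```

Here divisions has one feature per division, and the two adjacent squares of D1 are merged into a single polygon.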

Finally, we can define multidimensional queries on the geomultistar structure using the functions available in the starschemar package. When these queries are executed with the functionality implemented in the geomultistar package, the vector layers of the attributes are taken into account to produce a new vector layer.

library(starschemar)

gdqr <- dimensional_query(gms) %>%
  select_dimension(name = "where",
                   attributes = c("division_name", "region_name")) %>%
  select_dimension(name = "when",
                   attributes = c("year", "week")) %>%
  select_fact(name = "mrs_age",
              measures = c("n_deaths")) %>%
  select_fact(
    name = "mrs_cause",
    measures = c("pneumonia_and_influenza_deaths", "other_deaths")
  ) %>%
  filter_dimension(name = "when", week <= "03") %>%
  run_geoquery()
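The tabular part of such a query (select attributes, filter, aggregate measures) can be sketched in base R as a join, a row filter and a grouped sum. This is a conceptual illustration on hypothetical data, not the package's implementation, and it leaves out the geometry that run_geoquery() attaches.

```r
# Hypothetical dimension and fact tables.
where_dim <- data.frame(
  where_pk = 1:3,
  state = c("MA", "CT", "NY"),
  division_name = c("New England", "New England", "Middle Atlantic"),
  stringsAsFactors = FALSE
)
when_dim <- data.frame(
  when_pk = 1:3,
  week = c("01", "03", "10"),
  stringsAsFactors = FALSE
)
fact <- data.frame(
  where_fk = c(1L, 2L, 3L, 1L, 3L),
  when_fk = c(1L, 2L, 1L, 3L, 3L),
  n_deaths = c(10, 5, 7, 20, 9)
)

# Join each fact row to its dimension rows through the keys.
joined <- merge(fact, where_dim, by.x = "where_fk", by.y = "where_pk")
joined <- merge(joined, when_dim, by.x = "when_fk", by.y = "when_pk")

# Filter: comparing strings works here because weeks are zero-padded.
joined <- joined[joined$week <= "03", ]

# Aggregate the measure over the selected attributes.
result <- aggregate(n_deaths ~ division_name + week, data = joined, FUN = sum)
```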

The result of run_geoquery() is a vector layer that we can save or display as a map using the functions associated with the sf class.

class(gdqr)
#> [1] "sf"         "tbl_df"     "tbl"        "data.frame"

plot(gdqr[, "n_deaths"])

Although we have indicated in the query the attributes division_name and region_name, as can be seen in the figure, the result obtained is at the finest granularity level, in this case at the division_name level.

Only the parts of the divisions made up of states with recorded data are shown. To show the full extent of each division, we would need to explicitly associate a layer at that level.
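Since the query result is an ordinary sf object, it can be saved with the usual sf functions. The sketch below writes a hypothetical stand-in layer to a temporary GeoPackage file and reads it back; the object layer and its values are assumptions used in place of a real query result.

```r
library(sf)

# Hypothetical stand-in for a query result: two point features with a measure.
layer <- st_sf(
  division_name = c("New England", "Pacific"),
  n_deaths = c(15, 7),
  geometry = st_sfc(st_point(c(-71, 42)), st_point(c(-122, 37)), crs = 4326)
)

# Write the layer to a temporary GeoPackage file, then read it back.
path <- tempfile(fileext = ".gpkg")
st_write(layer, path, quiet = TRUE)
reloaded <- st_read(path, quiet = TRUE)
```

Any other format supported by the installed GDAL drivers could be used by changing the file extension.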


Version: 1.0.0
License: MIT + file LICENSE
Maintainer: Jose Samos
Last Published: September 30th, 2020

Functions in geomultistar (1.0.0)

mrs_fact_cause: Fact cause
add_dimension: Add a dimension table to a multistar
mrs_fact_age: Fact age
usa_divisions: USA Divisions, 2018
mrs_when: Dimension when
add_facts: Add a fact table to a multistar
usa_nation: USA Nation, 2018
usa_cities: USA Cities, 2014
mrs_where: Dimension where
usa_counties: USA Counties, 2018
mrs_who: Dimension who
usa_regions: USA Regions, 2018
multistar: multistar S3 class
usa_states: USA States, 2018
geomultistar: geomultistar S3 class
relate_dimension: Relate a dimension table to a fact table in a multistar
filter_geodimension: Filter geodimension
run_geoquery: Get a geographic vector from a query
default_attribute: Default attribute
define_geoattribute: Define geographic attributes
specify_geodimension: Default attribute
%>%: Define a geoattribute from another
uk_london_boroughs: UK London Boroughs
get_empty_geoinstances: Get empty instances of a geographic attribute
new_multistar: multistar S3 class
new_geomultistar: geomultistar S3 class