Learn R Programming

starschemar

The multidimensional data model was defined in the 1990s with the aim of supporting data analysis. Data in multidimensional systems is obtained from operational systems and is transformed to adapt it to the new structure.

Transformations can be carried out using professional ETL (extract, transform and load) tools. Recently, tools aimed at end users have emerged, which are also aimed at performing transformation operations. All these tools are very useful to carry out the transformation process, they provide a development environment to define the transformation operations in a general way.

Frequently, the operations to be performed aim to transform a flat table (with data that comes from operational systems) into a star schema (which implements a multidimensional system). With the tools mentioned above, this transformation can be carried out, but it requires a lot of work. I am not aware of any tools with operations designed to specifically support this transformation process.

The goal of starschemar is to define transformations that allow you to easily obtain star schemas from flat tables. In addition, it includes basic data cleaning, dimension enrichment, incremental data refresh and query operations, adapted to this context.

Based on this package, the rolap package has been developed. It incorporates the main functionalities for which starschemar was initially intended and it has a new data model defined with the aim of facilitating its maintenance and evolution.

Installation

You can install the released version of starschemar from CRAN with:

install.packages("starschemar")

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("josesamos/starschemar")

Example

To illustrate how the package works we will use a small part of the Deaths in 122 U.S. cities - 1962-2016. 122 Cities Mortality Reporting System data set in the form of a flat table, the first rows of which are shown below.

YearWEEKWeek Ending DateREGIONStateCityPneumonia and Influenza DeathsAll DeathsOther Deaths
196211962-01-061CTBridgeport34643
196221962-01-131CTBridgeport24341
196231962-01-201CTBridgepor24038
196211962-01-069WATacoma45046
NANA1962-01-139WATacoma24543
196231962-01-209WATacoma03939

The original data has been modified to have some erroneous or missing data.

The transformation to obtain a star schema from the flat table using starschemar package is as follows:

library(starschemar)

# columns to consider in the definition
dput(colnames(ft))
#> c("Year", "WEEK", "Week Ending Date", "REGION", "State", "City", 
#> "Pneumonia and Influenza Deaths", "All Deaths", "Other Deaths"
#> )

dm <- dimensional_model() |>
  define_fact(
    name = "mrs_cause",
    measures = c(
      "Pneumonia and Influenza Deaths",
      "Other Deaths"
    ),
  ) |>
  define_dimension(
    name = "when",
    attributes = c(
      "Week Ending Date",
      "WEEK",
      "Year"
    )
  ) |>
  define_dimension(
    name = "where",
    attributes = c(
      "REGION",
      "State",
      "City"
    )
  )

st <- star_schema(ft, dm) |>
  snake_case() |>
  character_dimensions(
    NA_replacement_value = "Unknown",
    length_integers = list(week = 2)
  )

The tables of dimensions and facts of the obtained star schema are shown below.

when_keyweek_ending_dateweekyear
11962-01-06011962
21962-01-13021962
31962-01-13UnknownUnknown
41962-01-20031962
where_keyregionstatecity
11CTBridgepor
21CTBridgeport
39WATacoma
when_keywhere_keypneumonia_and_influenza_deathsother_deathsnrow_agg
123431
134461
222411
332431
412381
430391

The tables show the erroneous and missing data. We are going to perform some data cleaning operations to correct them.

where <- st |>
  get_dimension("where")

when <- st |>
  get_dimension("when")

updates <- record_update_set() |>
  update_selection(
    dimension = where,
    columns = c("city"),
    old_values = c("Bridgepor"),
    new_values = c("Bridgeport")
  ) |>
  match_records(dimension = when,
                old = 3,
                new = 2)

st <- st |>
  modify_dimension_records(updates)

The new dimension and fact tables are shown below.

when_keyweek_ending_dateweekyear
11962-01-06011962
21962-01-13021962
31962-01-20031962
where_keyregionstatecity
11CTBridgeport
29WATacoma
where_keywhen_keypneumonia_and_influenza_deathsother_deathsnrow_agg
113431
122411
132381
214461
222431
230391

In addition to the operations in the examples shown, starschemar offers support for defining role playing and role dimensions in a star schema, fact constellations with conformed dimensions, dimension enrichment, incremental data refresh operations, the ability to export results in various tibble-based formats, and query operations, adapted to this context.

Copy Link

Version

Install

install.packages('starschemar')

Monthly Downloads

632

Version

1.2.4

License

MIT + file LICENSE

Issues

Pull Requests

Stars

Forks

Maintainer

Jose Samos

Last Published

January 8th, 2024

Functions in starschemar (1.2.4)

ft_london_boroughs

London Boroughs
dimensional_query

dimensional_query S3 class
find_values

Find values in a dimension
get_role_dimension_names

Get role dimension names associated to a role-playing dimension
get_name_of_uniquely_implemented_dimensions

Get name of uniquely implemented dimensions
is_conformed_dimension

Is it conformed dimension?
filter_fact_rows

Filter fact rows
get_conformed_dimension

Get conformed dimension
filter_selected_instances

Filter selected instances
ft_usa_states

USA States
get_attribute_names

Get attribute names
ft_datagov_uk

Modelling the long-term health impacts of air pollution in London
define_selected_dimensions

Define selected dimensions
incremental_refresh_star_schema

Incrementally refresh a star schema with another
get_all_dimensions

Get all dimensions
get_star_schema

Get star schema
get_dimension_names

Get dimension names
match_records

Make a dimension record equal to another
homogenize

Homogenize a dimension
get_fact_name

Get fact name
ft_usa_city_county

USA City and County
get_dimension_name

Get the dimension name
get_role_playing_dimension_name

Get the associated role-playing dimension name
is_role_playing_dimension

Is it role-playing dimension?
incremental_refresh_constellation

Incrementally refresh a constellation with a star schema
enrich_dimension_export

Export selected attributes of a dimension
enrich_dimension_import

Import tibble to enrich a dimension
filter_dimension

Filter dimension
get_conformed_dimension_names

Get conformed dimension names
get_star_schema_names

Get star schema names
enrich_dimension_import_test

Import tibble to test to enrich a dimension
mrs_age_w11

Mortality Reporting System by Age for Week 11
mrs_cause_w_test

Mortality Reporting System by Cause for Week Test
get_dimension_type

Get the dimension type
new_star_schema

star_schema S3 class
incremental_refresh_fact

Incrementally refresh a fact table with another
mrs_age_w_test

Mortality Reporting System by Age for Week Test
get_dimension

Get dimension
incremental_refresh_dimension

Incrementally refresh a dimension with another
mrs_cause

Mortality Reporting System by Cause
get_conformed_dimension_names_st

Get conformed dimension names
mrs

Mortality Reporting System
mrs_cause_test

Mortality Reporting System by Cause Test
get_dimension_attribute_names

Get dimension attribute names
purge_dimensions_constellation

Purge dimensions in a constellation
ms_mrs

Multistar for Mortality Reporting System
mrs_age

Mortality Reporting System by Age
new_dimensional_model

dimensional_model S3 class
record_update_set

record_update_set S3 class
role_playing_dimension

Define a role playing dimension in a star_schema object
purge_dimensions_star_schema

Purge dimensions
modify_conformed_dimension_records

Apply dimension record update operations to conformed dimensions
modify_dimension_records

Apply dimension record update operations
group_records

Group records
group_table

Group the records in the table
prepare_join

Transform a tibble to join
group_facts

Group facts
new_dimensional_query

dimensional_query S3 class
role_dimension

Transform a dimension into a role dimension
snake_case_fact

Transform names according to the snake case style in a fact table
new_fact_table

fact_table S3 class
st_mrs_age

Star Schema for Mortality Reporting System by Age
mrs_cause_w11

Mortality Reporting System by Cause for Week 11
mrs_cause_w10

Mortality Reporting System by Cause for Week 10
remove_duplicate_dimension_rows

Remove duplicate dimension rows
reference_dimension

Reference a dimension
replace_records

Replace records
new_constellation

constellation S3 class
get_measure_names

Get measure names
st_mrs_age_w_test

Star Schema for Mortality Reporting System by Age for Week Test
st_mrs_age_w11

Star Schema for Mortality Reporting System by Age for Week 11
is_dimension_in_updates

Is dimension in set of updates?
get_name_of_role_playing_dimensions

Get the name of the role playing dimensions
replace_role_dimension_in_facts

Replace in facts a star schema role dimension
star_schema_as_flat_table

Export a star schema as a flat table
new_multistar

multistar S3 class
is_role_dimension

Is it role dimension?
star_schema_as_mst

Star schema as multistar export (common)
replace_dimension

Replace a star schema dimension
mrs_age_test

Mortality Reporting System by Age Test
new_record_update

record_update S3 class
new_dimension_table

dimension_table S3 class
rename_fact

Rename fact
select_dimension

Select dimension
rename_measures

Rename measures
run_query

Run query
rename_dimension_attributes

Rename dimension attributes
typed_value

Transform a value according to its type
snake_case

Transform names according to the snake case style
mrs_age_w10

Mortality Reporting System by Age for Week 10
ms_mrs_test

Multistar for Mortality Reporting System Test
update_facts_with_general_dimension

Update facts with a general dimension
update_facts_with_role_dimension

Update facts with a role dimension
updates_st_mrs_age_test

Updates for the Star Schema for Mortality Reporting System by Age Test
validate_names

Validate names
multistar_as_flat_table

Export a multistar as a flat table
snake_case_dimension

Transform names according to the snake case style in a dimension
st_mrs_cause_w_test

Star Schema for Mortality Reporting System by Cause for Week Test
st_mrs_cause_w11

Star Schema for Mortality Reporting System by Cause for Week 11
st_mrs_age_w10

Star Schema for Mortality Reporting System by Age for Week 10
set_dimension_name

Set the dimension name
st_mrs_cause_w10

Star Schema for Mortality Reporting System by Cause for Week 10
st_mrs_age_test

Star Schema for Mortality Reporting System by Age Test
set_dimension_type

Set the dimension type
star_schema

star_schema S3 class
starschemar

Obtaining Star Schemas from Flat Tables
set_dimension_type_role_playing

Set the type of a role-playing dimension
update_dimensions

Apply update operations to dimensions
update_facts_with_dimensions

Update facts with a list of modified dimensions
new_record_update_set

record_update_set S3 class
replace_dimension_in_facts

Replace in facts a star schema dimension
replace_general_dimension_in_facts

Replace in facts a star schema general dimension
union_of_dimensions

Perform union of dimensions
rename_dimension

Rename dimension
update_dimension

Apply dimension record update operations to a dimension
unify_facts_by_grain

Unify facts by grain
set_dimension_type_conformed

Set the type of a conformed dimension
star_schema_as_tl

Export a star schema as a tibble list (common)
update_record

Update a dimension record with a set of values
update_selection

Update dimension records with a set of values
select_fact

Select fact
set_role_playing_dimension_name

Set the associated role-playing dimension name
update_selection_general

Update dimension records with a set of values in given columns
star_schema_as_tibble_list

Export a star schema as a tibble list
selection_bit_map

Generate a record selection bitmap
set_fact_name

Set fact name
star_schema_as_multistar

Export a star schema as a multistar
st_mrs_cause

Star Schema for Mortality Reporting System by Cause
st_mrs_cause_test

Star Schema for Mortality Reporting System by Cause Test
updates_st_mrs_age

Updates for the Star Schema for Mortality Reporting System by Age
constellation_as_multistar

Export a constellation as a multistar
constellation

constellation S3 class
define_dimension

Define dimensions in a dimensional_model object
constellation_as_tibble_list

Export a constellation as a tibble list
ct_mrs

Constellation for Mortality Reporting System
character_dimensions

Transform dimension numeric attributes to character
dimensional_model

dimensional_model S3 class
conform_all_dimensions

Conform all dimensions of a constellation
ct_mrs_test

Constellation for Mortality Reporting System Test
delete_records

Delete records
define_selected_facts

Define selected facts
define_fact

Define facts in a dimensional_model object
conform_dimensions

Conform dimensions of given name
character_dimension

Transform a dimension numeric attributes to character
dm_mrs_cause

Star Definition for Mortality Reporting System by Cause
dm_mrs_age

Star Definition for Mortality Reporting System by Age
delete_unused_foreign_keys

Delete unused foreign keys
dereference_dimension

Dereference a dimension