Learn R Programming

⚠️There's a newer version (1.2.4) of this package.Take me there.

starschemar

The multidimensional data model was defined in the 1990s with the aim of supporting data analysis. Data in multidimensional systems is obtained from operational systems and is transformed to adapt it to the new structure.

Transformations can be carried out using professional ETL (extract, transform and load) tools. Recently, tools aimed at end users have emerged, which are also aimed at performing transformation operations. All these tools are very useful to carry out the transformation process, they provide a development environment to define the transformation operations in a general way.

Frequently, the operations to be performed aim to transform a flat table (with data that comes from operational systems) into a star schema (which implements a multidimensional system). With the tools mentioned above, this transformation can be carried out, but it requires a lot of work. I am not aware of any tools with operations designed to specifically support this transformation process.

The goal of starschemar is to define transformations that allow you to easily obtain star schemas from flat tables. In addition, it includes basic data cleaning, dimension enrichment, incremental data refresh and query operations, adapted to this context.

Installation

You can install the released version of starschemar from CRAN with:

install.packages("starschemar")

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("josesamos/starschemar")

Example

To illustrate how the package works we will use a small part of the Deaths in 122 U.S. cities - 1962-2016. 122 Cities Mortality Reporting System data set in the form of a flat table, the first rows of which are shown below.

YearWEEKWeek Ending DateREGIONStateCityPneumonia and Influenza DeathsAll DeathsOther Deaths
196211962-01-061CTBridgeport34643
196221962-01-131CTBridgeport24341
196231962-01-201CTBridgepor24038
196211962-01-069WATacoma45046
NANA1962-01-139WATacoma24543
196231962-01-209WATacoma03939

The original data has been modified to have some erroneous or missing data.

The transformation to obtain a star schema from the flat table using starschemar package is as follows:

library(starschemar)
library(tidyr)

# columns to consider in the definition
dput(colnames(ft))
#> c("Year", "WEEK", "Week Ending Date", "REGION", "State", "City", 
#> "Pneumonia and Influenza Deaths", "All Deaths", "Other Deaths"
#> )

dm <- dimensional_model() %>%
  define_fact(
    name = "mrs_cause",
    measures = c(
      "Pneumonia and Influenza Deaths",
      "Other Deaths"
    ),
  ) %>%
  define_dimension(
    name = "when",
    attributes = c(
      "Week Ending Date",
      "WEEK",
      "Year"
    )
  ) %>%
  define_dimension(
    name = "where",
    attributes = c(
      "REGION",
      "State",
      "City"
    )
  )

st <- star_schema(ft, dm) %>%
  snake_case() %>%
  character_dimensions(
    NA_replacement_value = "Unknown",
    length_integers = list(week = 2)
  )

The tables of dimensions and facts of the obtained star schema are shown below.

when_keyweek_ending_dateweekyear
11962-01-06011962
21962-01-13021962
31962-01-13UnknownUnknown
41962-01-20031962
where_keyregionstatecity
11CTBridgepor
21CTBridgeport
39WATacoma
when_keywhere_keypneumonia_and_influenza_deathsother_deathsnrow_agg
123431
134461
222411
332431
412381
430391

The tables show the erroneous and missing data. We are going to perform some data cleaning operations to correct them.

where <- st %>%
  get_dimension("where")

when <- st %>%
  get_dimension("when")

updates <- record_update_set() %>%
  update_selection(
    dimension = where,
    columns = c("city"),
    old_values = c("Bridgepor"),
    new_values = c("Bridgeport")
  ) %>%
  match_records(dimension = when,
                old = 3,
                new = 2)

st <- st %>%
  modify_dimension_records(updates)

The new dimension and fact tables are shown below.

when_keyweek_ending_dateweekyear
11962-01-06011962
21962-01-13021962
31962-01-20031962
where_keyregionstatecity
11CTBridgeport
29WATacoma
where_keywhen_keypneumonia_and_influenza_deathsother_deathsnrow_agg
113431
122411
132381
214461
222431
230391

In addition to the operations in the examples shown, starschemar offers support for defining role playing and role dimensions in a star schema, fact constellations with conformed dimensions, dimension enrichment, incremental data refresh operations, the ability to export results in various tibble-based formats, and query operations, adapted to this context.

Copy Link

Version

Install

install.packages('starschemar')

Monthly Downloads

640

Version

1.1.0

License

MIT + file LICENSE

Maintainer

Jose Samos

Last Published

September 13th, 2020

Functions in starschemar (1.1.0)

conform_all_dimensions

Conform all dimensions of a constellation
constellation_as_tibble_list

Export a constellation as a tibble list
constellation

constellation S3 class
character_dimensions

Transform dimension numeric attributes to character
constellation_as_multistar

Export a constellation as a multistar
conform_dimensions

Conform dimensions of given name
character_dimension

Transform a dimension numeric attributes to character
ct_mrs_test

Constellation for Mortality Reporting System Test
ct_mrs

Constellation for Mortality Reporting System
define_selected_facts

Define selected facts
dereference_dimension

Dereference a dimension
delete_unused_foreign_keys

Delete unused foreign keys
filter_selected_instances

Filter selected instances
delete_records

Delete records
get_conformed_dimension_names

Get conformed dimension names
dimensional_query

dimensional_query S3 class
enrich_dimension_import

Import tibble to enrich a dimension
dimensional_model

dimensional_model S3 class
find_values

Find values in a dimension
reference_dimension

Reference a dimension
define_dimension

Define dimensions in a dimensional_model object
get_conformed_dimension_names_st

Get conformed dimension names
define_fact

Define facts in a dimensional_model object
homogenize

Homogenize a dimension
enrich_dimension_export

Export selected attributes of a dimension
filter_fact_rows

Filter fact rows
filter_dimension

Filter dimension
match_records

Make a dimension record equal to another
define_selected_dimensions

Define selected dimensions
get_attribute_names

Get attribute names
get_name_of_uniquely_implemented_dimensions

Get name of uniquely implemented dimensions
role_dimension

Transform a dimension into a role dimension
get_dimension_name

Get the dimension name
get_dimension_names

Get dimension names
group_facts

Group facts
get_measure_names

Get measure names
select_fact

Select fact
mrs_cause_w11

Mortality Reporting System by Cause for Week 11
new_dimensional_query

dimensional_query S3 class
get_name_of_role_playing_dimensions

Get the name of the role playing dimensions
get_role_playing_dimension_name

Get the associated role-playing dimension name
get_dimension_attribute_names

Get dimension attribute names
set_dimension_type_conformed

Set the type of a conformed dimension
incremental_refresh_star_schema

Incrementally refresh a star schema with another
group_table

Group the records in the table
get_conformed_dimension

Get conformed dimension
dm_mrs_age

Star Definition for Mortality Reporting System by Age
get_dimension

Get dimension
group_records

Group records
mrs_cause

Mortality Reporting System by Cause
get_all_dimensions

Get all dimensions
dm_mrs_cause

Star Definition for Mortality Reporting System by Cause
record_update_set

record_update_set S3 class
get_dimension_type

Get the dimension type
is_conformed_dimension

Is it conformed dimension?
prepare_join

Transform a tibble to join
ft_datagov_uk

Modelling the long-term health impacts of air pollution in London
incremental_refresh_constellation

Incrementally refresh a constellation with a star schema
mrs_age_w11

Mortality Reporting System by Age for Week 11
modify_dimension_records

Apply dimension record update operations
mrs_cause_w10

Mortality Reporting System by Cause for Week 10
get_fact_name

Get fact name
update_facts_with_general_dimension

Update facts with a general dimension
rename_measures

Rename measures
mrs_cause_test

Mortality Reporting System by Cause Test
mrs

Mortality Reporting System
get_role_dimension_names

Get role dimension names associated to a role-playing dimension
snake_case_fact

Transform names according to the snake case style in a fact table
incremental_refresh_fact

Incrementally refresh a fact table with another
is_role_playing_dimension

Is it role-playing dimension?
replace_dimension_in_facts

Replace in facts a star schema dimension
new_dimensional_model

dimensional_model S3 class
new_constellation

constellation S3 class
mrs_age_w_test

Mortality Reporting System by Age for Week Test
st_mrs_age_w_test

Star Schema for Mortality Reporting System by Age for Week Test
purge_dimensions_constellation

Purge dimensions in a constellation
union_of_dimensions

Perform union of dimensions
st_mrs_age_test

Star Schema for Mortality Reporting System by Age Test
replace_general_dimension_in_facts

Replace in facts a star schema general dimension
st_mrs_cause_w_test

Star Schema for Mortality Reporting System by Cause for Week Test
st_mrs_age_w10

Star Schema for Mortality Reporting System by Age for Week 10
mrs_age

Mortality Reporting System by Age
st_mrs_cause_w10

Star Schema for Mortality Reporting System by Cause for Week 10
incremental_refresh_dimension

Incrementally refresh a dimension with another
set_dimension_name

Set the dimension name
selection_bit_map

Generate a record selection bitmap
rename_dimension_attributes

Rename dimension attributes
set_role_playing_dimension_name

Set the associated role-playing dimension name
st_mrs_cause_w11

Star Schema for Mortality Reporting System by Cause for Week 11
new_record_update_set

record_update_set S3 class
new_record_update

record_update S3 class
new_dimension_table

dimension_table S3 class
rename_fact

Rename fact
mrs_age_test

Mortality Reporting System by Age Test
role_playing_dimension

Define a role playing dimension in a star_schema object
updates_st_mrs_age

Updates for the Star Schema for Mortality Reporting System by Age
modify_conformed_dimension_records

Apply dimension record update operations to conformed dimensions
update_dimensions

Apply update operations to dimensions
is_dimension_in_updates

Is dimension in set of updates?
star_schema

star_schema S3 class
is_role_dimension

Is it role dimension?
ms_mrs_test

Multistar for Mortality Reporting System Test
update_dimension

Apply dimension record update operations to a dimension
purge_dimensions_star_schema

Purge dimensions
new_fact_table

fact_table S3 class
run_query

Run query
remove_duplicate_dimension_rows

Remove duplicate dimension rows
replace_dimension

Replace a star schema dimension
new_multistar

multistar S3 class
mrs_age_w10

Mortality Reporting System by Age for Week 10
rename_dimension

Rename dimension
star_schema_as_tl

Export a star schema as a tibble list (common)
select_dimension

Select dimension
snake_case

Transform names according to the snake case style
st_mrs_age_w11

Star Schema for Mortality Reporting System by Age for Week 11
set_fact_name

Set fact name
star_schema_as_multistar

Export a star schema as a multistar
ms_mrs

Multistar for Mortality Reporting System
mrs_cause_w_test

Mortality Reporting System by Cause for Week Test
set_dimension_type_role_playing

Set the type of a role-playing dimension
update_facts_with_role_dimension

Update facts with a role dimension
snake_case_dimension

Transform names according to the snake case style in a dimension
typed_value

Transform a value according to its type
new_star_schema

star_schema S3 class
star_schema_as_tibble_list

Export a star schema as a tibble list
unify_facts_by_grain

Unify facts by grain
replace_records

Replace records
multistar_as_flat_table

Export a multistar as a flat table
set_dimension_type

Set the dimension type
update_record

Update a dimension record with a set of values
replace_role_dimension_in_facts

Replace in facts a star schema role dimension
st_mrs_age

Star Schema for Mortality Reporting System by Age
st_mrs_cause

Star Schema for Mortality Reporting System by Cause
update_facts_with_dimensions

Update facts with a list of modified dimensions
update_selection

Update dimension records with a set of values
starschemar

Obtaining Star Schemas from Flat Tables
st_mrs_cause_test

Star Schema for Mortality Reporting System by Cause Test
star_schema_as_flat_table

Export a star schema as a flat table
star_schema_as_mst

Star schema as multistar export (common)
update_selection_general

Update dimension records with a set of values in given columns
updates_st_mrs_age_test

Updates for the Star Schema for Mortality Reporting System by Age Test