Learn R Programming

⚠️There's a newer version (1.2.4) of this package.Take me there.

Notice

Based on this package, the rolap package has been developed. It incorporates the main functionalities for which starschemar was initially intended and it has a new data model defined with the aim of facilitating its maintenance and evolution.

starschemar

The multidimensional data model was defined in the 1990s with the aim of supporting data analysis. Data in multidimensional systems is obtained from operational systems and is transformed to adapt it to the new structure.

Transformations can be carried out using professional ETL (extract, transform and load) tools. Recently, tools aimed at end users have emerged, which are also aimed at performing transformation operations. All these tools are very useful to carry out the transformation process, they provide a development environment to define the transformation operations in a general way.

Frequently, the operations to be performed aim to transform a flat table (with data that comes from operational systems) into a star schema (which implements a multidimensional system). With the tools mentioned above, this transformation can be carried out, but it requires a lot of work. I am not aware of any tools with operations designed to specifically support this transformation process.

The goal of starschemar is to define transformations that allow you to easily obtain star schemas from flat tables. In addition, it includes basic data cleaning, dimension enrichment, incremental data refresh and query operations, adapted to this context.

Installation

You can install the released version of starschemar from CRAN with:

install.packages("starschemar")

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("josesamos/starschemar")

Example

To illustrate how the package works we will use a small part of the Deaths in 122 U.S. cities - 1962-2016. 122 Cities Mortality Reporting System data set in the form of a flat table, the first rows of which are shown below.

YearWEEKWeek Ending DateREGIONStateCityPneumonia and Influenza DeathsAll DeathsOther Deaths
196211962-01-061CTBridgeport34643
196221962-01-131CTBridgeport24341
196231962-01-201CTBridgepor24038
196211962-01-069WATacoma45046
NANA1962-01-139WATacoma24543
196231962-01-209WATacoma03939

The original data has been modified to have some erroneous or missing data.

The transformation to obtain a star schema from the flat table using starschemar package is as follows:

library(starschemar)

# columns to consider in the definition
dput(colnames(ft))
#> c("Year", "WEEK", "Week Ending Date", "REGION", "State", "City", 
#> "Pneumonia and Influenza Deaths", "All Deaths", "Other Deaths"
#> )

dm <- dimensional_model() |>
  define_fact(
    name = "mrs_cause",
    measures = c(
      "Pneumonia and Influenza Deaths",
      "Other Deaths"
    ),
  ) |>
  define_dimension(
    name = "when",
    attributes = c(
      "Week Ending Date",
      "WEEK",
      "Year"
    )
  ) |>
  define_dimension(
    name = "where",
    attributes = c(
      "REGION",
      "State",
      "City"
    )
  )

st <- star_schema(ft, dm) |>
  snake_case() |>
  character_dimensions(
    NA_replacement_value = "Unknown",
    length_integers = list(week = 2)
  )

The tables of dimensions and facts of the obtained star schema are shown below.

when_keyweek_ending_dateweekyear
11962-01-06011962
21962-01-13021962
31962-01-13UnknownUnknown
41962-01-20031962
where_keyregionstatecity
11CTBridgepor
21CTBridgeport
39WATacoma
when_keywhere_keypneumonia_and_influenza_deathsother_deathsnrow_agg
123431
134461
222411
332431
412381
430391

The tables show the erroneous and missing data. We are going to perform some data cleaning operations to correct them.

where <- st |>
  get_dimension("where")

when <- st |>
  get_dimension("when")

updates <- record_update_set() |>
  update_selection(
    dimension = where,
    columns = c("city"),
    old_values = c("Bridgepor"),
    new_values = c("Bridgeport")
  ) |>
  match_records(dimension = when,
                old = 3,
                new = 2)

st <- st |>
  modify_dimension_records(updates)

The new dimension and fact tables are shown below.

when_keyweek_ending_dateweekyear
11962-01-06011962
21962-01-13021962
31962-01-20031962
where_keyregionstatecity
11CTBridgeport
29WATacoma
where_keywhen_keypneumonia_and_influenza_deathsother_deathsnrow_agg
113431
122411
132381
214461
222431
230391

In addition to the operations in the examples shown, starschemar offers support for defining role playing and role dimensions in a star schema, fact constellations with conformed dimensions, dimension enrichment, incremental data refresh operations, the ability to export results in various tibble-based formats, and query operations, adapted to this context.

Copy Link

Version

Install

install.packages('starschemar')

Monthly Downloads

491

Version

1.2.2

License

MIT + file LICENSE

Issues

Pull Requests

Stars

Forks

Maintainer

Jose Samos

Last Published

September 14th, 2023

Functions in starschemar (1.2.2)

delete_unused_foreign_keys

Delete unused foreign keys
dimensional_query

dimensional_query S3 class
dimensional_model

dimensional_model S3 class
delete_records

Delete records
dm_mrs_cause

Star Definition for Mortality Reporting System by Cause
define_selected_facts

Define selected facts
dm_mrs_age

Star Definition for Mortality Reporting System by Age
dereference_dimension

Dereference a dimension
define_fact

Define facts in a dimensional_model object
define_selected_dimensions

Define selected dimensions
enrich_dimension_import_test

Import tibble to test to enrich a dimension
filter_dimension

Filter dimension
filter_fact_rows

Filter fact rows
ft_london_boroughs

London Boroughs
filter_selected_instances

Filter selected instances
enrich_dimension_import

Import tibble to enrich a dimension
enrich_dimension_export

Export selected attributes of a dimension
ft_datagov_uk

Modelling the long-term health impacts of air pollution in London
find_values

Find values in a dimension
get_dimension_name

Get the dimension name
get_dimension

Get dimension
get_conformed_dimension_names

Get conformed dimension names
ft_usa_city_county

USA City and County
ft_usa_states

USA States
get_dimension_names

Get dimension names
get_all_dimensions

Get all dimensions
get_name_of_role_playing_dimensions

Get the name of the role playing dimensions
get_measure_names

Get measure names
get_fact_name

Get fact name
get_conformed_dimension_names_st

Get conformed dimension names
get_dimension_type

Get the dimension type
get_role_playing_dimension_name

Get the associated role-playing dimension name
get_star_schema

Get star schema
incremental_refresh_star_schema

Incrementally refresh a star schema with another
is_conformed_dimension

Is it conformed dimension?
incremental_refresh_constellation

Incrementally refresh a constellation with a star schema
get_dimension_attribute_names

Get dimension attribute names
get_star_schema_names

Get star schema names
get_conformed_dimension

Get conformed dimension
incremental_refresh_fact

Incrementally refresh a fact table with another
get_name_of_uniquely_implemented_dimensions

Get name of uniquely implemented dimensions
homogenize

Homogenize a dimension
get_attribute_names

Get attribute names
group_facts

Group facts
get_role_dimension_names

Get role dimension names associated to a role-playing dimension
incremental_refresh_dimension

Incrementally refresh a dimension with another
is_role_playing_dimension

Is it role-playing dimension?
group_records

Group records
group_table

Group the records in the table
match_records

Make a dimension record equal to another
mrs_cause_w_test

Mortality Reporting System by Cause for Week Test
ms_mrs

Multistar for Mortality Reporting System
mrs_age_test

Mortality Reporting System by Age Test
mrs_age_w10

Mortality Reporting System by Age for Week 10
modify_conformed_dimension_records

Apply dimension record update operations to conformed dimensions
is_dimension_in_updates

Is dimension in set of updates?
is_role_dimension

Is it role dimension?
mrs_age

Mortality Reporting System by Age
new_constellation

constellation S3 class
mrs

Mortality Reporting System
new_dimension_table

dimension_table S3 class
mrs_age_w11

Mortality Reporting System by Age for Week 11
mrs_age_w_test

Mortality Reporting System by Age for Week Test
new_fact_table

fact_table S3 class
mrs_cause

Mortality Reporting System by Cause
purge_dimensions_constellation

Purge dimensions in a constellation
new_star_schema

star_schema S3 class
prepare_join

Transform a tibble to join
mrs_cause_test

Mortality Reporting System by Cause Test
new_record_update

record_update S3 class
rename_measures

Rename measures
purge_dimensions_star_schema

Purge dimensions
rename_dimension_attributes

Rename dimension attributes
replace_dimension

Replace a star schema dimension
modify_dimension_records

Apply dimension record update operations
multistar_as_flat_table

Export a multistar as a flat table
ms_mrs_test

Multistar for Mortality Reporting System Test
mrs_cause_w10

Mortality Reporting System by Cause for Week 10
mrs_cause_w11

Mortality Reporting System by Cause for Week 11
new_dimensional_model

dimensional_model S3 class
new_record_update_set

record_update_set S3 class
role_playing_dimension

Define a role playing dimension in a star_schema object
new_multistar

multistar S3 class
new_dimensional_query

dimensional_query S3 class
rename_fact

Rename fact
role_dimension

Transform a dimension into a role dimension
run_query

Run query
select_dimension

Select dimension
snake_case

Transform names according to the snake case style
snake_case_dimension

Transform names according to the snake case style in a dimension
st_mrs_cause

Star Schema for Mortality Reporting System by Cause
remove_duplicate_dimension_rows

Remove duplicate dimension rows
record_update_set

record_update_set S3 class
st_mrs_cause_test

Star Schema for Mortality Reporting System by Cause Test
star_schema_as_flat_table

Export a star schema as a flat table
st_mrs_age_test

Star Schema for Mortality Reporting System by Age Test
select_fact

Select fact
rename_dimension

Rename dimension
update_dimensions

Apply update operations to dimensions
st_mrs_age_w10

Star Schema for Mortality Reporting System by Age for Week 10
selection_bit_map

Generate a record selection bitmap
star_schema_as_mst

Star schema as multistar export (common)
set_fact_name

Set fact name
set_role_playing_dimension_name

Set the associated role-playing dimension name
replace_dimension_in_facts

Replace in facts a star schema dimension
reference_dimension

Reference a dimension
set_dimension_type_conformed

Set the type of a conformed dimension
st_mrs_cause_w_test

Star Schema for Mortality Reporting System by Cause for Week Test
replace_general_dimension_in_facts

Replace in facts a star schema general dimension
unify_facts_by_grain

Unify facts by grain
replace_records

Replace records
typed_value

Transform a value according to its type
update_facts_with_dimensions

Update facts with a list of modified dimensions
set_dimension_name

Set the dimension name
set_dimension_type

Set the dimension type
star_schema

star_schema S3 class
set_dimension_type_role_playing

Set the type of a role-playing dimension
st_mrs_cause_w10

Star Schema for Mortality Reporting System by Cause for Week 10
st_mrs_cause_w11

Star Schema for Mortality Reporting System by Cause for Week 11
star_schema_as_tibble_list

Export a star schema as a tibble list
update_record

Update a dimension record with a set of values
star_schema_as_multistar

Export a star schema as a multistar
update_selection

Update dimension records with a set of values
update_dimension

Apply dimension record update operations to a dimension
snake_case_fact

Transform names according to the snake case style in a fact table
st_mrs_age_w11

Star Schema for Mortality Reporting System by Age for Week 11
star_schema_as_tl

Export a star schema as a tibble list (common)
update_selection_general

Update dimension records with a set of values in given columns
starschemar

Obtaining Star Schemas from Flat Tables
updates_st_mrs_age_test

Updates for the Star Schema for Mortality Reporting System by Age Test
updates_st_mrs_age

Updates for the Star Schema for Mortality Reporting System by Age
st_mrs_age

Star Schema for Mortality Reporting System by Age
union_of_dimensions

Perform union of dimensions
replace_role_dimension_in_facts

Replace in facts a star schema role dimension
st_mrs_age_w_test

Star Schema for Mortality Reporting System by Age for Week Test
update_facts_with_general_dimension

Update facts with a general dimension
update_facts_with_role_dimension

Update facts with a role dimension
ct_mrs_test

Constellation for Mortality Reporting System Test
constellation_as_tibble_list

Export a constellation as a tibble list
character_dimensions

Transform dimension numeric attributes to character
define_dimension

Define dimensions in a dimensional_model object
constellation_as_multistar

Export a constellation as a multistar
constellation

constellation S3 class
conform_dimensions

Conform dimensions of given name
conform_all_dimensions

Conform all dimensions of a constellation
ct_mrs

Constellation for Mortality Reporting System
character_dimension

Transform a dimension numeric attributes to character