Learn R Programming

rolap

The aim of the multidimensional data model is organize data for supporting data analysis. Data in multidimensional systems is obtained from operational systems and is transformed to adapt it to the new structure.

Transformations can be carried out using professional ETL (Extract, Transform and Load) tools. Recently, tools aimed at end users have emerged, which are also aimed at performing transformation operations. All these tools are very useful to carry out the transformation process, they provide a development environment to define the transformation operations in a general way.

Frequently, the operations to be performed aim to transform a set of tables with data that comes from operational systems into a ROLAP (Relational On-Line Analytical Processing) star database, made up of fact and dimension tables, which implements a multidimensional system. With the tools mentioned above, this transformation can be carried out, but it requires a lot of work. We are not aware of any tools with operations designed to specifically support this transformation process.

The goal of rolap is to define transformations that allow us to easily obtain ROLAP star databases, composed by fact and dimension tables, from operational tables, to be able to export them in various formats to be used by OLAP query tools and also be able to exploit them from R.

The rolap package builds on experience with the starschemar package on which it is based. It incorporates the main functionalities for which starschemar was initially intended. In particular, the data model and the way of treating role-playing and role dimensions have been changed, so that it is easier to add future extensions. It has been designed in such a way that migration from starschemar is practically immediate.

Installation

You can install the released version of rolap from CRAN with:

install.packages("rolap")

And the development version from GitHub with:

devtools::install_github("josesamos/rolap")

Example

To illustrate how the package works we will use a small part of the Deaths in 122 U.S. cities - 1962-2016. 122 Cities Mortality Reporting System data set in the form of a flat table, available in the package in the ft_num variable, shown below.

YearWEEKWeek Ending DateREGIONStateCityPneumonia and Influenza DeathsAll Deaths<1 year (all cause deaths)1-24 years (all cause deaths)25-44 years45-64 years (all cause deaths)65+ years (all cause deaths)
1962201/13/19621MABoston112701481170167
1962401/27/19621MABoston12285227873175
1963401/26/19631MABoston1027611141767167
1964301/18/19641MABoston133251772490187
1964602/08/19641MABoston92441391461147
1962301/20/19621CTBridgeport2405131021
1962502/03/19621CTBridgeport5466031522
1962802/24/19621CTBridgeport2452021625
1963401/26/19631CTBridgeport2464031029
1964502/01/19641CTBridgeport8453121128
1962903/03/19621MACambridge439102729
1964201/11/19641MACambridge731102919
1964502/01/19641MACambridge627200817
1964902/29/19641MACambridge026002816
1962401/27/19621CTHartford1477101425
1962702/17/19621CTHartford4573132129
1963301/19/19631CTHartford2667231836
1963702/16/19631CTHartford4776171944
1963802/23/19631CTHartford6493231427
1964201/11/19641CTHartford3537021628

Star database definition

The transformation to obtain a star database from the table using rolap package is as follows:

library(rolap)

where <- dimension_schema(name = "Where",
                          attributes = c("REGION",
                                         "State",
                                         "City"))

s <- star_schema() |>
  define_facts(name = "MRS Cause",
               measures = c("Pneumonia and Influenza Deaths",
                            "All Deaths")) |>
  define_dimension(name = "When",
                   attributes = c("Year")) |>
  define_dimension(where)

db <- star_database(s, ft_num) |>
  snake_case()

The dimension and fact schemas can be defined as variables (where) to be reused or directly in the star schema definition. To make it easier to work in a database environment we transform the table field names to snake case.

Geographic attributes can be associated with vector layers of geographic information.

db <- db |>
  define_geoattribute(
    dimension = "where",
    attribute = "state",
    from_layer = us_layer_state,
    by = "STUSPS"
  )

Result

To better appreciate the result, let’s export it as a tibble list. The tables of dimensions and facts of the obtained star database are shown below.

ls <- db |>
  as_tibble_list()

for (i in 1:length(ls)) {
  pander::pandoc.table(ls[[i]], split.table = Inf)
}
when_keyyear
11962
21963
31964
where_keyregionstatecity
11CTBridgeport
21CTHartford
31MABoston
41MACambridge
when_keywhere_keypneumonia_and_influenza_deathsall_deathsnrow_agg
1191313
1251042
13235552
144391
212461
22121923
23102761
318451
323531
33225692
3413843

The result can be exported in various formats such as csv and xslx files or to a relational database, as shown below.

con <- DBI::dbConnect(RSQLite::SQLite())

db |>
  as_rdb(con)

DBI::dbListTables(con)
#> [1] "mrs_cause" "when"      "where"

DBI::dbDisconnect(con)

In addition to exporting it, we can formulate multidimensional queries from R. Below is an example.

sq <- star_query(db) |>
  select_dimension(name = "where",
                   attributes = "state") |>
  select_dimension(name = "when",
                   attributes = "year") |>
  select_fact(name = "mrs_cause",
              measures = "all_deaths") |>
  filter_dimension(name = "when", year >= "1963") |>
  filter_dimension(name = "where", city == "Bridgeport" | city == "Boston")

db_2 <- db |>
  run_query(sq)

The result can be displayed using the pivottabler package.

ft <- db_2 |>
  as_single_tibble_list()
ft_cause <- ft[["mrs_cause"]]

pt <- pivottabler::qpvt(
  ft_cause,
  c("=", "state"),
  c("year"),
  c("Number of Deaths" = "sum(all_deaths)")
)

pt$renderPivot()

We can obtain a geographic information layer that includes it, to use it in R as an object of class sf.

gl <- db_2 |>
  as_geolayer()

l1 <- gl |>
  get_layer()
class(l1)
#> [1] "sf"         "tbl_df"     "tbl"        "data.frame"

title <- gl |>
  get_variable_description("var_1")

plot(sf::st_geometry(l1[, c("var_1")]), axes = TRUE, main = title)
text(
  sf::st_coordinates(sf::st_centroid(sf::st_geometry(l1))),
  labels = paste0(l1$state, ": ", l1$var_1),
  pos = 3,
  cex = 1.5
)

We can also include all geographic instances originally present in the layer.

l2 <- gl |>
  get_layer(keep_all_variables_na = TRUE)

plot(sf::st_shift_longitude(l2[, "var_1"]), axes = TRUE, main = title)

Or export it in GeoPackage format.

f <- gl |>
  as_GeoPackage(dir = tempdir())

sf::st_layers(f)
#> Driver: GPKG 
#> Available layers:
#>   layer_name geometry_type features fields crs_name
#> 1   geolayer       Polygon        2      3   WGS 84
#> 2  variables            NA        2      3     <NA>

We can work with several star databases to form a constellation. It supports the definition of role-playing and role dimensions, as well as incremental refresh operations and automatic deployment on any RDBMS (Relational Database Management System). Examples and detailed information of these functionalities can be found in the documentation and vignettes of the package.

Copy Link

Version

Install

install.packages('rolap')

Monthly Downloads

575

Version

2.5.1

License

MIT + file LICENSE

Issues

Pull Requests

Stars

Forks

Maintainer

Jose Samos

Last Published

January 10th, 2024

Functions in rolap (2.5.1)

as_multistar

Generate a geomultistar::multistar object
define_dimension

Define dimension in a star_schema object.
define_facts

Define facts in a star_schema object.
delete_all_operations_found

Delete in stars all operations found
filter_dimension

Filter dimension
conform_dimensions

Conform dimensions
fact_schema

fact_schema S3 class
constellation

Create constellation
draw_tables

Draw tables
fact_table

fact_table S3 class
delete_operation

Delete an operation
coordinates_to_point

Transform coordinates to point geometry
dimension_schema

dimension_schema S3 class
dimension_table

dimension_table S3 class
default_disconnect

Default disconnect function
as_star_database

Get a star database from a flat table
as_single_tibble_list

Generate a list of tibbles of flat tables
define_geoattribute

Define geoattribute of a dimension
generate_table_sql_delete

Generate table sql delete
generate_refresh_sql

Generate refresh sql
define_geoattribute_from_layer

Define geoattribute from a layer
generate_table_sql_insert

Generate table sql insert
ft_num

Mortality Reporting System with numerical measures
ft_age_rpd

Mortality Reporting System by Age
as_rdb

Generate tables in a relational database
ft

Mortality Reporting System
ft_age

Mortality Reporting System by Age Group
get_agg_functions.fact_schema

Get aggregate functions
get_next_operation

A star_operation object row is returned, the one following the actual given
generate_table_sql_update

Generate table sql update
get_nrow_agg.fact_schema

Get number of rows aggregate column
check_lookup_table

Check the result of joining a flat table with a lookup table
delete_operation_set

Delete a set of operations
check_refesh

Checks the refresh of the selected star database from the given database
get_new_dimension_instances

Get new dimension instances
get_table

Get the table of the flat table
get_layer

Get geographic information layer
get_measure_names_schema.star_schema

Get measure names
get_layer_from_attribute

Get layer from attribute
ft_cause_rpd

Mortality Reporting System by Cause
get_surrogate_key.dimension_table

Get surrogate key names
filter_geo_attributes

From attributes, leave only these contained in dimensions
deploy

Deploy a star database in a relational database
filter_geo_dimensions

From geodimensions, leave only contained in vector of names
get_table_names

Get the names of the tables of a star database
get_transformation_code

Get transformation function code
interpret_operation_join_lookup_table

Interpret operation
interpret_operation_group_dimension_instances

Interpret operation
interpret_operation_select_measures

Interpret operation
get_unknown_values

Get unknown attribute values
get_fact_names

Get the names of the facts of a star database
interpret_operation_transform_to_values

Interpret operation
is_scd

Is a scd dimension
interpret_operation_separate_measures

Interpret operation
purge_dimension

Purge instances of a dimension
purge_dimension_instances

Purge instances of dimensions
is_new_operation

A star_operation is new?
is_empty_string

check if a string is empty
group_dimension_instances

Group instances of a dimension
get_variable_description

Get variable description
select_attributes

Select attributes of a flat table
group_facts

Group facts
get_all_dimension_operations

Gets the operations performed on a dimension in all star_database objects
get_geoattribute_geometries

Get geoattribute geometries
get_similar_attribute_values_individually.flat_table

Get similar values for individual attributes
get_unique_attribute_values.flat_table

Get unique attribute values
get_similar_attribute_values.flat_table

Get similar attribute values combination
get_transformation_file

Get transformation function file
interpret_operation_replace_attribute_values

Interpret operation
mrs_age_schema

Star schema for Mortality Reporting System by Age
replace_attribute_values.flat_table

Replace instance values
select_instances_by_comparison

Select instances of a flat table by comparison
replace_empty_values

Replace empty values with the unknown value
mrs_age_schema_rpd

Star schema for Mortality Reporting System by Age with additional dates
interpret_operation_transform_to_attribute

Interpret operation
interpret_operation_replace_empty_values

Interpret operation
mrs_cause_schema

Star schema for Mortality Reporting System by Cause
interpret_operation_transform_to_measure

Interpret operation
get_attribute_names.flat_table

Get the names of the attributes
mrs_ft_new

Flat table generated from MRS file
select_dimension

Select dimension
mrs_cause_schema_rpd

Star schema for Mortality Reporting System by Cause with additional dates
mrs_ft

Flat table generated from MRS file
refresh_deployments

Refresh deployments
get_attribute_names_schema.dimension_schema

Get attribute names
get_layer_geometry

Get layer geometry
get_existing_fact_instances

Get existing fact instances
get_lookup_tables

Get lookup tables
snake_case.flat_table

Transform names according to the snake case style
transform_to_measure

Transform to measure
select_fact

Select fact
get_attribute_names_schema.star_schema

Get attribute names
get_fact_name.fact_schema

Get fact name
select_instances

Select instances of a flat table by value
transform_to_attribute

Transform to attribute
star_database

star_database S3 class
snake_case_table.fact_table

Transform names according to the snake case style
snake_case_table.dimension_table

Transform names according to the snake case style
summarize_layer

Summarize geometry of a layer
select_measures

Select measures of a flat table
transform_attribute_format

Transform attribute format
remove_all_measures_na

Remove instance if all measures are na
validate_lookup_parameters

Validate lookup parameters
validate_measures

Validate measure names
get_geoattribute_name

Get geoattribute name
flat_table

flat_table S3 class
get_deployment_names

Get the names of the facts of a star database
get_dimension_names

Get the names of the dimensions of a star database
get_default_unknown_value

get default unknown value
filter_rpd_dimensions

From rpd dimensions, leave only contained in vector of names.
get_dimension_table

Get dimension table
get_measure_names.flat_table

Get the names of the measures
vector_to_string

Transforms a vector of strings into a string.
get_similar_values_table

Get similar values in a table
get_geoattributes

Get geoattributes
get_pk_attribute_names

Get the names of the primary key attributes of a flat table
get_point_geometry

Get point geometry
get_star_database

Get star database
get_unknown_value_defined

Get the unknown value defined
replace_unknown_values

Replace unknown values with the given value
replace_string

Replace strings
interpret_operation_remove_instances_without_measures

Interpret operation
interpret_operation_lookup_table

Interpret operation
get_unique_values_table

Get unique values in a table
interpret_operation_role_playing_dimension

Interpret operation
get_star_query_schema

Get star query schema
get_star_schema

Get star schema
get_measure_names_schema.fact_schema

Get measure names
get_variables

Get the variables layer
interpret_operation_snake_case

Interpret operation
interpret_operation_select_attributes

Interpret operation
interpret_operation_star_database

Interpret operation
new_multistar

multistar S3 class
prepare_to_join

Prepare the instances table implemented by a tibble to join
group_by_keys

Group table instances by keys aggregating the measures using the corresponding aggregation function.
interpret_operation_replace_string

Interpret operation
incremental_refresh

Refresh a star database in a constellation
get_role_playing_dimension_names

Get the names of the role playing dimensions
interpret_operation_add_custom_column

Interpret operation
interpret_operation_replace_unknown_values

Interpret operation
interpret_operation_select_instances_by_comparison

Interpret operation
integrate_geo_dimensions

Integrate two geodimensions
interpret_operation_transform_attribute_format

Interpret operation
interpret_operation_set_attribute_names

Interpret operation
interpret_operation_select_instances

Interpret operation
set_variables

Set variables layer
join_lookup_table

Join a flat table with a lookup table
interpret_operation_set_measure_names

Interpret operation
get_rpd_dimensions

Get rpd dimensions of a dimension
interpret_operation_flat_table

Interpret operation
interpret_operation_transform_from_values

Interpret operation
replace_empty_values_table

Replace empty values with the unknown value
read_flat_table_folder

Import all flat table files in a folder
reformat_file

Get line last operation
share_dimension_instance_operations

Share dimension instance operations between all star_database objects
lookup_table

Transform a flat table into a look up table
load_star_database

Load star_database (from a RDS file)
transform_from_values

Transform attribute values into measure names
transform_names

For each row, add a vector of values
unify_rpd

Unify lists of dimension names if there are any in common
star_query

star_query S3 class
star_schema

star_schema S3 class
string_or_null

Get the representation to output
purge_dimension_instances_star_database

Purge instances of dimensions
string_to_vector

Transforms string into a vector of strings.
replace_names

Replace names
mrs_db

Constellation generated from MRS file
validate_attributes

Validate attribute names
line_last_op

Get line last operation
mrs_db_geo

Constellation generated from MRS file through a query and with geographic information
validate_dimension_attributes

Validate dimension attributes
read_flat_table_file

Import flat table file
set_layer

Set geographic layer
run_query

Run query
update_according_to

Update a flat table according to another structure
same_granularity_facts

Do all fact tables have the same granularity?
validate_names

Validate names
set_measure_names.flat_table

Rename measures
vector_presentation

vector to string for presentation
validate_facts

Validate fact names
transform_to_values

Transform measure names into attribute values
star_database_with_previous_operations

Creates a star_database adding previous operations
multiple_value_key

Multiple value key
star_operation

star_operation S3 class
validate_dimension_names

Validate dimension names
unify_facts_and_dimensions

Unify facts and dimensions in a flat table
remove_instances_without_measures

Remove instances without measures
remove_duplicate_dimension_rows

Remove duplicate dimension rows
role_playing_dimension

Define a role playing dimension and its associated dimensions
separate_measures

Separate measures in flat tables
rpd_in_constellation

Transform role playing dimensions in constellation
name_with_nexus

Name with nexus
set_attribute_names.flat_table

Rename attributes
share_dimensions

Share the given dimensions in the database
simplify_rpd_dimensions

From a vector of dimensions, leave only one of each rpd.
us_layer_state

Geographic layer of US States
us_census_state

Census of US States, by sex and age
add_operation

A star_operation object row is added with a new operation
as_GeoPackage

Save as GeoPackage
add_custom_column

Add custom column
as_csv_files

Generate csv files with fact and dimension tables
add_surrogate_key.dimension_table

Add the surrogate key from a dimension table to the instances table.
add_dput_column

For each row, add a vector of values
apply_select_fact

Apply select fact
apply_select_dimension

Apply select dimension
add_dimension_instances

Add dimension instances
as_xlsx_file

Generate a xlsx file with fact and dimension tables
as_tibble_list

Generate a list of tibbles with fact and dimension tables
cancel_deployment

Cancel deployment
apply_filter_dimension

Apply filter dimension
as_dm_class

Generate a dm class with fact and dimension tables
as_geolayer

Get a geolayer object
check_geoattribute_geometry

Check a geoattribute geometry instances.