starschemar (version 1.2.0)

define_fact: Define facts in a dimensional_model object

Description

Defining facts in a dimensional_model object requires a name and a set of measurements. The set can be empty, in which case the fact has no explicit measurements. Each measurement requires an aggregation function, which defaults to SUM.

Usage

define_fact(
  st,
  name = NULL,
  measures = NULL,
  agg_functions = NULL,
  nrow_agg = "nrow_agg"
)

# S3 method for dimensional_model
define_fact(
  st,
  name = NULL,
  measures = NULL,
  agg_functions = NULL,
  nrow_agg = "nrow_agg"
)

Arguments

st

A dimensional_model object.

name

A string, name of the fact.

measures

A vector of measure names.

agg_functions

A vector of aggregation function names: SUM, MAX or MIN. If none is indicated, the default is SUM.

nrow_agg

A string, measurement name for the number of rows aggregated.

Value

A dimensional_model object.

Details

To get a star schema (a star_schema object) we need a flat table (implemented through a tibble) and a dimensional_model object. The facts in the dimensional_model object are defined from the column names of the flat table. Using the dput function we can list those column names so that we do not have to type them.

Each measurement has an associated aggregation function, which can be SUM, MAX or MIN. The mean is not among the possible aggregation functions because the mean of the means of subsets of the data is not necessarily the mean of the total data.
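A minimal base-R sketch of why the mean is excluded. The two groups and their values are made up for illustration and are not package data:

```r
# Hypothetical measure values for two groups of different sizes.
group_a <- c(10, 20, 30, 40)  # 4 rows
group_b <- c(100)             # 1 row

# Mean of the per-group means: each group counts equally.
mean_of_means <- mean(c(mean(group_a), mean(group_b)))

# Mean over all rows at once: each row counts equally.
overall_mean <- mean(c(group_a, group_b))

print(mean_of_means)  # [1] 62.5
print(overall_mean)   # [1] 40
```

The two results differ because the groups have different sizes, so averaging already-averaged subsets gives the single row in group_b the same weight as the four rows in group_a.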

An additional measurement corresponding to the number of aggregated rows is always added which, together with SUM, allows us to obtain the mean if needed.
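The following base-R sketch illustrates how SUM and the row count together recover the mean after a further aggregation. The data frame, its values, the city and deaths column names, and the use of aggregate() are assumptions for illustration; only nrow_agg is taken from the default shown in Usage, and this is not necessarily how starschemar aggregates internally.

```r
# Hypothetical pre-aggregated facts: each row holds the SUM of a measure
# and the number of source rows that were aggregated into it.
facts <- data.frame(
  city     = c("A", "A", "B"),
  deaths   = c(100, 100, 100),  # SUM of the measure
  nrow_agg = c(3, 1, 1)         # number of rows aggregated
)

# Roll up to city level. Both columns are additive, so SUM is safe.
by_city <- aggregate(cbind(deaths, nrow_agg) ~ city, data = facts, FUN = sum)

# The mean per original row is the total SUM divided by the total row count.
by_city$mean_deaths <- by_city$deaths / by_city$nrow_agg

print(by_city)
```

For city A this gives 200 / 4 = 50, whereas averaging the two pre-aggregated means (100 / 3 and 100 / 1) would give a different value.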

See Also

Other star definition functions: define_dimension(), dimensional_model()

Examples

# NOT RUN {
library(starschemar)
library(tidyr) # provides the %>% pipe

# dput(colnames(mrs_age))
#
# c(
#   "Reception Year",
#   "Reception Week",
#   "Reception Date",
#   "Data Availability Year",
#   "Data Availability Week",
#   "Data Availability Date",
#   "Year",
#   "WEEK",
#   "Week Ending Date",
#   "REGION",
#   "State",
#   "City",
#   "Age Range",
#   "Deaths"
# )

dm <- dimensional_model() %>%
  define_fact(
    name = "mrs_age",
    measures = c("Deaths"),
    agg_functions = c("SUM"),
    nrow_agg = "nrow_agg"
  )

dm <- dimensional_model() %>%
  define_fact(
    name = "mrs_age",
    measures = c("Deaths")
  )

dm <- dimensional_model() %>%
  define_fact(name = "Factless fact")

# }
