cdm_flatten_to_tbl: Flatten part of a `dm` into a wide table

Description

Gather all information of interest in one place in a wide table (on a database-dm a temporary table will be created). If referential integrity is given among the tables of the data model, the resulting table of this function will contain as many rows as the table start does (exceptions are join = anti_join (result is empty table with same columns as start) and join = right_join (number of rows equal to or larger than those of start)). For more information please refer to vignette("dm-joining").

Usage

cdm_flatten_to_tbl(dm, start, ..., join = left_join)
cdm_squash_to_tbl(dm, start, ..., join = left_join)

Arguments

A dm object

start

Table to start from. From this table all outgoing foreign key relations are considered to establish a processing order for the joins. An interesting choice could be for example a fact table in a star schema.

...

Unquoted table names to include in addition to start. The order of the tables here determines the order of the joins. If empty, all tables that can be reached are included. If this includes tables which aren't direct neighbours of start, it will only work with cdm_squash_to_tbl() (given one of the allowed join-methods).

join

The type of join to be performed, see dplyr::join()

Value

A single table, resulting of consecutively joining all tables involved to table start.

Details

With the ... left empty, this function joins all the tables of your dm object together, that can be reached from table start in the direction of the foreign key relations (pointing from child table to parent table), using the foreign key relations to determine the parameter by for the necessary joins. The result is one table with unique column names. Use the ... if you want to control which tables should be joined to table start.

How does filtering affect the result?

Case 1, either no filter conditions are set in the dm, or only in a part unconnected to table start: The necessary disambiguations of the column names are performed first. Then all involved foreign tables are joined to table start successively with the join function given in parameter join.

Case 2, filter conditions are set for at least one table connected to start: Disambiguation is performed initially if necessary. Table start is calculated using tbl(dm, "start"). This implies that the effect of the filters on this table is taken into account. For right_join, full_join and nest_join an error is thrown in case filters are set, because the filters won't affect right hand side tables and thus the result will be incorrect in general (and calculating the effects on all RHS-tables would be time-consuming and is not supported; if desired call cdm_apply_filters() first to achieve this effect.). For all other join types filtering only start is enough, since the effect is passed on by the successive joins.

Mind, that calling cdm_flatten_to_tbl() with join = right_join and no table order determined in the ... would not lead to a well-defined result, if two or more foreign tables are to be joined to start. The resulting table would depend on the order the tables are listed in the dm. Therefore trying this results in a warning.

Since join = nest_join() does not make sense in this direction (LHS = child table, RHS = parent table: for valid key constraints each nested column entry would be a tibble of 1 row), an error is thrown, if this method is chosen.

Examples

Run this code

# NOT RUN {
cdm_nycflights13() %>%
  cdm_select_tbl(-weather) %>%
  cdm_flatten_to_tbl(flights)
# }