check_cardinality: Test if the relation between two tables of a data model meet the requirements

Description

All check_cardinality() functions test the following conditions:

Is pk_column is a unique key for parent_table?
Is the set of values in fk_column of child_table a subset of the set of values of pk_column?
Does the relation between the two tables of the data model meet the cardinality requirements?

Usage

check_cardinality_0_n(parent_table, pk_column, child_table, fk_column)
check_cardinality_1_n(parent_table, pk_column, child_table, fk_column)
check_cardinality_1_1(parent_table, pk_column, child_table, fk_column)
check_cardinality_0_1(parent_table, pk_column, child_table, fk_column)
check_cardinality(parent_table, pk_column, child_table, fk_column)

Arguments

parent_table

Data frame.

pk_column

Column of parent_table that has to be one of its unique keys.

child_table

Data frame.

fk_column

Column of child_table that has to be a foreign key to pk_column in parent_table.

Value

Functions return parent_table, invisibly, if the check is passed, to support pipes. Otherwise an error is thrown and the reason for it is explained.

Details

All check_cardinality functions accept a parent table (data frame), a column name of this table, a child table, and a column name of the child table. The given column of the parent table has to be one of its unique keys (no duplicates are allowed). Furthermore, in all cases, the set of values of the child table's column has to be a subset of the set of values of the parent table's column.

The cardinality specifications 0_n, 1_n, 0_1, 1_1 refer to the expected relation that the child table has with the parent table. The numbers 0, 1 and n refer to the number of values in the column of the child table that correspond to each value of the column of the parent table. n means "more than one" in this context, with no upper limit.

0_n means, that for each value of the pk_column, at least 0 and at most n values have to correspond to it in the column of the child table (which translates to no further restrictions).

1_n means, that for each value of the pk_column, at least 1 and at most n values have to correspond to it in the column of the child table. This means that there is a "surjective" mapping from the child table to the parent table w.r.t. the specified columns, i.e. for each parent table column value there exists at least one equal child table column value.

0_1 means, that for each value of the pk_column, at least 0 and at most 1 value has to correspond to it in the column of the child table. This means that there is a "injective" mapping from the child table to the parent table w.r.t. the specified columns, i.e. no parent table column value is addressed multiple times. But not all of the parent table column values have to be referred to.

1_1 means, that for each value of the pk_column, exactly 1 value has to correspond to it in the column of the child table. This means that there is a "bijective" ("injective" AND "surjective") mapping between the child table and the parent table w.r.t. the specified columns, i.e. the sets of values of the two columns are equal and there are no duplicates in either of them.

Finally, check_cardinality() tests for and returns the nature of the relationship (injective, surjective, bijective, or none of these) between the two given columns.

Examples

Run this code

# NOT RUN {
d1 <- tibble::tibble(a = 1:5)
d2 <- tibble::tibble(c = c(1:5, 5))
d3 <- tibble::tibble(c = 1:4)
# This does not pass, `c` is not unique key of d2:
try(check_cardinality_0_n(d2, c, d1, a))

# This passes, multiple values in d2$c are allowed:
check_cardinality_0_n(d1, a, d2, c)

# This does not pass, injectivity is violated:
try(check_cardinality_1_1(d1, a, d2, c))

# This passes:
check_cardinality_0_1(d1, a, d3, c)
# }

Run the code above in your browser using DataLab