dm (version 0.0.3.9003)

cdm_filter: Filtering a dm object

Description

Filtering one table of a dm object may affect all tables connected to it via one or more foreign key relations. First, define filter conditions for one or more tables with cdm_filter(), using a syntax similar to dplyr::filter(). These conditions are stored in the dm and are not executed immediately. cdm_apply_filters() then updates all tables according to the filter conditions and the foreign key relations.

Usage

cdm_filter(dm, table, ...)

cdm_apply_filters(dm)

Arguments

dm

A dm object.

table

A table in the dm.

...

Logical predicates defined in terms of the variables in table, passed on to dplyr::filter(). Multiple conditions are combined with & or ,. Only rows where the condition evaluates to TRUE are kept.

The arguments in ... are automatically quoted and evaluated in the context of the data frame. They support unquoting and splicing. See vignette("programming", package = "dplyr") for an introduction to these concepts.
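Because the conditions in ... are passed on to dplyr::filter(), their behavior can be tried out on a plain data frame. The following is a minimal sketch using dplyr only; the toy `airports` tibble and the names `wanted` and `conds` are made up for illustration and are not part of the dm package.

```r
library(dplyr)

# Toy table, invented for this sketch.
airports <- tibble(
  faa  = c("JFK", "LGA", "EWR"),
  name = c("John F Kennedy Intl", "La Guardia", "Newark Liberty Intl")
)

# Conditions separated by `,` behave like conditions joined with `&`.
with_comma <- filter(airports, faa != "EWR", name != "La Guardia")
with_and   <- filter(airports, faa != "EWR" & name != "La Guardia")

# Unquoting: `!!` inserts the value of a local variable into the condition.
wanted <- "John F Kennedy Intl"
unquoted <- filter(airports, name == !!wanted)

# Splicing: `!!!` expands a list of expressions into separate conditions.
conds <- rlang::exprs(faa != "EWR", name != "La Guardia")
spliced <- filter(airports, !!!conds)
```

All four calls keep the same single row here. According to the description above, the same kinds of expressions can be supplied to cdm_filter() after the table argument.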

Details

cdm_filter() allows you to set one or more filter conditions for one table of a dm object. These conditions will be stored in the dm for when they are needed. The conditions are only evaluated in one of the following scenarios:

  1. Calling cdm_apply_filters() or compute() (method for dm objects) on a dm: each filtered table potentially reduces the rows of all other tables connected to it by foreign key relations (cascading effect), leaving only the rows with the corresponding key values. Tables that are not connected to any table with an active filter are left unchanged. The result is a new dm object.

  2. Calling one of tbl(), [[.dm(), $.dm(): the remaining rows of the requested table are calculated based on the filter conditions and the foreign key relations (as in scenario 1, but for the requested table only).
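The cascading effect in the scenarios above can be pictured as a filter on one table followed by semi-joins along the key relations. The sketch below uses plain dplyr on two toy tibbles; it only illustrates the idea and is not how the dm package implements it. The tables and the key columns `faa` and `origin` are assumptions chosen to resemble the nycflights13 data.

```r
library(dplyr)

# Toy stand-ins for two related tables: `flights$origin` plays the role of
# a foreign key pointing at the primary key `airports$faa`.
airports <- tibble(
  faa  = c("JFK", "LGA", "EWR"),
  name = c("John F Kennedy Intl", "La Guardia", "Newark Liberty Intl")
)
flights <- tibble(
  flight = c(1L, 2L, 3L, 4L),
  origin = c("JFK", "LGA", "JFK", "EWR")
)

# Step 1: apply the condition to the table it was defined for.
airports_filtered <- filter(airports, name == "John F Kennedy Intl")

# Step 2: cascade along the key relation, keeping only the flights whose
# key value still exists in the filtered table.
flights_filtered <- semi_join(
  flights, airports_filtered,
  by = c("origin" = "faa")
)

# The cascade also works from child to parent: filtering `flights`
# reduces `airports` to the airports that are still referenced.
flights_lga  <- filter(flights, origin == "LGA")
airports_lga <- semi_join(airports, flights_lga, by = c("faa" = "origin"))
```

In this picture, a table that shares no key relation with a filtered table is never joined against anything and therefore keeps all of its rows.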

Several functions of the dm package will throw an error if unevaluated filter conditions exist when they are called.

Examples

# NOT RUN {
library(dplyr)

dm_nyc_filtered <-
  cdm_nycflights13() %>%
  cdm_filter(airports, name == "John F Kennedy Intl")

tbl(dm_nyc_filtered, "flights")
dm_nyc_filtered[["planes"]]
dm_nyc_filtered$airlines

cdm_nycflights13() %>%
  cdm_filter(airports, name == "John F Kennedy Intl") %>%
  cdm_apply_filters()

# If you want to only keep those rows in the parent tables
# whose primary key values appear as foreign key values in
# `flights`, you can set a `TRUE` filter in `flights`:
cdm_nycflights13() %>%
  cdm_filter(flights, 1 == 1) %>%
  cdm_apply_filters() %>%
  cdm_nrow()
# Note that in this example the only affected table is
# `airports` (since the departure airports in `flights` are
# only the 3 NYC ones).

cdm_nycflights13() %>%
  cdm_filter(flights, month == 3) %>%
  cdm_apply_filters()

# compute() (method for dm objects) also applies the stored filters:
cdm_nycflights13() %>%
  cdm_filter(planes, engine %in% c("Reciprocating", "4 Cycle")) %>%
  compute()
# }
