exclude: Remove rows based on conditions or another data set

Description

This function combines dplyr::anti_join() and the negation of dplyr::filter(). When a second data set is supplied through the excl argument, an anti join is performed; otherwise, data is filtered with the expression given via the condition argument, and the filtered rows are then removed from data using dplyr::setdiff().
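The two modes described above can be approximated with plain dplyr calls. This is a minimal sketch of the documented behaviour, not the package's actual implementation; the excl data frame and the result names are made up for illustration.

```r
library(dplyr)

# Mode 1: a second data set is supplied, so rows with a match are dropped.
# `excl` here is a hypothetical one-column exclusion table.
excl <- data.frame(cyl = 4)
kept_by_join <- anti_join(mtcars, excl, by = "cyl")

# Mode 2: a condition is supplied. Rows satisfying it are selected with
# filter(), then removed from the original data with setdiff().
to_remove <- filter(mtcars, cyl == 4)
kept_by_condition <- setdiff(mtcars, to_remove)
```

Note that dplyr::setdiff() also drops duplicate rows, so the two routes should only be expected to agree when the rows of data are unique, as they are in mtcars.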

Usage

exclude(
  data,
  excl = NULL,
  by = NULL,
  condition = NULL,
  verbose = getOption("healthdb.verbose"),
  report_on = NULL,
  ...
)

Value

A data frame or remote table that is a subset of data.

Arguments

data: A data frame or remote table (e.g., from dbplyr::tbl_sql()). A subset will be removed from this data.
excl: A data frame or remote table (e.g., from 'dbplyr'). Rows/values present in it are removed from data when there is a match. It is passed to dplyr::anti_join() as the second argument.
by: Column names to match on in dplyr::anti_join(), or an expression created with dplyr::join_by(). See the by argument of dplyr::anti_join() for details. The default, NULL, is the same as setdiff(data, excl).
condition: An expression that will be passed to dplyr::filter(). The rows that satisfy condition are those to be removed from data.
verbose: A logical indicating whether to print an explanation of the operation. The default is taken from options. Use options(healthdb.verbose = FALSE) to suppress the output once and for all.
report_on: A quoted/unquoted column name for counting how many of its distinct values were removed from data, e.g., counting how many client IDs were removed. Default is NULL.
...: Additional arguments passed to dplyr::filter()/dplyr::anti_join() for finer control of matching, e.g., NA handling, by-group filtering, etc.
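As a rough illustration of the by and report_on descriptions above, the following sketch uses only dplyr and base R. The exclusion set and variable names are hypothetical, and the way distinct values are counted is an assumed reading of report_on, not the package's confirmed method.

```r
library(dplyr)

# Hypothetical exclusion set: the first five rows of mtcars.
excl <- head(mtcars, 5)

# With no `by`, anti_join() matches on all shared columns (it prints a
# message naming them), which removes the same rows as setdiff() here.
via_join <- anti_join(mtcars, excl)
via_setdiff <- setdiff(mtcars, excl)
same_size <- nrow(via_join) == nrow(via_setdiff)

# Assumed reading of report_on: distinct values of a column that were
# present before the exclusion but are absent afterwards.
remaining <- filter(mtcars, cyl != 4)
removed_cyl <- base::setdiff(unique(mtcars$cyl), unique(remaining$cyl))
```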

Examples

# exclude with condition
cyl_not_4 <- exclude(mtcars, condition = cyl == 4, report_on = cyl)

# exclude with another data set
exclude(mtcars, cyl_not_4, dplyr::join_by(cyl), report_on = cyl)
