
pointblank (version 0.3.0)

conjointly: Perform multiple rowwise validations for joint validity

Description

The conjointly() validation step function checks whether the same test units all pass multiple validations with col_vals_*()-type functions. Because only these functions are allowed, all test units are rows of the table (after any common preconditions have been applied). This validation step function (internally composed of multiple steps) ultimately performs a rowwise test of whether all sub-validations reported a pass for the same test units. For example, a joint validation could test whether values in column a are greater than a specific value while, in the same rows, values in column b lie within a specified range. The validation step functions that make up the conjoint validation are supplied as one-sided R formulas (using a leading ~, with a . standing in for the data object). This function can be used directly on a data table or with an agent object (technically, a ptblank_agent object).
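The rowwise logic described above can be sketched in base R, without pointblank. This is a minimal sketch under stated assumptions, not pointblank's internal implementation: each sub-validation is represented as a logical vector with one value per row, NA values are counted as failures (an assumption of this sketch), and the vectors are combined with a rowwise AND. The table and the three checks match the example at the end of this page.

```r
# Base R sketch (not pointblank's internal code): each sub-validation yields
# one logical value per row, and a row passes the conjoint check only if
# every sub-validation passed for that same row.
tbl <- data.frame(
  a = c(5, 7, 6, 5, 8, 7),
  b = c(3, 4, 6, 8, 9, 11),
  c = c(2, 6, 8, NA, 3, 8)
)

# Per-row pass/fail vectors; NA values are treated as failures here
sub_results <- list(
  a_gt_6     = !is.na(tbl$a) & tbl$a > 6,
  b_lt_10    = !is.na(tbl$b) & tbl$b < 10,
  c_not_null = !is.na(tbl$c)
)

# Combine rowwise with a logical AND across all sub-validations
conjoint_pass <- Reduce(`&`, sub_results)

n_passed <- sum(conjoint_pass)
n_failed <- sum(!conjoint_pass)
conjoint_pass
```

In this sketch only rows 2 and 5 pass all three checks, so 4 of the 6 rows count as failing test units for the conjoint step.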

Usage

conjointly(
  x,
  ...,
  .list = list2(...),
  preconditions = NULL,
  actions = NULL,
  brief = NULL
)

Arguments

x

A data frame, tibble, or an agent object of class ptblank_agent.

...

A collection of one-sided formulas, each containing a validation step function that validates row units. Specifically, these should be functions with the naming pattern col_vals_*(). For example: ~ col_vals_gte(., vars(a), 5.5), ~ col_vals_not_null(., vars(b)).

.list

Allows for the use of a list of formulas as an alternative to supplying them through the ... argument.

preconditions

Expressions used for mutating the input table before proceeding with the validation. This is ideally supplied as a one-sided R formula using a leading ~. In the formula representation, tbl serves as the input data table to be transformed (e.g., ~ tbl %>% dplyr::mutate(col = col + 10)). A series of expressions can be used by enclosing the set of statements with { }, but note that the tbl object must ultimately be returned.

actions

A list containing threshold levels so that the validation step can react accordingly when exceeding the set levels. This is to be created with the action_levels() helper function.

brief

An optional, text-based description for the validation step.
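The formulas passed through ... use . as a placeholder for the table. One way to evaluate such a formula in base R is to bind . to the table in a fresh environment and evaluate the formula's right-hand side there. The helper eval_dot_formula() below is hypothetical and is not necessarily how pointblank processes these formulas; the checks use plain comparisons in place of col_vals_*() calls so the sketch runs without pointblank.

```r
# Hypothetical helper (not part of pointblank): evaluates the right-hand side
# of a one-sided formula with `.` bound to the supplied table.
eval_dot_formula <- function(formula, data) {
  # A one-sided formula such as ~ x is a call of length 2
  stopifnot(inherits(formula, "formula"), length(formula) == 2)
  env <- new.env(parent = environment(formula))
  assign(".", data, envir = env)
  eval(formula[[2]], envir = env)
}

tbl <- data.frame(a = c(5, 7, 6), b = c(3, 4, NA))

# Stand-in sub-validations written as one-sided formulas over `.`
checks <- list(
  ~ .$a >= 5.5,
  ~ !is.na(.$b)
)

# Each formula yields one logical value per row of `tbl`
results <- lapply(checks, eval_dot_formula, data = tbl)
results
```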

Function ID

2-14

Details

If multiple column names are provided in any of the supplied validation step functions, the result is an expansion into one sub-validation step per column name. Aside from column names in quotes and in vars(), tidyselect helper functions are available for specifying columns. They are: starts_with(), ends_with(), contains(), matches(), and everything().
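The column expansion can be pictured with a small base R sketch. Here a single check over the hypothetical columns vector c("a", "b") (standing in for something like vars(a, b)) becomes one logical vector per column, and a row passes only when all expanded checks pass. This mirrors the behavior described above; it is not pointblank's code.

```r
# Base R sketch: a sub-validation naming several columns is expanded into
# one per-column check, and every expanded check must pass for a row to pass.
tbl <- data.frame(
  a = c(5, 7, 6, 5, 8, 7),
  b = c(3, 4, 6, 8, 9, 11)
)

columns <- c("a", "b")  # stand-in for vars(a, b)
threshold <- 4

# One logical vector per column: the expanded sub-validation steps
expanded <- lapply(columns, function(col) {
  x <- tbl[[col]]
  !is.na(x) & x > threshold
})
names(expanded) <- paste0(columns, "_gt_", threshold)

# Rowwise AND across the expanded steps
rowwise_pass <- Reduce(`&`, expanded)
length(expanded)
rowwise_pass
```

Column a passes everywhere, so only the first two rows (where b is 3 and 4) fail overall.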

Having table preconditions means pointblank will mutate the table just before interrogation. This mutation is isolated to the validation steps produced by this validation step function. Using dplyr code is suggested here since the statements can be translated to SQL if necessary. The code is to be supplied as a one-sided R formula (using a leading ~). In the formula representation, the obligatory tbl variable serves as the input data table to be transformed (e.g., ~ tbl %>% dplyr::mutate(col_a = col_b + 10)). A series of expressions can be used by enclosing the set of statements with { }, but note that the tbl variable must ultimately be returned.
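A base R sketch of what a precondition does: a function (standing in for the one-sided formula) transforms the table just before the check, and the check then runs on the transformed copy. The precondition() function and the b_shifted column are hypothetical. Because R modifies a copy, the original tbl keeps its columns, which echoes how preconditions stay isolated to this validation step.

```r
# Base R sketch of a precondition (hypothetical, not pointblank's API)
tbl <- data.frame(
  a = c(5, 7, 6),
  b = c(-8, -3, 0)
)

# Stand-in for a formula such as ~ tbl %>% dplyr::mutate(b_shifted = b + 10)
precondition <- function(tbl) {
  tbl$b_shifted <- tbl$b + 10
  tbl  # the transformed table must be returned
}

prepared <- precondition(tbl)

# The check runs against the mutated copy only
step_pass <- prepared$b_shifted > 0

# Without the precondition, the same idea applied to `b` would fail everywhere
unprepared_pass <- tbl$b > 0

step_pass
names(tbl)  # the original table is unchanged
```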

Often, we will want to specify actions for the validation. This argument, present in every validation step function, takes a specially-crafted list object that is best produced by the action_levels() function. Read that function's documentation for the lowdown on how to create reactions to above-threshold failure levels in validation. The basic gist is that you'll want at least a single threshold level (specified as either the fraction of test units failed or an absolute number of failing test units), often using the warn_at argument. This is especially true when x is a table object because, otherwise, nothing happens. For the col_vals_*()-type functions, action_levels(warn_at = 0.25) and action_levels(stop_at = 0.25) are good choices depending on the situation (the first produces a warning when a quarter of the total test units fails, the second calls stop() at the same threshold level).
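To make the fraction-versus-count distinction concrete, here is a hypothetical helper, threshold_reached(), that is not part of pointblank. It assumes, as we read the action_levels() documentation, that a level below 1 is a fraction of all test units and a level of 1 or more is an absolute count of failing test units; treating a tie at the boundary as reaching the level is also an assumption of this sketch.

```r
# Hypothetical helper (not part of pointblank): is a threshold level reached?
# Assumption: level < 1 is a fraction of all test units, level >= 1 is an
# absolute count of failing test units; ties count as reaching the level.
threshold_reached <- function(n_failed, n_total, level) {
  if (is.null(level)) return(FALSE)  # no level set, so no action
  if (level < 1) {
    n_failed / n_total >= level
  } else {
    n_failed >= level
  }
}

# For instance, 4 failing rows out of 6
n_failed <- 4
n_total <- 6

# Like warn_at = 0.25: 4/6 is about 0.67, so the fraction is reached
warn_hit <- threshold_reached(n_failed, n_total, level = 0.25)

# Like stop_at = 5: only 4 rows fail, so the absolute count is not reached
stop_hit <- threshold_reached(n_failed, n_total, level = 5)

c(warn = warn_hit, stop = stop_hit)
```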

Want to describe this validation step in some detail? Keep in mind that this is only useful if x is an agent. If that's the case, brief the agent with some text that fits. Supplying a brief is optional: the autobrief protocol kicks in when brief = NULL, and a simple brief is then generated automatically.

See Also

Other Validation Step Functions: col_exists(), col_is_character(), col_is_date(), col_is_factor(), col_is_integer(), col_is_logical(), col_is_numeric(), col_is_posix(), col_vals_between(), col_vals_equal(), col_vals_gte(), col_vals_gt(), col_vals_in_set(), col_vals_lte(), col_vals_lt(), col_vals_not_between(), col_vals_not_equal(), col_vals_not_in_set(), col_vals_not_null(), col_vals_null(), col_vals_regex(), rows_distinct()

Examples

library(dplyr)
library(pointblank)

# Create a simple table with three
# columns of numerical values
tbl <-
  tibble(
    a = c(5, 7, 6, 5, 8, 7),
    b = c(3, 4, 6, 8, 9, 11),
    c = c(2, 6, 8, NA, 3, 8)
  )

# Validate that, in the same rows, values
# in column `a` are greater than 6, values
# in column `b` are less than 10, and
# values in column `c` are not NA
agent <-
  create_agent(tbl = tbl) %>%
  conjointly(
    ~ col_vals_gt(., vars(a), 6),
    ~ col_vals_lt(., vars(b), 10),
    ~ col_vals_not_null(., vars(c))
  ) %>%
  interrogate()
