
pointblank (version 0.3.0)

col_vals_not_equal: Are numerical column data not equal to a specific value?

Description

The col_vals_not_equal() validation step function checks whether column values (in any number of specified columns) are not equal to a specified value. This function can be used directly on a data table or with an agent object (technically, a ptblank_agent object). Each validation step operates over a number of test units equal to the number of rows in the table (after any preconditions have been applied).

Usage

col_vals_not_equal(
  x,
  columns,
  value,
  na_pass = FALSE,
  preconditions = NULL,
  actions = NULL,
  brief = NULL
)

Arguments

x

A data frame, tibble, or an agent object of class ptblank_agent.

columns

The column (or a set of columns, provided as a character vector) to which this validation should be applied.

value

A numeric value used to test for non-equality.

na_pass

Should any encountered NA values be allowed to pass a validation unit? The default is FALSE. Set to TRUE to give NAs a pass.

preconditions

Expressions used for mutating the input table before proceeding with the validation. These are ideally supplied as a one-sided R formula using a leading ~. In the formula representation, tbl serves as the input data table to be transformed (e.g., ~ tbl %>% dplyr::mutate(col = col + 10)). A series of expressions can be used by enclosing the set of statements with { } but note that the tbl object must ultimately be returned.

actions

A list containing threshold levels so that the validation step can react accordingly when exceeding the set levels. This is to be created with the action_levels() helper function.

brief

An optional, text-based description for the validation step.

Value

Either a ptblank_agent object or a table object, depending on what was passed to x.

Function ID

2-4

Details

If providing multiple column names, the result will be an expansion of validation steps to that number of column names (e.g., vars(col_a, col_b) will result in the entry of two validation steps). Aside from column names in quotes and in vars(), tidyselect helper functions are available for specifying columns. They are: starts_with(), ends_with(), contains(), matches(), and everything().
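As a minimal sketch of the column expansion described above, the following passes two columns through vars() in a single call. The table tbl_multi and its values are made up for illustration, and pointblank is assumed to be installed and attached.

```r
library(pointblank)
library(dplyr)

# Hypothetical table used only for this illustration
tbl_multi <-
  tibble(
    col_a = c(1, 2, 3),
    col_b = c(4, 5, 6)
  )

# One call naming two columns; per the text above, this
# is expanded into one validation step per column
agent_multi <-
  create_agent(tbl = tbl_multi) %>%
  col_vals_not_equal(vars(col_a, col_b), 0) %>%
  interrogate()

# No value in either column is 0, so every test unit
# in both steps is expected to pass
all_passed(agent_multi)
```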

This validation step function supports special handling of NA values. The na_pass argument will determine whether an NA value appearing in a test unit should be counted as a pass or a fail. The default of na_pass = FALSE means that any NAs encountered will accumulate failing test units.
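The pass/fail rule for NA values can be sketched in base R. The helper not_equal_units() below is hypothetical and is not part of pointblank; it only mirrors the behavior described above, assuming one test unit per value.

```r
# Hypothetical helper (not part of pointblank): returns one
# logical per test unit, TRUE for a pass and FALSE for a fail
not_equal_units <- function(x, value, na_pass = FALSE) {
  passed <- x != value          # NA wherever x is NA
  passed[is.na(x)] <- na_pass   # NAs pass only if na_pass = TRUE
  passed
}

vals <- c(1, 5, NA, 3)

# Default: the NA counts as a failing test unit
strict <- not_equal_units(vals, 5)

# With na_pass = TRUE the NA counts as a passing test unit
lenient <- not_equal_units(vals, 5, na_pass = TRUE)

sum(!strict)   # number of failing units under the default
sum(!lenient)  # number of failing units when NAs get a pass
```

Under these assumptions the value 5 fails in both cases, while the NA fails only under the default setting.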

Having table preconditions means pointblank will mutate the table just before interrogation. The mutation is isolated to the validation steps produced by this validation step function. Using dplyr code is suggested here since the statements can be translated to SQL if necessary. The code is to be supplied as a one-sided R formula (using a leading ~). In the formula representation, the obligatory tbl variable will serve as the input data table to be transformed (e.g., ~ tbl %>% dplyr::mutate(col_a = col_b + 10)). A series of expressions can be used by enclosing the set of statements with { } but note that the tbl variable must ultimately be returned.
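A multi-statement precondition might look like the sketch below. The table tbl_pre and the particular filter and mutate steps are invented for illustration; the formula relies on the tbl variable described above and returns it as the last expression inside { }.

```r
library(pointblank)
library(dplyr)

# Hypothetical table used only for this illustration
tbl_pre <-
  tibble(
    a = c(1, 1, 1, 2, 2, 2),
    b = c(5, 5, 5, 3, 6, 3)
  )

# Two statements enclosed in { }; `tbl` is returned at the end
agent_pre <-
  create_agent(tbl = tbl_pre) %>%
  col_vals_not_equal(
    vars(b), 5,
    preconditions = ~ {
      tbl <- tbl %>% dplyr::filter(a == 2)
      tbl <- tbl %>% dplyr::mutate(b = b + 2)
      tbl
    }
  ) %>%
  interrogate()

# After the precondition, `b` holds 5, 8, 5, so the step
# is expected to contain failing test units
all_passed(agent_pre)
```

Only this validation step sees the filtered and mutated table; tbl_pre itself is left unchanged.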

Often, we will want to specify actions for the validation. This argument, present in every validation step function, takes a specially crafted list object that is best produced by the action_levels() function. Read that function's documentation for the lowdown on how to create reactions to above-threshold failure levels in validation. The basic gist is that you'll want at least a single threshold level (specified as either the fraction of test units failed or an absolute value), often using the warn_at argument. This is especially true when x is a table object because, otherwise, nothing happens. For the col_vals_*()-type functions, action_levels(warn_at = 0.25) and action_levels(stop_at = 0.25) are good choices depending on the situation (the first produces a warning when a quarter of the total test units fail, the other stop()s at the same threshold level).
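The following sketch applies the function directly to a table with a warn_at threshold. The table tbl_actions is made up for illustration, and the tryCatch() wrapper is only there to record whether a warning was raised; it is not required for normal use.

```r
library(pointblank)
library(dplyr)

# Hypothetical table used only for this illustration
tbl_actions <- tibble(b = c(5, 5, 5, 3, 6, 3))

# Half of the values in `b` equal 5, which is above the
# 0.25 threshold, so a warning is expected here; tryCatch()
# records whether one was raised
warned <-
  tryCatch(
    {
      tbl_actions %>%
        col_vals_not_equal(
          vars(b), 5,
          actions = action_levels(warn_at = 0.25)
        )
      FALSE
    },
    warning = function(w) TRUE
  )

warned
```

Swapping warn_at for stop_at in action_levels() would, per the description above, raise an error instead of a warning at the same threshold.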

Want to describe this validation step in some detail? Keep in mind that this is only useful if x is an agent. If that's the case, brief the agent with some text that fits. Don't worry if you don't want to do it. The autobrief protocol kicks in when brief = NULL and a simple brief is then generated automatically.

See Also

The analogue to this function: col_vals_equal().

Other Validation Step Functions: col_exists(), col_is_character(), col_is_date(), col_is_factor(), col_is_integer(), col_is_logical(), col_is_numeric(), col_is_posix(), col_vals_between(), col_vals_equal(), col_vals_gte(), col_vals_gt(), col_vals_in_set(), col_vals_lte(), col_vals_lt(), col_vals_not_between(), col_vals_not_in_set(), col_vals_not_null(), col_vals_null(), col_vals_regex(), conjointly(), rows_distinct()

Examples

# NOT RUN {
library(pointblank)
library(dplyr)

# Create a simple table with two
# columns of numerical values
tbl <-
  tibble(
    a = c(1, 1, 1, 2, 2, 2),
    b = c(5, 5, 5, 3, 6, 3)
  )

# Validate that values in
# column `b` are not equal to 5
# when values in column `a`
# are equal to 2 
agent <-
  create_agent(tbl = tbl) %>%
  col_vals_not_equal(vars(b), 5,
    preconditions = ~ tbl %>% dplyr::filter(a == 2)
  ) %>%
  interrogate()

# Determine if this column
# validation has passed by using
# `all_passed()`
all_passed(agent)

# }
