NULL/NA?
The col_vals_null() validation step function checks whether column values
(in any number of specified columns) are NA values or, in the database
context, NULL values. This function can be used directly on a data table or
with an agent object (technically, a ptblank_agent object). Each
validation step will operate over the number of test units that is equal to
the number of rows in the table (after any preconditions have been
applied).
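As a rough illustration of what counts as a test unit, the sketch below uses only dplyr and base R on a small, made-up table. It is not how pointblank computes its results internally; it simply counts one test unit per row and treats a unit as passing when the value in column a is NA.

```r
library(dplyr)

# A small, made-up table for illustration
tbl <- tibble(a = c(1, 2, NA, NA), b = c(2, 2, 5, 5))

# One test unit per row: a unit passes when the value in `a` is NA
units_passed <- is.na(tbl$a)

n_units <- nrow(tbl)          # total number of test units
n_passed <- sum(units_passed) # units where `a` is NA
f_failed <- (n_units - n_passed) / n_units # fraction of failing units
```

The fraction of failing units computed this way is the kind of quantity that threshold levels are compared against.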
col_vals_null(x, columns, preconditions = NULL, actions = NULL, brief = NULL)
x: A data frame, tibble, or an agent object of class ptblank_agent.
columns: The column (or a set of columns, provided as a character vector) to which this validation should be applied.
preconditions: Expressions used for mutating the input table before
proceeding with the validation. These are ideally supplied as a one-sided R
formula using a leading ~. In the formula representation, tbl serves as the
input data table to be transformed (e.g.,
~ tbl %>% dplyr::mutate(col = col + 10)). A series of expressions can be
used by enclosing the set of statements with { } but note that the tbl
object must ultimately be returned.
actions: A list containing threshold levels so that the validation step
can react accordingly when exceeding the set levels. This is to be created
with the action_levels() helper function.
brief: An optional, text-based description for the validation step.
Value: Either a ptblank_agent object or a table object, depending on what
was passed to x.
2-11
If providing multiple column names, the result will be an expansion of
validation steps to that number of column names (e.g., vars(col_a, col_b)
will result in the entry of two validation steps). Aside from column names
in quotes and in vars(), tidyselect helper functions are available for
specifying columns. They are: starts_with(), ends_with(), contains(),
matches(), and everything().
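The expansion into one validation step per column could be checked with a sketch like the following. The table and its column names are made up, and reading the number of steps from agent$validation_set is an assumption about the agent's internal structure rather than a documented accessor.

```r
library(pointblank)
library(dplyr)

# A made-up table where both columns hold only missing values
tbl <- tibble(
  col_a = c(NA_real_, NA_real_),
  col_b = c(NA_real_, NA_real_)
)

# Two column names are supplied in a single call
agent <-
  create_agent(tbl = tbl) %>%
  col_vals_null(vars(col_a, col_b))

# Assumption: each row of `validation_set` corresponds to one step
n_steps <- nrow(agent$validation_set)
```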
Having table preconditions means pointblank will mutate the table just
before interrogation. The mutation is isolated to the validation steps
produced by this validation step function. Using dplyr code is suggested here
since the statements can be translated to SQL if necessary. The code is to be
supplied as a one-sided R formula (using a leading ~). In the formula
representation, the obligatory tbl variable will serve as the input
data table to be transformed (e.g.,
~ tbl %>% dplyr::mutate(col_a = col_b + 10)). A series of expressions can be
used by enclosing the set of statements with { } but note that the tbl
variable must ultimately be returned.
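A multi-statement precondition might look like the sketch below. The table and the helper column b2 are made up for illustration, and the example assumes that statements inside { } can reassign tbl as long as tbl is the last value returned.

```r
library(pointblank)
library(dplyr)

# A made-up table for illustration
tbl <- tibble(a = c(1, 2, NA, NA), b = c(2, 2, 5, 5))

agent <-
  create_agent(tbl = tbl) %>%
  col_vals_null(
    vars(a),
    preconditions = ~ {
      # `b2` is a hypothetical helper column
      tbl <- tbl %>% dplyr::mutate(b2 = b * 2)
      tbl <- tbl %>% dplyr::filter(b2 >= 10)
      tbl # the transformed table is returned last
    }
  ) %>%
  interrogate()

# Only the rows kept by the precondition are test units here
passed <- all_passed(agent)
```

Because the filtering happens inside the precondition, the original tbl object is left unchanged for any other validation steps.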
Often, we will want to specify actions for the validation. This argument,
present in every validation step function, takes a specially-crafted list
object that is best produced by the action_levels() function. Read that
function's documentation for the lowdown on how to create reactions to
above-threshold failure levels in validation. The basic gist is that you'll
want at least a single threshold level (specified as either the fraction of
test units failed or an absolute value), often using the warn_at argument.
This is especially true when x is a table object because, otherwise,
nothing happens. For the col_vals_*()-type functions, using
action_levels(warn_at = 0.25) or action_levels(stop_at = 0.25) are good
choices depending on the situation (the first produces a warning when a
quarter of the total test units fail, the other stop()s at the same
threshold level).
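A minimal sketch of the warn_at threshold used directly on a table follows. The table is made up, and the example assumes that reaching the threshold raises an ordinary R warning, which tryCatch() can intercept.

```r
library(pointblank)
library(dplyr)

# A made-up table: two of the four values in `a` are not NA
tbl <- tibble(a = c(1, 2, NA, NA))

# Half of the test units fail, which is above the 0.25 threshold,
# so a warning is anticipated
got_warning <-
  tryCatch(
    {
      col_vals_null(
        tbl, vars(a),
        actions = action_levels(warn_at = 0.25)
      )
      FALSE
    },
    warning = function(w) TRUE
  )
```

Swapping warn_at for stop_at would, under the same assumption, turn the reaction into an error that halts execution instead.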
Want to describe this validation step in some detail? Keep in mind that this
is only useful if x is an agent. If that's the case, brief the agent
with some text that fits. Don't worry if you don't want to do it. The
autobrief protocol kicks in when brief = NULL and a simple brief will
then be generated automatically.
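Supplying a brief to an agent-based validation could look like this sketch. The table and the wording of the brief are made up for illustration.

```r
library(pointblank)
library(dplyr)

# A made-up table where `a` holds only missing values
tbl <- tibble(a = c(NA_real_, NA_real_, NA_real_))

agent <-
  create_agent(tbl = tbl) %>%
  col_vals_null(
    vars(a),
    brief = "Column a should contain only missing values"
  ) %>%
  interrogate()

passed <- all_passed(agent)
```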
The analogue to this function: col_vals_not_null().
Other Validation Step Functions:
col_exists(),
col_is_character(),
col_is_date(),
col_is_factor(),
col_is_integer(),
col_is_logical(),
col_is_numeric(),
col_is_posix(),
col_vals_between(),
col_vals_equal(),
col_vals_gte(),
col_vals_gt(),
col_vals_in_set(),
col_vals_lte(),
col_vals_lt(),
col_vals_not_between(),
col_vals_not_equal(),
col_vals_not_in_set(),
col_vals_not_null(),
col_vals_regex(),
conjointly(),
rows_distinct()
# NOT RUN {
library(pointblank)
library(dplyr)

# Create a simple table with two
# columns of numerical values
tbl <-
  tibble(
    a = c(1, 2, NA, NA),
    b = c(2, 2, 5, 5)
  )

# Validate that all values in
# column `a` are NA when
# values in column `b` are
# at least 5
agent <-
  create_agent(tbl = tbl) %>%
  col_vals_null(
    vars(a),
    preconditions = ~ tbl %>% dplyr::filter(b >= 5)
  ) %>%
  interrogate()

# Determine if these column
# validations have all passed
# by using `all_passed()`
all_passed(agent)
# }