pointblank

With the pointblank package, it’s really easy to validate your data with workflows attuned to your data quality needs. The pointblank philosophy: a set of validation step functions should work seamlessly with data in local data tables and with data in databases.

The two dominant workflows that pointblank enables are data quality reporting and pipeline-based data validations. Both workflows make use of a large collection of simple validation step functions (e.g., are values in a specific column greater than a fixed numerical value?), and both allow for stepwise, temporary mutation of the input table to enable more sophisticated validation checks.

The first workflow, data quality reporting, allows for the easy creation of a data quality (DQ) analysis report. This is most useful in a non-interactive mode where data quality for database tables and on-disk data files must be periodically checked. The reporting component (through a pointblank agent) allows for the collection of detailed validation measures for each validation step, the optional extraction of data rows that failed validation (with options on limits), and custom actions that are triggered when failure-rate thresholds are exceeded.

The second workflow, pipeline-based data validations, gives us a simpler validation scheme that is valuable for data validation checks during an ETL process. With pointblank’s validation step functions, we operate directly on data and trigger warnings, raise errors, or write out logs when specified failure thresholds are exceeded. We can perform checks on import of the data and at key points during the transformation process, perhaps stopping everything if data quality is exceptionally bad.
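As a rough illustration of the pipeline-based workflow, the sketch below applies validation step functions directly to a tibble. It assumes that `action_levels()` accepts a `stop_at` argument (alongside the `warn_at` used elsewhere in this README) and that a step exceeding the stop threshold raises an ordinary R error. The names `tbl`, `al_stop`, `checked`, and `outcome` are only for illustration.

```r
library(pointblank)
library(dplyr)

tbl <- tibble(
  a = c(5, 7, 6, 5, NA, 7),
  b = c(6, 1, 0, 6,  0, 7)
)

# Assumption: `stop_at = 1` halts the pipeline on a single failing unit
al_stop <- action_levels(stop_at = 1)

# Every non-NA value in `a` lies between 1 and 9 (NAs are allowed),
# so the table passes through and can flow on to later steps
checked <- tbl %>%
  col_vals_between(vars(a), 1, 9, na_pass = TRUE, actions = al_stop)

# `b` contains zeros, so the `b > 0` check fails; tryCatch() lets a
# script decide what to do instead of halting outright
outcome <- tryCatch(
  {
    tbl %>% col_vals_gt(vars(b), 0, actions = al_stop)
    "passed"
  },
  error = function(e) "stopped"
)
```

Setting the threshold explicitly keeps the behavior visible in the code rather than relying on whatever the default action is for a given version of the package.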

The pointblank package is designed to be both straightforward and powerful. And fast! All validation checks on remote tables are done entirely in-database, so we can add dozens or hundreds of validation steps without any long waits for reporting. Here is a brief example of how to use pointblank to validate a local table with an agent.

library(pointblank)
library(tidyverse)

# Generate a simple `action_levels` object to
# set the `warn` state if a validation step
# has a single fail unit
al <- action_levels(warn_at = 1)

# Create a pointblank `agent` object, with the
# tibble as the target table. Use two validation
# step functions, then, `interrogate()`. The
# agent now has some useful intel.
agent <- 
  dplyr::tibble(
    a = c(5, 7, 6, 5, NA, 7),
    b = c(6, 1, 0, 6,  0, 7)
  ) %>%
  create_agent(name = "simple_tibble") %>%
  col_vals_between(vars(a), 1, 9, na_pass = TRUE, actions = al) %>%
  col_vals_lt(
    vars(c), 12,
    preconditions = ~tbl %>% dplyr::mutate(c = a + b),
    actions = al
  ) %>%
  interrogate()

Because an agent was used, we can get a report from it.

get_agent_report(agent)
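In a non-interactive job, it can be handy to query the agent programmatically as well as read its report. The sketch below builds a separate, self-contained agent and uses `all_passed()` and `get_data_extracts()`, both of which appear in the function list for this version. The `i` argument for picking out a single step's extract is an assumption about this version's signature, and `agent_b`, `passed`, and `failed_rows` are illustrative names.

```r
library(pointblank)
library(dplyr)

al <- action_levels(warn_at = 1)

agent_b <-
  tibble(
    a = c(5, 7, 6, 5, NA, 7),
    b = c(6, 1, 0, 6,  0, 7)
  ) %>%
  create_agent() %>%
  col_vals_between(vars(a), 1, 9, na_pass = TRUE, actions = al) %>%
  col_vals_gt(vars(b), 0, actions = al) %>%
  interrogate()

# A single logical summarising the whole validation plan
passed <- all_passed(agent_b)

# Rows that failed the second step (the `b > 0` check);
# assumes `i` selects the step by its order of definition
failed_rows <- get_data_extracts(agent_b, i = 2)
```

A scheduled script could use `passed` to decide whether to continue, and write `failed_rows` out for later inspection.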

Beyond this simple example, there are many functions available in pointblank for making comprehensive table validations.

Want to try this out? You can install the development version of pointblank from GitHub:

remotes::install_github("rich-iannone/pointblank")

If you encounter a bug, have usage questions, or want to share ideas to make this package better, feel free to file an issue.

Code of Conduct

This project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

License

MIT © Richard Iannone

Install

install.packages('pointblank')

Monthly Downloads

8,429

Version

0.3.0

License

MIT + file LICENSE

Maintainer

Richard Iannone

Last Published

January 10th, 2020

Functions in pointblank (0.3.0)

col_is_factor
Do the columns contain R factor objects?

col_vals_in_set
Are column data part of a specific set of values?

col_vals_lt
Are numerical column data less than a specific value?

col_vals_lte
Are numerical column data less than or equal to a specific value?

col_vals_between
Are numerical column data between two specified values?

col_vals_not_equal
Are numerical column data not equal to a specific value?

col_vals_not_in_set
Are data not part of a specific set of values?

col_vals_equal
Are numerical column data equal to a specific value?

col_vals_not_between
Are numerical column data not between two specified values?

col_vals_gt
Are numerical column data greater than a specific value?

print.ptblank_agent
Print the agent information to the console

conjointly
Perform multiple rowwise validations for joint validity

get_agent_report
Get a simple report from an agent

col_vals_regex
Do strings in column data match a regex pattern?

%>%
Pipe operator

col_vals_gte
Are numerical column data greater than or equal to a specific value?

get_data_extracts
Collect data extracts from a validation step

col_vals_not_null
Are column data not NULL/NA?

create_agent
Create a pointblank agent object

col_vals_null
Are column data NULL/NA?

reexports
Objects exported from other packages

rows_not_duplicated
Verify that row data are not duplicated (deprecated)

rows_distinct
Verify that row data are distinct

small_table
A small table that is useful for testing

interrogate
Given an agent that has a validation plan, perform an interrogation

col_is_posix
Do the columns contain POSIXct dates?

action_levels
Set action levels for reacting to exceeding thresholds

all_passed
Did all of the validations fully pass?

col_is_numeric
Do the columns contain numeric values?

col_exists
Do one or more columns actually exist?

col_is_date
Do the columns contain R Date objects?

col_is_character
Do the columns contain character/string data?

col_is_integer
Do the columns contain integer values?

col_is_logical
Do the columns contain logical values?
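
To give a feel for how several of the functions listed above combine, here is a minimal sketch that checks column existence, column types, and allowed values in one validation plan. The table and the names `grades` and `schema_agent` are made up for illustration, and the argument order (`columns` first, then `set`) is an assumption based on the pattern used by the other step functions in this README.

```r
library(pointblank)
library(dplyr)

# A made-up table for illustration
grades <- tibble(
  id    = 1:4,
  grade = c("low", "mid", "high", "mid")
)

schema_agent <-
  grades %>%
  create_agent() %>%
  col_exists(vars(id, grade)) %>%          # both columns are present
  col_is_integer(vars(id)) %>%             # `id` is stored as integer
  col_is_character(vars(grade)) %>%        # `grade` is stored as character
  col_vals_not_null(vars(id)) %>%          # no missing identifiers
  col_vals_in_set(vars(grade), set = c("low", "mid", "high")) %>%
  interrogate()

# TRUE only if every step passed for every unit
all_passed(schema_agent)
```

Structural checks like these are cheap to run first, before any value-level comparisons that assume the columns exist and have the expected types.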