assertr v2.7
Assertive Programming for R Analysis Pipelines
Provides functionality to assert conditions
that have to be met so that errors in data used in
analysis pipelines can fail quickly. Similar to
'stopifnot()' but more powerful, friendly, and easier
for use in pipelines.
Readme
assertr
What is it?
The assertr package supplies a suite of functions designed to verify assumptions about data early in an analysis pipeline so that data errors are spotted early and can be addressed quickly.
This package does not need to be used with the magrittr/dplyr piping mechanism but the examples in this README use them for clarity.
Installation
You can install the latest version from CRAN like this:
install.packages("assertr")
or you can install the bleeding-edge development version like this:
install.packages("devtools")
devtools::install_github("ropensci/assertr")
What does it look like?
This package offers five assertion functions, assert, verify, insist, assert_rows, and insist_rows, that are designed to be used shortly after data-loading in an analysis pipeline.
Let’s say, for example, that R’s built-in car dataset, mtcars, was not built-in but rather procured from an external source that was known for making errors in data entry or coding. Pretend we wanted to find the average miles per gallon for each number of engine cylinders. We might first want to confirm:
 that it has the columns "mpg", "vs", "am", and "wt"
 that the dataset contains more than 10 observations
 that the column for 'miles per gallon' (mpg) is a positive number
 that the column for ‘miles per gallon’ (mpg) does not contain a datum that is outside 4 standard deviations from its mean, and
 that the am and vs columns (automatic/manual and v/straight engine, respectively) contain 0s and 1s only
 each row contains at most 2 NAs
 each row is unique jointly between the "mpg", "am", and "wt" columns
 each row's mahalanobis distance is within 10 median absolute deviations of all the distances (for outlier detection)
This could be written (in order) using assertr like this:
library(dplyr)
library(assertr)

mtcars %>%
  verify(has_all_names("mpg", "vs", "am", "wt")) %>%
  verify(nrow(.) > 10) %>%
  verify(mpg > 0) %>%
  insist(within_n_sds(4), mpg) %>%
  assert(in_set(0,1), am, vs) %>%
  assert_rows(num_row_NAs, within_bounds(0,2), everything()) %>%
  assert_rows(col_concat, is_uniq, mpg, am, wt) %>%
  insist_rows(maha_dist, within_n_mads(10), everything()) %>%
  group_by(cyl) %>%
  summarise(avg.mpg=mean(mpg))
If any of these assertions were violated, an error would have been raised and the pipeline would have been terminated early.
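As a rough illustration of what early termination means in practice, the sketch below runs a single check against a deliberately corrupted copy of mtcars and catches the resulting error with base R's tryCatch. The names bad_cars and outcome are made up for this example, and the check is called directly rather than through a pipe.

```r
library(assertr)

# Hypothetical corrupted copy of mtcars: one impossible (negative) mpg value.
bad_cars <- mtcars
bad_cars$mpg[3] <- -1

# verify() stops with an error on the bad data, so the "passed" line is never
# reached and the error handler supplies the result instead.
outcome <- tryCatch(
  {
    verify(bad_cars, mpg > 0)
    "passed"
  },
  error = function(e) "failed"
)

print(outcome)
```

In a real pipeline the error would normally be left uncaught, so that nothing downstream runs on the bad data.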
Let's see what the error messages look like when you chain a bunch of failing assertions together.
> mtcars %>%
+ chain_start %>%
+ assert(in_set(1, 2, 3, 4), carb) %>%
+ assert_rows(rowMeans, within_bounds(0,5), gear:carb) %>%
+ verify(nrow(.)==10) %>%
+ verify(mpg < 32) %>%
+ chain_end
There are 7 errors across 4 verbs:

         verb redux_fn           predicate     column index value
1      assert     <NA>  in_set(1, 2, 3, 4)       carb    30   6.0
2      assert     <NA>  in_set(1, 2, 3, 4)       carb    31   8.0
3 assert_rows rowMeans within_bounds(0, 5) ~gear:carb    30   5.5
4 assert_rows rowMeans within_bounds(0, 5) ~gear:carb    31   6.5
5      verify     <NA>       nrow(.) == 10       <NA>     1    NA
6      verify     <NA>            mpg < 32       <NA>    18    NA
7      verify     <NA>            mpg < 32       <NA>    20    NA
Error: assertr stopped execution
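The chained session above can be sketched in script form as follows. This is a minimal example, assuming that a failing assertion inside a chain passes the data frame along instead of stopping, and that the error is only raised at chain_end. The names partial, still_a_data_frame, and outcome are made up for illustration.

```r
library(dplyr)    # provides the %>% pipe used below
library(assertr)

# Inside a chain, the failing checks do not stop execution here; the data
# frame is passed along so that later checks can still run.
partial <- mtcars %>%
  chain_start %>%
  assert(in_set(1, 2, 3, 4), carb) %>%
  verify(mpg < 32)

still_a_data_frame <- is.data.frame(partial)

# chain_end reports everything collected so far and then raises the error,
# which is caught here only so that the script can continue.
outcome <- tryCatch(
  {
    partial %>% chain_end
    "no errors"
  },
  error = function(e) "errors reported"
)

print(still_a_data_frame)
print(outcome)
```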
What does assertr give me?
verify - takes a data frame (its first argument is provided by the %>% operator above) and a logical (boolean) expression. verify then evaluates that expression in the scope of the provided data frame. If any of the logical values of the expression's result are FALSE, verify raises an error that terminates any further processing of the pipeline.
assert - takes a data frame, a predicate function, and an arbitrary number of columns to apply the predicate function to. The predicate function (a function that returns a logical/boolean value) is then applied to every element of the selected columns, and assert raises an error if it finds any violations. Internally, the assert function uses dplyr's select function to extract the columns to test the predicate function on.
insist - takes a data frame, a predicate-generating function, and an arbitrary number of columns. For each column, the predicate-generating function is applied, returning a predicate. The predicate is then applied to every element of that column, and insist raises an error if it finds any violations. A predicate-generating function is used so that bounds can be generated dynamically based on what the data look like; this is the only way to, say, create bounds that check if each datum is within x z-scores, since the standard deviation isn't known a priori. Internally, the insist function uses dplyr's select function to extract the columns to test the predicate function on.
assert_rows - takes a data frame, a row reduction function, a predicate function, and an arbitrary number of columns to apply the predicate function to. The row reduction function is applied to the data frame and returns a value for each row. The predicate function is then applied to every element of the vector returned from the row reduction function, and assert_rows raises an error if it finds any violations. This functionality is useful, for example, in conjunction with the num_row_NAs() function to ensure that each row has no more than a certain number of missing values. Internally, the assert_rows function uses dplyr's select function to extract the columns to test the predicate function on.
insist_rows - takes a data frame, a row reduction function, a predicate-generating function, and an arbitrary number of columns to apply the predicate function to. The row reduction function is applied to the data frame and returns a value for each row. The predicate-generating function is then applied to the vector returned from the row reduction function, and the resultant predicate is applied to each element of that vector. insist_rows raises an error if it finds any violations. This functionality is useful, for example, in conjunction with the maha_dist() function to ensure that there are no flagrant outliers. Internally, the insist_rows function uses dplyr's select function to extract the columns to test the predicate function on.
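To make the differences between the five verbs concrete, here is a small sketch that calls each one directly on mtcars, without a pipe. It assumes the success_logical and error_logical functions from the package's success and error functions, which return TRUE or FALSE instead of the data frame or an error. The short variable names and the explicit mpg:carb column range are choices made for this example.

```r
library(assertr)

# verify: the expression is evaluated with mtcars' columns in scope.
v <- verify(mtcars, mpg > 0,
            success_fun = success_logical, error_fun = error_logical)

# assert: the predicate built by in_set(0, 1) is applied to every element
# of the am and vs columns.
a <- assert(mtcars, in_set(0, 1), am, vs,
            success_fun = success_logical, error_fun = error_logical)

# insist: within_n_sds(4) builds a separate predicate from each selected
# column's own mean and standard deviation.
i <- insist(mtcars, within_n_sds(4), mpg,
            success_fun = success_logical, error_fun = error_logical)

# assert_rows: num_row_NAs reduces each row to one number (its NA count),
# and the predicate is applied to those numbers.
r <- assert_rows(mtcars, num_row_NAs, within_bounds(0, 2), mpg:carb,
                 success_fun = success_logical, error_fun = error_logical)

# insist_rows: maha_dist reduces each row to one distance, and the predicate
# is generated from the vector of distances itself.
ir <- insist_rows(mtcars, maha_dist, within_n_mads(10), mpg:carb,
                  success_fun = success_logical, error_fun = error_logical)

# A failing check: carb contains the values 6 and 8, which are not in the set.
f <- assert(mtcars, in_set(1, 2, 3, 4), carb,
            success_fun = success_logical, error_fun = error_logical)

print(c(verify = v, assert = a, insist = i,
        assert_rows = r, insist_rows = ir, failing = f))
```

With the default success and error functions, the passing calls would return the data frame unchanged and the failing call would stop with an error.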
assertr also offers four (so far) predicate functions designed to be used with the assert and assert_rows functions:
not_na - checks if an element is not NA
within_bounds - returns a predicate function that checks if a numeric value falls within the bounds supplied
in_set - returns a predicate function that checks if an element is a member of the set supplied
is_uniq - checks to see if each element appears only once
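The predicates can also be called on their own, which is a convenient way to see what they return before using them inside assert or assert_rows. The sketch below uses made-up values and variable names; note that within_bounds and in_set return a function that is then called on a value, while not_na and is_uniq are applied to values directly.

```r
library(assertr)

# not_na is applied to a value directly.
has_value <- not_na(3)       # TRUE
missing   <- not_na(NA)      # FALSE

# within_bounds returns a predicate function; the bounds are inclusive by default.
between_0_and_10 <- within_bounds(0, 10)
inside  <- between_0_and_10(5)     # TRUE
outside <- between_0_and_10(11)    # FALSE

# in_set also returns a predicate function.
is_binary  <- in_set(0, 1)
member     <- is_binary(1)         # TRUE
non_member <- is_binary(2)         # FALSE

# is_uniq works on a whole vector and flags every element that is repeated.
uniq <- is_uniq(c("a", "b", "a"))  # FALSE TRUE FALSE

print(uniq)
```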
and predicate generators designed to be used with the insist and insist_rows functions:
within_n_sds - used to dynamically create bounds to check vector elements against, based on standard z-scores
within_n_mads - a better method for dynamically creating bounds to check vector elements against, based on 'robust' z-scores (using median absolute deviation)
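A predicate generator is a function that returns a function that returns a predicate, which is easier to follow with a small worked case. The sketch below uses a made-up vector x and assumes that within_n_mads relies on R's default scaled median absolute deviation (stats::mad).

```r
library(assertr)

x <- c(1, 2, 3, 4, 5)   # made-up data: mean 3, sd about 1.58, median 3

# within_n_sds(2) is the generator; calling it on x yields a predicate with
# bounds mean(x) +/- 2 * sd(x), roughly -0.16 to 6.16.
within_2_sds <- within_n_sds(2)(x)
sd_inside  <- within_2_sds(6)     # 6 lies inside those bounds
sd_outside <- within_2_sds(10)    # 10 lies outside

# within_n_mads(2) does the same with median(x) +/- 2 * mad(x). Here mad(x)
# is 1.4826, so the bounds are roughly 0.03 to 5.97.
within_2_mads <- within_n_mads(2)(x)
mad_inside  <- within_2_mads(5)   # 5 lies inside
mad_outside <- within_2_mads(6)   # 6 lies outside these tighter bounds

print(c(sd_inside, sd_outside, mad_inside, mad_outside))
```

insist and insist_rows perform the first step (calling the generator's result on the column or on the reduced vector) automatically.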
and the following row reduction functions designed to be used with assert_rows and insist_rows:
num_row_NAs - counts the number of missing values in each row
maha_dist - computes the Mahalanobis distance of each row (for outlier detection). It will coerce categorical variables into numerics if it needs to.
col_concat - concatenates all columns of each row into a string
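Each row reduction function takes a data frame and returns one value per row. The sketch below calls them directly on small made-up data frames (scores and labels are hypothetical) and on mtcars, to show the vectors that assert_rows and insist_rows would then check.

```r
library(assertr)

# Made-up numeric data frame with some missing values.
scores <- data.frame(a = c(1, NA, 3), b = c(NA, NA, 6))
na_counts <- num_row_NAs(scores)    # one NA count per row: 1, 2, 0

# Made-up character data frame; col_concat pastes each row's values together.
labels <- data.frame(x = c("a", "b"), y = c("1", "2"),
                     stringsAsFactors = FALSE)
keys <- col_concat(labels)          # "a1", "b2"

# maha_dist returns one Mahalanobis distance per row of the data frame.
distances <- maha_dist(mtcars)

print(na_counts)
print(keys)
print(length(distances))
```

Combining col_concat with is_uniq, as in the pipeline earlier in this README, checks that rows are jointly unique across the selected columns.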
More info
For more info, check out the assertr vignette:
> vignette("assertr")
Or read it here
Functions in assertr
Name  Description  
summary.assertr_assert_error  Summarizing assertr's assert errors  
summary.assertr_verify_error  Summarizing assertr's verify errors  
within_n_sds  Return a function to create zscore checking predicate  
within_bounds  Creates bounds checking predicate  
verify  Raises error if expression is FALSE anywhere  
within_n_mads  Return a function to create robust zscore checking predicate  
num_row_NAs  Counts number of NAs in each row  
print.assertr_assert_error  Printing assertr's assert errors  
maha_dist  Computes mahalanobis distance for each row of data frame  
is_uniq  Returns TRUE where no elements appear more than once  
print.assertr_verify_error  Printing assertr's verify errors  
not_na  Returns TRUE if value is not NA  
success_and_error_functions  Success and error functions  
insist_rows  Raises error if dynamically created predicate is FALSE for any row after applying row reduction function  
assert  Raises error if predicate is FALSE in any columns selected  
assert_rows  Raises error if predicate is FALSE for any row after applying row reduction function  
in_set  Returns TRUE if value in set  
col_concat  Concatenate all columns of each row in data frame into a string  
has_all_names  Returns TRUE if data.frame or list has specified names  
chaining_functions  Chaining functions  
assertr  assertr: Assertive programming for R analysis pipeline.  
insist  Raises error if dynamically created predicate is FALSE in any columns selected  
Vignettes of assertr
Name  
assertr.Rmd  
Details
Type  Package 
URL  https://docs.ropensci.org/assertr (website) https://github.com/ropensci/assertr 
BugReports  https://github.com/ropensci/assertr/issues 
License  MIT + file LICENSE 
ByteCompile  TRUE 
LazyData  TRUE 
VignetteBuilder  knitr 
RoxygenNote  7.0.2 
Encoding  UTF-8 
NeedsCompilation  no 
Packaged  2020-02-05 21:17:23 UTC; tonyfischetti 
Repository  CRAN 
Date/Publication  2020-02-05 22:10:02 UTC 
imports  dplyr (>= 0.7.0) , MASS , rlang (>= 0.3.0) , stats , utils 
suggests  knitr , magrittr , testthat 
depends  R (>= 3.1.0) 