Data Validation Infrastructure
Data often contain errors and missing values. A necessary step before data
analysis is verifying and validating your data. Package validate is a
toolbox for creating validation rules and checking data against these rules.
The easiest way to get started is through the examples given in check_that.
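As a minimal sketch of what such a check_that call can look like, the following checks the built-in women data set against three ad hoc rules. The rules themselves are chosen only for illustration.

```r
library(validate)

# Check the built-in 'women' data set against three illustrative rules.
cf <- check_that(women, height > 0, weight > 0, height / weight > 0.5)

# One row per rule, with counts of items, passes, fails and missing results.
summary(cf)
```

check_that combines rule definition and confrontation in one call, which is convenient for quick interactive checks.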
The general workflow in validate follows this pattern:
1. Define a set of rules or quality indicators using validator or indicator.
2. Confront the data with the rules or indicators using confront.
3. Examine the results, either graphically or by summary.
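The three steps of this workflow can be sketched as follows, again using the built-in women data set. The rule names (pos_height, pos_weight, ratio) and the rules themselves are assumptions made for illustration only.

```r
library(validate)

# 1. Define a set of rules (names and thresholds are illustrative).
rules <- validator(
  pos_height = height > 0,
  pos_weight = weight > 0,
  ratio      = height / weight > 0.5
)

# 2. Confront the data with the rules.
out <- confront(women, rules)

# 3. Examine the results by summary or graphically.
summary(out)
plot(out)
```

Keeping the rule set in its own object means the same rules can be reused against other data sets with the same variables.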
There are several convenience functions that allow one to define rules from the command line or through a (freeform or YAML) file, and to investigate and maintain the rules themselves. Please have a look at the introductory vignette for a more thorough introduction to validation rules, and at the indicators vignette for an introduction to quality indicators. Once you are a bit acquainted with the package, you will probably be interested in defining your rules separately in a text file. The vignette on rule files will get you started with that.
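As a rough sketch of the rule-file approach, the example below writes two rules to a temporary freeform file and reads them back through the .file argument of validator. The file contents and the use of a temporary file are assumptions for illustration; the rule files vignette describes the supported formats in full.

```r
library(validate)

# Write two illustrative rules to a temporary freeform rule file.
rule_file <- tempfile(fileext = ".R")
writeLines(c(
  "# rules for the 'women' data set",
  "height > 0",
  "weight > 0"
), rule_file)

# Read the rules from the file and confront the data with them.
rules <- validator(.file = rule_file)
out <- confront(women, rules)
summary(out)
```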
An overview of this package, its underlying ideas and many examples can be found in M.P.J. van der Loo and E. de Jonge (2018). Statistical Data Cleaning with Applications in R. John Wiley & Sons.