validate (version 0.2.6)

validate: Data Validation Infrastructure

Description

Data Validation Infrastructure

Introduction

Data often contain errors and missing values. A necessary step before data analysis is verifying and validating your data. The validate package is a toolbox for creating validation rules and checking data against these rules.

Getting started

The easiest way to get started is through the examples given in check_that.
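
As a quick illustration of what check_that does, here is a minimal sketch. The data frame and the two rules are hypothetical toy examples, not taken from the package documentation.

```r
library(validate)

# Hypothetical toy data: one negative age, one missing income
dat <- data.frame(
  age    = c(25, -3, 41),
  income = c(3000, 1200, NA)
)

# check_that() takes the data and the rules in a single call
cf <- check_that(dat, age >= 0, income > 0)

# One row per rule, with counts of passes, fails and missing results
summary(cf)
```

With this toy data, the summary should report one failing record for the first rule and one missing result for the second.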

The general workflow in validate follows this pattern.

  • Define a set of rules or quality indicators using validator or indicator.

  • Confront data with the rules or indicators.

  • Examine the results either graphically or by summary.
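
The three steps above can be sketched as follows for validation rules. The data and the rule names (positive_height, positive_weight) are hypothetical and chosen only for illustration; indicators follow the same pattern but are not shown here.

```r
library(validate)

# Hypothetical toy data: one invalid height, one missing weight
dat <- data.frame(
  height = c(170, 182, -1),
  weight = c(65, NA, 80)
)

# Step 1: define a set of (named) rules
rules <- validator(
  positive_height = height > 0,
  positive_weight = weight > 0
)

# Step 2: confront the data with the rules
cf <- confront(dat, rules)

# Step 3: examine the results
summary(cf)  # aggregated counts per rule
values(cf)   # record-by-record outcome per rule
```

Keeping the rules in their own object, separate from the data, means the same rule set can be confronted with several data sets.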

There are several convenience functions that allow one to define rules from the command line or through a (free-form or YAML) file, and to investigate and maintain the rules themselves. Please have a look at the introductory vignette for a more thorough introduction to validation rules and at the indicators vignette for an introduction to quality indicators. Once you are acquainted with the package, you will probably be interested in defining your rules separately in a text file. The vignette on rule files will get you started with that.
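
As a rough sketch of the rule-file approach, the example below writes two hypothetical rules to a temporary free-form file and reads them back with the .file argument of validator. The file contents and data are assumptions made for illustration; the vignette on rule files remains the authoritative description of the file syntax.

```r
library(validate)

# Write a small free-form rule file (hypothetical rules, one per line)
rule_file <- tempfile(fileext = ".R")
writeLines(c(
  "# rules are written as ordinary R expressions",
  "age >= 0",
  "income > 0"
), rule_file)

# Read the rules from the file instead of typing them inline
rules <- validator(.file = rule_file)

# Hypothetical toy data to confront with the rules
dat <- data.frame(age = c(25, -3), income = c(3000, 1200))
cf  <- confront(dat, rules)
summary(cf)
```

Storing rules in a file keeps them under version control and lets them be shared across scripts.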

References

An overview of this package, its underlying ideas and many examples can be found in M.P.J. van der Loo and E. de Jonge (2018). Statistical Data Cleaning with Applications in R. John Wiley & Sons.