validate (version 0.1.4)

syntax: Syntax to define validation or indicator rules

Description

The functions mentioned in this help file should only be used in the context of defining a validator or indicator object.

Usage

number_missing(...)

fraction_missing(...)

row_missing(...)

col_missing(...)

number_unique(...)

any_missing(...)

any_duplicated(...)

Arguments

...
Comma-separated list of bare (unquoted) variable names, not character strings. If no variables are specified, missings are counted over all variables in the data.

Value

  • For number_missing, the total number of missings over all specified variables.

  • For fraction_missing, the fraction of missings over all specified variables.

  • For row_missing, a vector with the number of missings per (sub)record defined by the ... argument.

  • For col_missing, a vector with the number of missings per column defined by the ... argument.

  • For number_unique, the number of records that are unique with respect to the columns specified in the ... argument.

  • For any_missing, TRUE if any NA occurs in the columns specified in the ... argument.

  • For any_duplicated, TRUE if any (sub)records specified by the ... argument are duplicated, FALSE otherwise. Note that NA is matched with NA.
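The quantities listed above can be illustrated with a base R sketch. This is not the package's implementation and does not call validate at all; it only computes, for a small made-up data frame, values that match the descriptions above. The data frame `dat` and all variable names are assumptions introduced for illustration.

```r
# Made-up example data; 'sub' plays the role of the variables passed in '...'
dat <- data.frame(
  a = c(1, NA, 3, NA),
  b = c("x", "y", NA, "y"),
  c = c(TRUE, TRUE, FALSE, TRUE)
)
sub <- dat[c("a", "b")]

n_missing    <- sum(is.na(sub))         # cf. number_missing(a, b):   3
frac_missing <- mean(is.na(sub))        # cf. fraction_missing(a, b): 0.375
per_row      <- rowSums(is.na(sub))     # cf. row_missing(a, b):      0 1 1 1
per_col      <- colSums(is.na(sub))     # cf. col_missing(a, b):      a = 2, b = 1
n_unique     <- nrow(unique(sub))       # cf. number_unique(a, b):    3
has_missing  <- any(is.na(sub))         # cf. any_missing(a, b):      TRUE
has_dupes    <- anyDuplicated(sub) > 0  # cf. any_duplicated(a, b):   TRUE
```

Rows 2 and 4 of `sub` are both (NA, "y"). Base R treats them as duplicates of each other, which mirrors the remark above that NA is matched with NA.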

Note

This document only provides a short reference. Please refer to the vignette for worked examples.

vignette("intro",package="validate")

Local, transient assignment

The operator ':=' can be used to set up local variables (for example, during validation), either to save time (the right-hand side of an assignment is computed only once) or to make validation code more maintainable. Assignments behave much like ordinary R assignments: they apply only to statements that come after the assignment, and they may be overwritten. The result of computing the right-hand side is not itself part of a confrontation with data.
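The semantics described above can be imitated in base R. The sketch below does not use validate and is not how the package works internally; `run_rules` is a hypothetical helper, written only to show the three properties: an assignment is evaluated once, it is visible only to later statements (and may be overwritten), and its value is not reported as a result.

```r
# Hypothetical helper (not part of validate): evaluate statements in order,
# treating 'name := expr' as a transient assignment.
run_rules <- function(data, ...) {
  exprs <- as.list(substitute(list(...)))[-1]
  env <- list2env(as.list(data), parent = parent.frame())
  out <- list()
  for (e in exprs) {
    if (is.call(e) && identical(e[[1]], as.name(":="))) {
      # Evaluate the right-hand side once and store it for later statements;
      # nothing is added to the results.
      assign(as.character(e[[2]]), eval(e[[3]], env), envir = env)
    } else {
      # An ordinary rule: evaluate it and keep the outcome.
      out[[deparse(e)]] <- eval(e, env)
    }
  }
  out
}

dat <- data.frame(x = c(1, 2, 3, 10))
res <- run_rules(dat,
  m := mean(x),   # m is 4 from here on
  x < 2 * m,
  m := max(x),    # overwrites m with 10
  x <= m
)
print(names(res))
# [1] "x < 2 * m" "x <= m"
```

Only the two comparisons appear in `res`; the two assignments leave no trace in the output.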

Groups

The same constraints or rules often apply to a whole group of variables, and validate offers a compact notation for this case. Variable groups can be used in-statement or defined first with the := operator.

validator( var_group(a,b) > 0 )

is equivalent to

validator(G := var_group(a,b), G > 0)

which in turn is equivalent to

validator(a>0,b>0).

Using two groups results in the cartesian product of checks. So the statement

validator( f := var_group(c,d), g := var_group(a,b), g > f)

is equivalent to

validator(a > c, b > c, a > d, b > d)
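The cartesian-product expansion can be sketched in base R. The helper `expand_groups` below is hypothetical and is not part of validate; the character vectors `g` and `f` merely stand in for `var_group(a,b)` and `var_group(c,d)`, and the data frame `dat` is made up for illustration.

```r
# Hypothetical helper (not part of validate): write out one rule per pair
# of variables taken from the two groups.
expand_groups <- function(lhs, op, rhs) {
  pairs <- expand.grid(lhs = lhs, rhs = rhs, stringsAsFactors = FALSE)
  paste(pairs$lhs, op, pairs$rhs)
}

g <- c("a", "b")   # stands in for var_group(a, b)
f <- c("c", "d")   # stands in for var_group(c, d)

rules <- expand_groups(g, ">", f)
print(rules)
# [1] "a > c" "b > c" "a > d" "b > d"

# Evaluate each expanded rule against made-up data: one column per rule.
dat <- data.frame(a = c(5, 1), b = c(6, 7), c = c(2, 2), d = c(3, 8))
checks <- sapply(rules, function(r) eval(parse(text = r), dat))
```

Two groups of two variables give 2 x 2 = 4 rules, so `checks` has one column per rule and one row per record.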

File parsing

Please see the vignette on how to read rules from and write rules to file:

vignette("rule-files",package="validate")