datacheck (version 1.0.0)

datadict.profile: Create a data quality profile (main function)

Description

Tests a database (a data.frame) against a set of rules, one per line, defined in a 'data dictionary file'. Each rule is summarized in the returned object with: the variable (column) it tests, the rule itself, any comment following the rule, whether the rule executed successfully, the total number of rule violations (if any), and the record ids of any non-compliant records. Rules that cannot be executed for any reason are marked as 'failed'.

Usage

datadict.profile(atable, adictionary)

Arguments

atable
a data.frame holding the data to check
adictionary
a list of rules in rule format, for example as returned by read.rules

Value

  • a data.profile object or NA

Details

The rule file must be a simple list with one rule per line. Functions can be used, but because they are applied to a vector (the column), they should be wrapped in an sapply call (see the example rule file). Rules may be separated by empty lines or by lines starting with the comment character #. A comment that follows a rule on the same line is displayed in the summary table and should be short. Each rule must test only one variable and one aspect at a time.
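To make the rule format more concrete, the following base R sketch shows one way rule lines of this kind could be evaluated against the columns of a data.frame. It is an illustration only, not datacheck's actual implementation: the data, the column names, and the helper check_rule are all hypothetical, and the comment stripping is naive (it would break on a # inside a quoted string).

```r
# Hypothetical example data; column names are illustrative only.
atable <- data.frame(
  ID    = 1:4,
  Genus = c("Solanum", "solanum", "Ipomoea", NA),
  stringsAsFactors = FALSE
)

# Rules as text: one rule per line, optional trailing comment,
# blank lines and full-line comments allowed as separators.
rule_lines <- c(
  "# identifiers",
  "ID > 0 # must be positive",
  "",
  "!is.na(Genus) # genus is required",
  "sapply(Genus, function(x) grepl('^[A-Z]', x)) # starts with a capital",
  "NoSuchColumn > 1 # refers to a missing column"
)

# Keep only lines that contain a rule.
is_rule <- nzchar(trimws(rule_lines)) & !grepl("^\\s*#", rule_lines)
rules   <- rule_lines[is_rule]

# Hypothetical helper: evaluate one rule line against the data.
check_rule <- function(line, data) {
  expr_text <- trimws(sub("#.*$", "", line))
  comment   <- if (grepl("#", line)) trimws(sub("^[^#]*#", "", line)) else ""
  # The rule is evaluated with the columns of `data` visible as vectors.
  ok <- tryCatch(eval(parse(text = expr_text), envir = data),
                 error = function(e) NULL)
  if (is.null(ok)) {
    # The rule could not be executed: report it as failed.
    return(data.frame(rule = expr_text, comment = comment, executed = FALSE,
                      violations = NA_integer_, records = NA_character_,
                      stringsAsFactors = FALSE))
  }
  bad <- which(!ok | is.na(ok))  # rows that violate the rule
  data.frame(rule = expr_text, comment = comment, executed = TRUE,
             violations = length(bad),
             records = paste(bad, collapse = ","),
             stringsAsFactors = FALSE)
}

result <- do.call(rbind, lapply(rules, check_rule, data = atable))
print(result)
```

Each rule here tests a single column and a single aspect, and the last rule shows how a rule that cannot be executed can be reported rather than stopping the whole run.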

See Also

Other datadict: as.rules, has.ruleErrors, is.datadict.profile, prep4rep, read.rules

Examples

# Get example data files
atable = system.file("examples/db.csv", package='datacheck')
arule  = system.file("examples/rules1.R", package='datacheck')
aloctn = system.file("examples/location.csv", package='datacheck') # for use in is.oneOf

ctable = basename(atable)
crule  = basename(arule)
cloctn = basename(aloctn)

# Work in a temporary directory; remember the current one so it can be restored
cwd = tempdir()
owd = getwd()
setwd(cwd)

file.copy(atable, ctable)
file.copy(arule,  crule)
file.copy(aloctn, cloctn)

# Read the data table and the rule file
at = read.csv(ctable, stringsAsFactors = FALSE)
ad = read.rules(crule)

db = datadict.profile(at, ad)

# Confirm that the result is a profile object
is.datadict.profile(db)

db

setwd(owd)