validate (version 0.2.6)

confront: Confront data with a (set of) expressionset(s)

Description

An expressionset is a general class storing rich expressions (basically expressions and some meta data) which we call 'rules'. Examples of expressionset implementations are validator objects, which store validation rules, and indicator objects, which store data quality indicators. The confront function evaluates the expressions one by one on a dataset while recording some process meta data. All results are stored in a (subclass of a) confrontation object.

Usage

confront(dat, x, ref, ...)

# S4 method for data.frame,indicator,ANY
confront(dat, x, key = NA_character_, ...)

# S4 method for data.frame,indicator,environment
confront(dat, x, ref, key = NA_character_, ...)

# S4 method for data.frame,indicator,data.frame
confront(dat, x, ref, key = NA_character_, ...)

# S4 method for data.frame,indicator,list
confront(dat, x, ref, key = NA_character_, ...)

# S4 method for data.frame,validator,ANY
confront(dat, x, key = NA_character_, ...)

# S4 method for data.frame,validator,environment
confront(dat, x, ref, key = NA_character_, ...)

# S4 method for data.frame,validator,data.frame
confront(dat, x, ref, key = NA_character_, ...)

# S4 method for data.frame,validator,list
confront(dat, x, ref, key = NA_character_, ...)

Arguments

dat

An R object carrying data.

x

An R object carrying rules.

ref

Optionally, an R object carrying reference data. See examples for usage.

...

Options used at execution time (especially 'raise'). See voptions.

key

(optional) name of the identifying variable in dat, used to match records in the reference data.
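
The execution-time options passed through ... can be illustrated with 'raise'. The sketch below assumes the values "none", "errors" and "all" documented under voptions, and uses a deliberately broken rule that refers to a hypothetical column foo. It is a minimal illustration, not a complete description of the option.

```r
library(validate)

# 'foo' is a hypothetical column that does not exist in 'women',
# so evaluating the second rule fails.
v <- validator(height/weight < 0.5, foo > 0)

# By default the error is caught and recorded in the confrontation object.
cf <- confront(women, v)
errors(cf)

# With raise = "errors" (assumed option value, see voptions) the error is
# thrown instead; tryCatch is used here only to keep the script running.
res <- tryCatch(
  confront(women, v, raise = "errors"),
  error = function(e) conditionMessage(e)
)
res
```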

Using reference data

When reference data sets are given, it is assumed that rows in the reference data are ordered corresponding to the rows of dat, except when a key is specified. In that case, all reference datasets are matched against the rows of dat using key. Nonmatching records are removed from datasets in ref. If there are records in dat that are not in ref, then datasets in ref are extended with records containing only NA. In particular, this means that when reference data is passed in an environment, those reference data sets may be altered by the call to confront.
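
The matching rule described above can be mimicked in base R. This is a sketch of the documented outcome only, not validate's internal code; the data frames dat and ref and their contents are made up for illustration.

```r
# Hypothetical data: 'ref' is scrambled, lacks keys "b" and "d",
# and contains a key "z" that does not occur in 'dat'.
dat <- data.frame(id = c("a", "b", "c", "d"), weight = c(10, 20, 30, 40))
ref <- data.frame(id = c("c", "a", "z"), weight = c(30, 11, 99))

# For each row of dat, find the position of its key in ref
# (NA when the key is absent from ref).
i <- match(dat$id, ref$id)

# Indexing with i reorders ref to follow dat, drops the nonmatching
# record "z", and yields all-NA rows for keys missing from ref.
aligned <- ref[i, ]
rownames(aligned) <- NULL

# A rule such as weight == ref$weight is then evaluated row by row.
dat$weight == aligned$weight
```

Rows of dat without a counterpart in the reference data compare against NA, so the rule result for those rows is NA rather than TRUE or FALSE.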

Technically, reference data will be stored in an environment that is the parent of a (created) environment that contains the columns of dat.
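
The environment layout described above can be sketched in base R as follows. The names ref_env and dat_env are illustrative and are not taken from the package; the sketch only shows why a rule can see both the columns of dat and the reference data.

```r
# Environment holding the reference data under the name 'ref'.
ref_env <- new.env()
ref_env$ref <- women

# Child environment holding the columns of the data set;
# its parent is the reference environment.
dat_env <- list2env(as.list(women), parent = ref_env)

# 'weight' is found in dat_env, 'ref' is found one level up in ref_env.
result <- eval(quote(weight == ref$weight), dat_env)
all(result)
```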

See Also

voptions

Other confrontation-methods: [,expressionset-method, as.data.frame,confrontation-method, confrontation-class, errors, length,expressionset-method, values

Other validation-methods: aggregate,validation-method, all,validation-method, any,validation-method, barplot,validation-method, check_that, compare, plot,validation-method, sort,validation-method, summary, validation-class, values

Other indication-methods: indication-class, summary

Examples

# NOT RUN {
# a basic validation example
v <- validator(height/weight < 0.5, mean(height) >= 0)
cf <- confront(women, v)
summary(cf)
plot(cf)
as.data.frame(cf)

# an example checking metadata
v <- validator(nrow(.) == 15, ncol(.) > 2)
summary(confront(women, v))

# An example using reference data
v <- validator(weight == ref$weight)
summary(confront(women, v, women))

# Using custom names for reference data
v <- validator(weight == test$weight)
summary( confront(women,v, list(test=women)) )

# Reference data in an environment
e <- new.env()
e$test <- women
v <- validator(weight == test$weight)
summary( confront(women, v, e) )

# the effect of using a key
w <- women
w$id <- letters[1:nrow(w)]
v <- validator(weight == ref$weight)

# with complete data; already matching
values( confront(w, v, w, key='id'))

# with scrambled rows in reference data (reference gets sorted according to dat)
i <- sample(nrow(w))
values(confront(w, v, w[i,],key='id'))

# with incomplete reference data
values(confront(w, v, w[1:10,],key='id'))


# }
