confront: Confront data with a (set of) expressionset(s)

Description

An expressionset is a general class storing rich expressions (basically expressions plus some meta data), which we call 'rules'. Examples of expressionset implementations are validator objects, which store validation rules, and indicator objects, which store data quality indicators. The confront function evaluates the expressions one by one on a dataset while recording some process meta data. All results are stored in a (subclass of a) confrontation object.
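The evaluate-and-record loop described above can be pictured with a small base-R sketch. `confront_sketch` is a hypothetical helper, not the validate implementation. It makes the columns of the data visible as variables, evaluates each quoted rule in turn, and records per rule the value, any error message and the elapsed time.

```r
# Hypothetical sketch of "evaluate rules one by one, record meta data".
# This is NOT how the validate package implements confront().
confront_sketch <- function(dat, rules) {
  # Columns of `dat` become variables that the rules can refer to.
  env <- list2env(as.list(dat), parent = globalenv())
  lapply(rules, function(rule) {
    start <- proc.time()[["elapsed"]]
    # Evaluate the rule; capture an error instead of stopping the loop.
    result <- tryCatch(
      list(value = eval(rule, env), error = NA_character_),
      error = function(e) list(value = NULL, error = conditionMessage(e))
    )
    # Record how long this single rule took.
    result$elapsed <- proc.time()[["elapsed"]] - start
    result
  })
}

dat <- data.frame(height = c(58, 59, 60), weight = c(115, 130, 120))
rules <- list(
  ratio = quote(height / weight < 0.5),
  mean  = quote(mean(height) >= 0),
  oops  = quote(age > 0)  # refers to a column that does not exist
)
res <- confront_sketch(dat, rules)
res$ratio$value  # FALSE TRUE FALSE
res$oops$error   # the error message is recorded, not raised
```

Capturing the error per rule lets the remaining rules still be evaluated, which is the behavior a batch of checks usually needs.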

Usage

confront(dat, x, ref, ...)
# S4 method for data.frame,indicator,ANY
confront(dat, x, key = NA_character_,
  ...)
# S4 method for data.frame,indicator,environment
confront(dat, x, ref,
  key = NA_character_, ...)
# S4 method for data.frame,indicator,data.frame
confront(dat, x, ref,
  key = NA_character_, ...)
# S4 method for data.frame,indicator,list
confront(dat, x, ref,
  key = NA_character_, ...)
# S4 method for data.frame,validator,ANY
confront(dat, x, key = NA_character_,
  ...)
# S4 method for data.frame,validator,environment
confront(dat, x, ref,
  key = NA_character_, ...)
# S4 method for data.frame,validator,data.frame
confront(dat, x, ref,
  key = NA_character_, ...)
# S4 method for data.frame,validator,list
confront(dat, x, ref,
  key = NA_character_, ...)

Arguments

dat

An R object carrying data

x

An R object carrying rules.

ref

Optionally, an R object carrying reference data: a data.frame, a list or an environment. See examples for usage.

...

Options used at execution time (especially 'raise'). See voptions.

key

(optional) name of the identifying variable in dat, used to match records of dat with records in the reference data.
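The examples pass `ref` as a single data frame (referred to as `ref` in the rules), as a named list (referred to by the list names) or as an environment. A minimal base-R sketch of how these three forms could be normalized to one environment follows; `as_ref_env` is a hypothetical name, and validate's internals may do this differently.

```r
# Hypothetical helper: turn the accepted forms of `ref` into one environment
# in which rules can look up reference data by name.
as_ref_env <- function(ref) {
  if (is.environment(ref)) {
    ref                        # used as is, so it may be modified later
  } else if (is.data.frame(ref)) {
    # Test data.frame before list: a data.frame is also a list.
    list2env(list(ref = ref))  # a single data frame is named `ref`
  } else if (is.list(ref)) {
    list2env(ref)              # list names become variable names
  } else {
    stop("ref must be a data.frame, a named list or an environment")
  }
}

d <- data.frame(weight = c(115, 117))
e1 <- as_ref_env(d)
e2 <- as_ref_env(list(test = d))
ls(e1)  # "ref"
ls(e2)  # "test"
```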

Using reference data

When reference data sets are given, it is assumed that rows in the reference data are ordered corresponding to the rows of dat, except when a key is specified. In that case, all reference data sets are matched against the rows of dat using key. Nonmatching records are removed from the data sets in ref. If there are records in dat that are not in ref, then the data sets in ref are extended with records containing only NA. In particular, this means that when reference data is passed in an environment, those reference data sets may be altered by the call to confront.
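The matching rules above can be illustrated with base R's `match()`. `align_ref` is a hypothetical helper written for this sketch, not validate's code. Reference rows are reordered to follow dat, reference rows whose key does not occur in dat are dropped, and keys of dat that are absent from the reference yield all-NA rows.

```r
# Hypothetical helper: align the rows of `ref` to the rows of `dat` by `key`.
align_ref <- function(dat, ref, key) {
  # Row position in `ref` of each key value of `dat`; NA when not found.
  idx <- match(dat[[key]], ref[[key]])
  # An NA index selects an all-NA row; unmatched ref rows are never selected.
  out <- ref[idx, , drop = FALSE]
  rownames(out) <- NULL
  out
}

dat <- data.frame(id = c("a", "b", "c"), weight = c(115, 117, 120),
                  stringsAsFactors = FALSE)
ref <- data.frame(id = c("c", "x", "a"), weight = c(121, 999, 115),
                  stringsAsFactors = FALSE)
aligned <- align_ref(dat, ref, "id")
# rows of `aligned`: a (115), all-NA (no "b" in ref), c (121); "x" is dropped
aligned
```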

Technically, reference data are stored in an environment that is the parent of a newly created environment containing the columns of dat.
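The environment layout just described can be reproduced in base R. This is a sketch of the idea rather than the package's code: one environment holds the reference data, and a child environment holds the columns of dat, so a name that is not a column is looked up in the reference environment.

```r
dat <- data.frame(height = c(58, 59), weight = c(115, 117))
ref <- data.frame(height = c(58, 59), weight = c(115, 118))

# Parent environment: holds the reference data under the name `ref`.
ref_env <- list2env(list(ref = ref), parent = globalenv())
# Child environment: holds the columns of `dat`; its parent is `ref_env`.
dat_env <- list2env(as.list(dat), parent = ref_env)

# `weight` is found in dat_env; `ref` is found one level up, in ref_env.
eval(quote(weight == ref$weight), dat_env)  # TRUE FALSE
```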

Examples


library(validate)

# a basic validation example
v <- validator(height/weight < 0.5, mean(height) >= 0)
cf <- confront(women, v)
summary(cf)
plot(cf)
as.data.frame(cf)

# an example checking metadata
v <- validator(nrow(.) == 15, ncol(.) > 2)
summary(confront(women, v))

# An example using reference data
v <- validator(weight == ref$weight)
summary(confront(women, v, women))

# Using custom names for reference data
v <- validator(weight == test$weight)
summary(confront(women, v, list(test = women)))

# Reference data in an environment
e <- new.env()
e$test <- women
v <- validator(weight == test$weight)
summary(confront(women, v, e))

# the effect of using a key
w <- women
w$id <- letters[1:nrow(w)]
v <- validator(weight == ref$weight)

# with complete data; already matching
values(confront(w, v, w, key = 'id'))

# with scrambled rows in reference data (reference gets sorted according to dat)
i <- sample(nrow(w))
values(confront(w, v, w[i, ], key = 'id'))

# with incomplete reference data
values(confront(w, v, w[1:10, ], key = 'id'))
