validate (version 0.9.3)

cells: Cell counts and differences for a series of datasets

Description

Cell counts and differences for a series of datasets

Usage

cells(..., .list = NULL, compare = c("to_first", "sequential"))

Arguments

...

For cells: data frames, comma separated. Names will become column names in the output. For plot or barplot: graphical parameters (see par).

.list

A list of data frames; will be concatenated with objects in ...

compare

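How to compare the datasets: "to_first" (the default) compares each dataset to the first one; "sequential" compares each dataset to the previous one.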

Value

An object of class cellComparison, which is an array with a few extra attributes. It counts the total number of cells, the number of missing values, the number of altered values, and the changes therein, compared to the reference defined by compare.

Comparing datasets cell by cell

When comparing the contents of two data sets, the total number of cells in the current data set can be partitioned as in the following figure.

[Figure: rulewise splitting]

This function computes the partition for two or more datasets, comparing the current set to the first (default) or to the previous (by setting compare='sequential').
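The partition for a single pair of datasets can be sketched in base R. This is only an illustration, not the implementation used by the package: the helper name cell_partition and the labels of the counts are assumptions chosen for this example, and the sketch handles only columns that can be compared with != (for example numeric or character columns).

```r
# Hypothetical helper (not part of validate): partition the cells of
# `current` relative to `reference`. Assumes equal dimensions and that
# rows and columns are in the same order in both data frames.
cell_partition <- function(reference, current) {
  stopifnot(identical(dim(reference), dim(current)))

  ref_na <- is.na(reference)          # logical matrix, TRUE where missing
  cur_na <- is.na(current)
  both_present <- !ref_na & !cur_na

  # Column-wise comparison; cells with an NA on either side yield NA here
  # and are masked out by `both_present` below.
  differs <- matrix(
    unlist(Map(function(a, b) a != b, reference, current)),
    nrow = nrow(reference)
  )

  c(
    cells           = length(cur_na),
    available       = sum(!cur_na),
    still_available = sum(both_present),
    unadapted       = sum(both_present & !differs),
    adapted         = sum(both_present & differs),
    imputed         = sum(ref_na & !cur_na),
    missing         = sum(cur_na),
    still_missing   = sum(ref_na & cur_na),
    removed         = sum(!ref_na & cur_na)
  )
}

# Toy data, made up for this illustration
reference <- data.frame(x = c(1, NA, 3, 4),  y = c(NA, 2, -3, NA))
current   <- data.frame(x = c(1, 2.5, 3, NA), y = c(NA, 2, 3, 0))

p <- cell_partition(reference, current)
print(p)
```

In this sketch the counts nest: available plus missing equals cells, still_available plus imputed equals available, and still_missing plus removed equals missing.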

Details

This function assumes that the datasets have the same dimensions and that rows and columns appear in the same order in every dataset.
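If the datasets share a unique key column, one way to satisfy this assumption is to align them before comparing. The helper below is a minimal base R sketch; align_to and its key argument are hypothetical names introduced for this example and are not part of the package.

```r
# Hypothetical helper (not part of validate): reorder the rows and columns
# of `current` so that they match `reference`, using a unique key column.
align_to <- function(reference, current, key) {
  stopifnot(
    setequal(names(reference), names(current)),
    setequal(reference[[key]], current[[key]]),
    anyDuplicated(reference[[key]]) == 0,
    anyDuplicated(current[[key]]) == 0
  )
  # match() gives, for each reference key, the row position in `current`
  idx <- match(reference[[key]], current[[key]])
  aligned <- current[idx, names(reference), drop = FALSE]
  rownames(aligned) <- NULL
  aligned
}

# Toy data, made up for this illustration
raw    <- data.frame(id = c("a", "b", "c"), turnover = c(10, NA, 30),
                     stringsAsFactors = FALSE)
edited <- data.frame(turnover = c(30, 20, 10), id = c("c", "b", "a"),
                     stringsAsFactors = FALSE)

aligned <- align_to(raw, edited, key = "id")
print(aligned)
```

After alignment, raw and aligned have the same dimensions and ordering, so they can be passed to a cell-by-cell comparison.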

References

The figure is reproduced from MPJ van der Loo and E. De Jonge (2018) Statistical Data Cleaning with applications in R (John Wiley & Sons).

See Also

Other comparing: as.data.frame,cellComparison-method, as.data.frame,validatorComparison-method, barplot,cellComparison-method, barplot,validatorComparison-method, compare(), match_cells(), plot,cellComparison-method, plot,validatorComparison-method

Examples

# NOT RUN {
library(validate)
data(retailers)

# start with raw data
step0 <- retailers

# impute turnovers
step1 <- step0
step1$turnover[is.na(step1$turnover)] <- mean(step1$turnover, na.rm = TRUE)

# flip sign of negative revenues
step2 <- step1
step2$other.rev <- abs(step2$other.rev)
  

# create an overview of differences, comparing to the previous step
cells(raw = step0, imputed = step1, flipped = step2, compare = "sequential")

# create an overview of differences compared to raw data
out <- cells(raw = step0, imputed = step1, flipped = step2)
out

# Graphical overview of the changes
plot(out)
barplot(out)

# transform data to data.frame (easy for use with ggplot)
as.data.frame(out)


# }
