dataCompareR (version 0.1.4)

Compare Two Data Frames and Summarise the Difference

Description

Compares two tabular data objects in R and reports how they differ, in a form designed to make the differences easier to understand and, if necessary, to help you work out how to remedy them. It aims to offer more useful output than all.equal() when your two data sets do not match, but it is not intended to replace all.equal() as a way to test for equality.

Install

install.packages('dataCompareR')
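
Once the package is installed, a comparison might look like the minimal sketch below. It uses rCompare and the summary method listed in the function index; the two data frames and the choice of "id" as the key column are made up for illustration.

```r
library(dataCompareR)

# Two small illustrative data frames that differ in a single value
dfA <- data.frame(id = 1:3, value = c(1.0, 2.0, 3.0))
dfB <- data.frame(id = 1:3, value = c(1.0, 2.5, 3.0))

# Compare the data frames, matching rows on the 'id' column
comp <- rCompare(dfA, dfB, keys = "id")

# Print a summary of matching and mismatching columns and rows
summary(comp)
```

If no key is supplied, rows are expected to be compared in the order in which they appear, so supplying a key is usually the safer choice when the row order may differ.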

Monthly Downloads

568

Version

0.1.4

License

Apache License 2.0 | file LICENSE

Maintainer

Sarah Johnston

Last Published

November 23rd, 2021

Functions in dataCompareR (0.1.4)

cleanColNames

cleanColNames : get colnames, remove leading and trailing whitespace and convert to upper case
checkNA

CheckNA
compareNames

compareNames : compare the intersect of colInfoA and colInfoB and return boolean of matched columns for each data frame
checkUniqueness

Checks that a list of indexes are unique
createRowMatching

Function for updating a compare object with information passed to it from the match rows function
outputSectionHeader

outputSectionHeader: creates an outputSectionHeader
createAntiSubset

Create a dataframe of the rows that don't match
prepareData

prepareData: prepares data for comparison in 3 stages. 1. Match columns: filter dataframes to those columns that match and summarise differences. 2. Match rows: filter dataframes to those rows that match and summarise differences. 3. Coerce data.
subsetDataColumns

subsetDataColumns : create subset of DFA and DFB to contain matching column names for both data frames
matchMultiIndex

Generate two dataframes that contain the same rows based on a two-column index
rCompare

Compare two data frames
matchColumns

matchColumns : create subset of DFA and DFB to contain matching column names for both data frames
processFlow

processFlow: handles the process flow for the whole package
saveReport

Save a report based on a dataCompareR object
createMismatchObject

Create mismatch object
updateCompareObject

Generic function for updating a compare object with information passed to it, that has methods based on the class of the info argument.
createMismatches

Create mismatch object
coerceFactorsToChar

coerceFactorsToChar: convert all factor type fields to characters
validateArguments

validateArguments
collapseClasses

collapseClasses: collapse the classes of an object to a single string
createCleaningInfo

Converts cleaning info into a format consumable by updateCompareObject.
generateMismatchData

Extract data from a dataCompareR comparison
createColMatching

Converts the output of the column matching logic to something consumable by updateCompareObject.
updateCompareObject.rowmatching

Adds a rowMatching block to the output
executeCoercions

executeCoercions:
isSingleNA

isSingleNA
listObsNotVerbose

listObsNotVerbose
locateMismatches

Checks whether elements in two input data frames are equal.
orderColumns

orderColumns: order columns by treated column names
currentObjVersion

Place to store and access the current object version.
trimCharVars

trimCharVars: trim white spaces in character variables from an input dataframe
listObsVerbose

listObsVerbose
mismatchHighStop

mismatchHighStop: checks whether the threshold of mismatches has been exceeded
summary.dataCompareRobject

Summarizing RCompare Output
createTextSummary

createTextSummary: create a text-based summary of a dataCompareR object
updateCompareObject.cleaninginfo

Updates cleaning info in the compare object
updateCompareObject.meta

Takes raw info for meta and adds it to the compare object
print.dataCompareRobject

Printing RCompare Output
checkEmpty

checkEmpty
getMismatchColNames

Extracts the column names only in one data frame from a table of match information
allVarMatchMessage

allVarMatchMessage
getCoercions

Subsets on the variables that have a coercion.
matchNoIndex

Generate two dataframes that contain the same rows based on a two-column index
updateCompareObject.mismatches

Adds a colMatching block to the output
rcompObjItemLength

rcompObjItemLength: return length of an item, returning 0 if null, and handling the fact that we might have a data frame or a vector
rounddf

Round all numeric fields in a data frame
print.summary.dataCompareRobject

Printing summary RCompare Output
checkForRCompareCol

checkForRcompareCol
updateCompareObject.colmatching

Adds a colMatching block to the output
matchRows

Generates subsets of two dataframes that contain only the rows they share.
checkKeysExist

checkKeysExist
compareData

Compare data. Wrapper for comparison functionality.
colsWithUnequalValues

colsWithUnequalValues: a dataframe summarising a column with unequal values
is.dataCompareRobject

Check object is of class dataCompareRobject
isNotNull

isNotNull: is object not null
makeValidKeys

makeValidKeys
updateCompareObject.matches

Adds a colMatching block to the output
makeValidNames

makeValidNames
warnLargeData

Warn users if the calculation is likely to be slow
variableMismatches

Create variable mismatch table
matchSingleIndex

Generate two dataframes that contain the same rows based on a single index
validateData

validateData : routine to validate the input data
metaDataInfo

Creates a list of info about the dataframe.
variableDetails

Create variable mismatch details
createReportText

createReportText: prepares text which is used in the summary report. Saves R Markdown and HTML reports in the location specified by the user. Reports are called RcompareReport.Rmd (.html). Uses the knitr package to create tables in the markdown (createReportText function) and HTML report.
createCompareObject

Generates an empty list of the correct class to store results
coerceData

coerceData
createMeta

Takes the raw info for the meta block of the output and puts it in a format usable by the updateCompareObject function
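
The three stages listed for prepareData above (match columns, match rows, coerce data) can be illustrated with a rough base R sketch. This is not the package's implementation: prepare_sketch is a hypothetical helper written only for this example, it assumes a single key column, and the sample data is made up.

```r
# Hypothetical helper, not part of dataCompareR: illustrates the three stages
prepare_sketch <- function(dfA, dfB, key) {
  # Stage 1: keep only the columns present in both data frames
  common <- intersect(names(dfA), names(dfB))
  a <- dfA[, common, drop = FALSE]
  b <- dfB[, common, drop = FALSE]

  # Stage 2: keep only rows whose key appears in both, aligned in the same order
  shared <- intersect(a[[key]], b[[key]])
  a <- a[match(shared, a[[key]]), , drop = FALSE]
  b <- b[match(shared, b[[key]]), , drop = FALSE]

  # Stage 3: coerce factors to character so values are compared by content
  to_char <- function(df) {
    df[] <- lapply(df, function(col) if (is.factor(col)) as.character(col) else col)
    df
  }
  list(a = to_char(a), b = to_char(b))
}

# Made-up sample data: dfA has an extra column, and the key values only partly overlap
dfA <- data.frame(id = c(1, 2, 3),
                  colour = factor(c("red", "green", "blue")),
                  extra = 1:3)
dfB <- data.frame(id = c(3, 2, 4),
                  colour = c("blue", "GREEN", "pink"),
                  stringsAsFactors = FALSE)

prepared <- prepare_sketch(dfA, dfB, key = "id")

# Element-wise comparison of the aligned data frames
prepared$a == prepared$b
```

After the three stages both data frames have the same columns, the same rows in the same order, and comparable column types, which is what makes a cell-by-cell comparison meaningful.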