manydata (version 1.1.3)

compare_diff: Compare two datasets for differences

Description

Compare two datasets for differences

Usage

compare_new(.data1, .data2, by = "ID")

compare_diff(.data1, .data2, by = "ID", exclude = c("Title", "Coder", "Comments"), diff_threshold = 0)

Value

A data frame with the differences found

Arguments

.data1

First dataset to compare

.data2

Second dataset to compare

by

Column name to join on (default is "ID")

exclude

Character vector of column names to exclude from comparison. By default, "Title", "Coder", and "Comments" are excluded.

diff_threshold

Integer specifying the minimum number of differing columns for a row to be included in the output. Default is 0, meaning a row with any difference is included. For example, set to 3 to show only rows with at least 3 differing columns.

Details

This function uses dplyr::anti_join to find rows in .data1 that are not present in .data2. If no differences are found, a message is printed and NULL is returned. If differences are found, they are returned as a data frame.
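
The anti-join behaviour described above can be sketched directly with dplyr. This is a minimal illustration, not the package's actual implementation: the helper name rows_only_in_first and the wording of its message are hypothetical, and only the use of dplyr::anti_join and the NULL return are taken from the description.

```r
library(dplyr)

df1 <- data.frame(ID = 1:5, Value = letters[1:5])
df2 <- data.frame(ID = 3:7, Value = letters[3:7])

# Hypothetical helper (not part of manydata): returns the rows of .data1
# whose key has no match in .data2, or NULL with a message if there are none.
rows_only_in_first <- function(.data1, .data2, by = "ID") {
  out <- dplyr::anti_join(.data1, .data2, by = by)
  if (nrow(out) == 0) {
    message("No differences found.")  # assumed wording
    return(NULL)
  }
  out
}

# IDs 1 and 2 appear in df1 but not in df2
only_in_df1 <- rows_only_in_first(df1, df2)
print(only_in_df1)

# Comparing a dataset with itself leaves nothing to report
same <- rows_only_in_first(df1, df1)
```

Note that an anti-join is one-directional: rows present only in the second dataset (IDs 6 and 7 here) are not reported unless the arguments are swapped.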

Examples

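library(manydata)

# Not run: toy datasets that overlap on IDs 3 to 5
if (FALSE) {
  df1 <- data.frame(ID = 1:5, Value = letters[1:5])
  df2 <- data.frame(ID = 3:7, Value = letters[3:7])
  compare_new(df1, df2)
  # Identical inputs, so no differences are expected
  compare_new(df1, df1)
}
# Compare the Wikipedia and Britannica datasets in emperors
compare_diff(emperors$Wikipedia, emperors$Britannica)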
