duplicate_count_colpair: Count duplicate values by column
Description
duplicate_count_colpair() takes a data frame and checks each combination of
columns for duplicates. Results are presented in a tibble, ordered by the
number of duplicates.
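The pairwise check can be sketched in base R to show the overall shape of the result. This is a minimal sketch, not the package's implementation (which wraps corrr::colpair_map()). The helper name count_colpair_sketch() is hypothetical, and it assumes that a "duplicate" is a non-missing value of the first column that also occurs somewhere in the second.

```r
# Hypothetical sketch, not scrutiny's implementation.
# Assumption: a "duplicate" is a non-missing value of column x
# that also occurs somewhere in column y.
count_colpair_sketch <- function(data) {
  # Each column of `pairs` holds the names of one column combination
  pairs <- utils::combn(names(data), 2)
  counts <- apply(pairs, 2, function(p) {
    x <- data[[p[1]]]
    y <- data[[p[2]]]
    sum(x[!is.na(x)] %in% y)
  })
  out <- data.frame(x = pairs[1, ], y = pairs[2, ], count = counts)
  # Order by number of duplicates, largest first
  out <- out[order(out$count, decreasing = TRUE), , drop = FALSE]
  rownames(out) <- NULL
  out
}

dat <- data.frame(
  a = c(1, 2, 3, 4),
  b = c(2, 3, 9, NA),
  c = c(7, 8, 9, 1)
)
count_colpair_sketch(dat)
```

Here columns a and b share two values (2 and 3), so that pair comes first; the other two pairs share one value each.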
Value
A tibble (data frame) with these columns:
x and y: Each row contains a unique combination of two of data's columns,
whose names are stored in the x and y output columns.
count: Number of "duplicates", i.e., values that are present in both x
and y.
total_x, total_y, rate_x, and rate_y (added by default): total_x
is the number of non-missing values in the column named under x, and
rate_x is the proportion of x values that are duplicated in y, i.e.,
count / total_x. total_y and rate_y are defined likewise for the column
named under y. The two rate_* columns are equal unless NA values are present.
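For a single column pair, the count, total_* and rate_* columns can be sketched as below. The helper pair_rates_sketch() is hypothetical, and the count rule (non-missing x values found in y) is an assumption; the package's exact counting rule may differ. The example shows why the two rates diverge only when NA values shrink one of the totals.

```r
# Hypothetical sketch of the count, total_* and rate_* columns for a
# single column pair. Assumption: count = non-missing x values found in y.
pair_rates_sketch <- function(x, y) {
  count <- sum(x[!is.na(x)] %in% y)
  total_x <- sum(!is.na(x))  # non-missing values in x
  total_y <- sum(!is.na(y))  # non-missing values in y
  data.frame(
    count = count,
    total_x = total_x,
    total_y = total_y,
    rate_x = count / total_x,
    rate_y = count / total_y
  )
}

# y has one NA, so total_y is smaller and the two rates differ
pair_rates_sketch(x = c(1, 2, 3, 4), y = c(2, 3, 9, NA))
```

Both rates share the same numerator, so they can only differ through their denominators; since all columns of a data frame have the same length, that happens only when the columns contain different numbers of NA values.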
Arguments
data
Data frame.
ignore
Optionally, a vector of values that should not be checked for
duplicates.
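The effect of ignore can be illustrated with a placeholder code such as 0 that recurs legitimately across columns. This is a minimal sketch assuming that ignored values are simply dropped (like NAs) from both columns before counting; the helper count_pair_ignoring() is hypothetical and not part of the package.

```r
# Hypothetical sketch; assumes ignored values are dropped
# (like NAs) from both columns before counting.
count_pair_ignoring <- function(x, y, ignore = NULL) {
  keep_x <- !is.na(x) & !(x %in% ignore)
  keep_y <- !is.na(y) & !(y %in% ignore)
  sum(x[keep_x] %in% y[keep_y])
}

x <- c(0, 0, 5, 7)
y <- c(0, 5, 8, 9)
count_pair_ignoring(x, y)              # 3: both 0s and the 5 occur in y
count_pair_ignoring(x, y, ignore = 0)  # 1: only the 5 is counted
```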
show_rates
Logical. If TRUE (the default), adds columns rate_x and
rate_y; see the Value section. Set show_rates to FALSE for higher
performance.
Summaries with audit()
There is an S3 method for audit(),
so you can call audit() following duplicate_count_colpair(). It
returns a tibble with summary statistics.
See Also
duplicate_count() for a frequency table.
duplicate_tally() to show instances of a value next to each instance.
janitor::get_dupes() to search for duplicate rows.
corrr::colpair_map(), a versatile tool for pairwise column analysis, which
this function wraps.