
scrutiny (version 0.5.0)

duplicate_tally: Count duplicates at each observation

Description

For every value in a vector or data frame, duplicate_tally() counts how often it appears in total. Tallies are presented next to each value.

For summary statistics, call audit() on the results.

Usage

duplicate_tally(x, ignore = NULL, colname_end = "n")

Value

A tibble (data frame). It contains every column of x, and each of these columns is immediately followed, to its right, by the corresponding tally column.

The tibble has the scr_dup_detect class, which is recognized by the audit() generic.
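The idea of a tally placed next to each value can be illustrated with a minimal base-R sketch. This is not the scrutiny implementation; the toy vector and the column names `value` and `n` are assumptions made for illustration only.

```r
# Base-R sketch of a per-value tally; NOT how scrutiny implements it.
# The input vector and the column names are hypothetical.
x <- c("a", "b", "a", "c", "a")

# Frequency of each distinct value
counts <- table(x)

# Look up the frequency of every element and place it next to the value
tally <- data.frame(
  value = x,
  n = as.integer(counts[x]),
  stringsAsFactors = FALSE
)

tally
```

Each row keeps its original value, so "a", which occurs three times, carries a tally of 3 in every row where it appears.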

Arguments

x

Vector or data frame.

ignore

Optionally, a vector of values that should not be checked. In the tally columns, they will be marked NA.

colname_end

String. Name ending of the tally columns. Default is "n".
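As a rough illustration of how ignore and colname_end might shape the output, here is a base-R sketch. It is not the package's code: the toy data, the `value` column name, and the way the tally column name is pasted together are all assumptions.

```r
# Base-R sketch of ignoring values in a tally; NOT the scrutiny implementation.
# Data and naming scheme are hypothetical.
x <- c(4.2, 5.1, 4.2, 7.0, 5.1, 4.2)
ignore <- c(5.1)
colname_end <- "n"

# Count each value, comparing values as strings
key <- as.character(x)
n <- as.integer(table(key)[key])

# Ignored values get NA instead of a count
n[x %in% ignore] <- NA_integer_

result <- data.frame(value = x, tally = n)

# Assumed naming scheme: original column name plus the chosen ending
names(result)[2] <- paste0("value_", colname_end)

result
```

In this sketch, 4.2 receives a tally of 3, 7.0 a tally of 1, and both occurrences of 5.1 are marked NA.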

Summaries with audit()

There is an S3 method for the audit() generic, so you can call audit() following duplicate_tally(). It returns a tibble with summary statistics.

Details

This function is not very informative when the input contains many values that each have only a few characters, because many of them will have duplicates just by chance. For example, in R's built-in iris data set, 99% of values have duplicates.

In general, the fewer values there are and the more characters each value has, the more meaningful the results.
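The effect of chance duplicates can be explored with a short base-R sketch that estimates the share of iris values occurring more than once. It assumes that values are compared as strings across all columns, which may differ in detail from what scrutiny does, so the number it prints need not match the package's figure exactly.

```r
# Base-R sketch: share of values in `iris` that occur more than once.
# Assumption: values are compared as strings across all columns.
vals <- unlist(lapply(iris, as.character), use.names = FALSE)

# Frequency of each distinct value, looked up for every element
counts <- table(vals)
n <- as.integer(counts[vals])

# Proportion of values whose tally is greater than 1
share_duplicated <- mean(n > 1)
share_duplicated
```

With 750 short values drawn from a narrow range, most of them are bound to repeat, which is why a high share here says little about the data.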

See Also

  • duplicate_count() for a frequency table.

  • duplicate_count_colpair() to check each combination of columns for duplicates.

  • janitor::get_dupes() to search for duplicate rows.

Examples

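# Load scrutiny for `duplicate_tally()` and `pigs4`, and magrittr for `%>%`:
library(scrutiny)
library(magrittr)

# Tally duplicate values in a data frame...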
duplicate_tally(x = pigs4)

# ...or in a single vector:
duplicate_tally(x = pigs4$snout)

# Summary statistics with `audit()`:
pigs4 %>%
  duplicate_tally() %>%
  audit()

# Any values can be ignored:
pigs4 %>%
  duplicate_tally(ignore = c(8.131, 7.574))
