vec_duplicate: Find duplicated values

Description

vec_duplicate_any(): detects the presence of any duplicated values, similar to anyDuplicated().
vec_duplicate_detect(): returns a logical vector describing if each element of the vector is duplicated elsewhere. Unlike duplicated(), it reports all duplicated values, not just the second and subsequent repetitions.
vec_duplicate_id(): returns an integer vector giving the location of the first occurrence of each value.

Usage

vec_duplicate_any(x)
vec_duplicate_detect(x)
vec_duplicate_id(x)

Arguments

x: A vector (including a data frame).
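
Because x may be a data frame, duplicates can be detected across whole rows. The following is a minimal sketch, assuming vctrs is installed; the data frame `df` and the result names are made up for illustration.

```r
library(vctrs)

# Hypothetical data: rows 1 and 2 are identical, row 3 is unique
df <- data.frame(x = c(1, 1, 2), y = c("a", "a", "b"))

# Each row is treated as one element of the vector
dup_rows <- vec_duplicate_detect(df)   # TRUE TRUE FALSE
first_loc <- vec_duplicate_id(df)      # 1 1 3
any_dup <- vec_duplicate_any(df)       # TRUE
```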

Value

vec_duplicate_any(): a logical vector of length 1.
vec_duplicate_detect(): a logical vector the same length as x.
vec_duplicate_id(): an integer vector the same length as x.

Missing values

In most cases, missing values are not considered to be equal, i.e. NA == NA is not TRUE. This behaviour would be unappealing here, so these functions consider all NAs to be equal. (Similarly, all NaN are also considered to be equal.)
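
As a small illustration of the behaviour described above (a sketch with made-up inputs, assuming vctrs is installed):

```r
library(vctrs)

# Base comparison: the result is NA, not TRUE
NA == NA

# Hypothetical input with two missing values
x_na <- c(NA, 1, NA)

# Both NAs are flagged as duplicates of each other
na_dup <- vec_duplicate_detect(x_na)   # TRUE FALSE TRUE
# The second NA points back to the location of the first
na_id <- vec_duplicate_id(x_na)        # 1 2 1

# NaN values are likewise treated as equal to one another
nan_any <- vec_duplicate_any(c(NaN, 2, NaN))   # TRUE
```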

Performance

These functions are currently slightly slower than their base equivalents. This is primarily because they do a little more checking and coercion in R, which makes them both a little safer and more generic. Additionally, the C code underlying vctrs has not yet been optimised: we expect some performance improvements when that happens.
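
One rough way to check the difference on your own machine is sketched below. The input `big` and its size are arbitrary choices for illustration, and timings will vary with the installed vctrs version and hardware, so no particular result is implied.

```r
library(vctrs)

# Arbitrary test input: one million integers drawn with replacement
set.seed(1)
big <- sample(1e5, 1e6, replace = TRUE)

# Base R: flag the second and later occurrences
system.time(base_res <- duplicated(big))

# vctrs: an element is a repeat when its first location is not its own
system.time(vctrs_res <- vec_duplicate_id(big) != seq_along(big))

# Both approaches should flag the same elements
identical(base_res, vctrs_res)
```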

Examples

# NOT RUN {
vec_duplicate_any(1:10)
vec_duplicate_any(c(1, 1:10))

x <- c(10, 10, 20, 30, 30, 40)
vec_duplicate_detect(x)
# Note that `duplicated()` doesn't consider the first instance to
# be a duplicate
duplicated(x)

# Identify elements of a vector by the location of the first element that
# they're equal to:
vec_duplicate_id(x)
# Location of the unique values:
vec_unique_loc(x)
# Equivalent to `duplicated()`:
vec_duplicate_id(x) != seq_along(x)
# }
