stri_duplicated: Determine Duplicated Elements

Description

stri_duplicated() determines which strings in a character vector are duplicates of other elements.

stri_duplicated_any() determines if there are any duplicated strings in a character vector.

Usage

stri_duplicated(str, fromLast = FALSE, opts_collator = NULL)
stri_duplicated_any(str, fromLast = FALSE, opts_collator = NULL)

Arguments

str

character vector

fromLast

single logical value; indicates whether duplicates should be searched for starting from the end of the vector

opts_collator

a named list of ICU Collator options, as generated by stri_opts_collator; NULL for the default collation options

Value

stri_duplicated() returns a logical vector of the same length as str. Each of its elements indicates whether an equivalent string already appeared earlier in str (or later in str, if fromLast is TRUE).
stri_duplicated_any() returns a single non-negative integer. A value of 0 indicates that all the elements in str are unique. Otherwise, it gives the index of the first non-unique element.
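
As a minimal sketch of how the two return values might be used together (the variable names x, dup and first are illustrative only, and the stringi package is assumed to be installed):

```r
library(stringi)

x <- c("a", "b", "a", NA, "a", NA)

# Logical vector, one element per element of x
dup <- stri_duplicated(x)

# Single integer: 0 if there are no duplicates,
# otherwise the index of the first non-unique element
first <- stri_duplicated_any(x)

if (first > 0) {
   message("first duplicated element is at index ", first)
}
```

Because 0 signals that no duplicates were found, the result of stri_duplicated_any() can be tested directly with first > 0.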

Details

Missing values are regarded as equal.

Unlike duplicated and anyDuplicated, these functions test for canonical equivalence of strings (and not whether the strings are just bytewise equal). Such operations are locale-dependent. Hence, stri_duplicated and stri_duplicated_any are significantly slower (but much better suited for natural language processing) than their base R counterparts.

See also stri_unique for extracting unique elements.
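
The relationship with stri_unique can be sketched as follows. This assumes that both functions are called with the same default collator options and that stri_unique keeps the first occurrence of each string; the variable names are illustrative only.

```r
library(stringi)

x <- c("a", "b", "a", NA, "a", NA)

# Keep only the elements that are not flagged as duplicates
kept <- x[!stri_duplicated(x)]

# Under the assumptions above, this should match stri_unique()
uniq <- stri_unique(x)
identical(kept, uniq)
```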

Other locale_sensitive: %s!==%, %s!=%, %s<=%, %s<%, %s===%, %s==%, %s>=%, %s>%, %stri!==%, %stri!=%, %stri<=%, %stri<%, %stri===%, %stri==%, %stri>=%, %stri>%; stri_cmp, stri_cmp_eq, stri_cmp_equiv, stri_cmp_ge, stri_cmp_gt, stri_cmp_le, stri_cmp_lt, stri_cmp_neq, stri_cmp_nequiv, stri_compare; stri_count_boundaries, stri_count_words; stri_enc_detect2; stri_extract_words; stri_locate_boundaries, stri_locate_words; stri_opts_collator; stri_order, stri_sort; stri_split_boundaries; stri_trans_tolower, stri_trans_totitle, stri_trans_toupper; stri_unique; stri_wrap; stringi-locale; stringi-search-boundaries; stringi-search-coll

Examples

# In the following examples, we have 3 duplicated values,
# "a" - 2 times, NA - 1 time
stri_duplicated(c("a", "b", "a", NA, "a", NA))
stri_duplicated(c("a", "b", "a", NA, "a", NA), fromLast=TRUE)
stri_duplicated_any(c("a", "b", "a", NA, "a", NA))

# compare the results:
stri_duplicated(c("\u0105", stri_trans_nfkd("\u0105")))
duplicated(c("\u0105", stri_trans_nfkd("\u0105")))

stri_duplicated(c("gro\u00df", "GROSS", "Gro\u00df", "Gross"),
   opts_collator=stri_opts_collator(strength=1))
duplicated(c("gro\u00df", "GROSS", "Gro\u00df", "Gross"))