
stri_duplicated() determines which strings in a character vector are duplicates of other elements.

stri_duplicated_any() determines whether there are any duplicated strings in a character vector.
stri_duplicated(str, fromLast = FALSE, opts_collator = NULL)
stri_duplicated_any(str, fromLast = FALSE, opts_collator = NULL)
opts_collator: collator options, see stri_opts_collator; NULL for default collation options.

stri_duplicated() returns a logical vector of the same length as str. Each of its elements indicates whether an equivalent string has already appeared in str.

stri_duplicated_any() returns a single non-negative integer. A value of 0 indicates that all the elements in str are unique. Otherwise, it gives the index of the first non-unique element.
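
The two return values described above can be related to each other with a short sketch. It assumes the stringi package is installed; the variable names (x, dup, first_dup) are illustrative only and not part of the package.

```r
library(stringi)

x <- c("a", "b", "a", NA, "a", NA)

# stri_duplicated() gives one logical value per input string
dup <- stri_duplicated(x)
length(dup) == length(x)

# stri_duplicated_any() gives the index of the first non-unique element,
# which should coincide with the first TRUE in the logical vector
first_dup <- stri_duplicated_any(x)
first_dup == which(dup)[1]

# When every element is unique, the result is 0
stri_duplicated_any(c("a", "b", "c"))
```

Use stri_duplicated_any() when only a yes/no answer (or the first offending position) is needed, since it does not have to build the full logical vector.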
Unlike duplicated and anyDuplicated, these functions test for canonical equivalence of strings (and not whether the strings are just bytewise equal). Such an operation is locale-dependent. Hence, stri_duplicated and stri_duplicated_any are significantly slower (but much better suited for natural language processing) than their base R counterparts.

See also stri_unique for extracting unique elements.
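
As a rough illustration of how stri_unique relates to stri_duplicated, the sketch below drops every element flagged as a duplicate and compares the result with stri_unique. That the two agree in order and content is an assumption of this sketch rather than something stated on this page; the names x, kept and uniq are illustrative only.

```r
library(stringi)

x <- c("a", "b", "a", NA, "a", NA)

# Keep only the elements that are not flagged as duplicates
kept <- x[!stri_duplicated(x)]

# Extract unique elements directly
uniq <- stri_unique(x)

# Assumed to agree for this input
identical(kept, uniq)
```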
Other locale_sensitive: %s!==%, %s!=%, %s<=%, %s<%, %s===%, %s==%, %s>=%, %s>%, %stri!==%, %stri!=%, %stri<=%, %stri<%, %stri===%, %stri==%, %stri>=%, %stri>%; stri_cmp, stri_cmp_eq, stri_cmp_equiv, stri_cmp_ge, stri_cmp_gt, stri_cmp_le, stri_cmp_lt, stri_cmp_neq, stri_cmp_nequiv, stri_compare; stri_count_boundaries, stri_count_words; stri_enc_detect2; stri_extract_words; stri_locate_boundaries, stri_locate_words; stri_opts_collator; stri_order, stri_sort; stri_split_boundaries; stri_trans_tolower, stri_trans_totitle, stri_trans_toupper; stri_unique; stri_wrap; stringi-locale; stringi-search-boundaries; stringi-search-coll
# In the following examples, we have 3 duplicated values,
# "a" - 2 times, NA - 1 time
stri_duplicated(c("a", "b", "a", NA, "a", NA))
stri_duplicated(c("a", "b", "a", NA, "a", NA), fromLast=TRUE)
stri_duplicated_any(c("a", "b", "a", NA, "a", NA))
# compare the results:
stri_duplicated(c("\u0105", stri_trans_nfkd("\u0105")))
duplicated(c("\u0105", stri_trans_nfkd("\u0105")))
# at collation strength 1, case and accent differences are ignored
stri_duplicated(c("gro\u00df", "GROSS", "Gro\u00df", "Gross"),
   opts_collator=stri_opts_collator(strength=1))
duplicated(c("gro\u00df", "GROSS", "Gro\u00df", "Gross"))