duplicated() determines which elements of a vector or data frame are duplicates of elements with smaller subscripts, and returns a logical vector indicating which elements (rows) are duplicates.

anyDuplicated(.) is a “generalized”, more efficient shortcut for any(duplicated(.)).
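A minimal sketch of the two functions on a small vector and a small data frame. The objects v and df are hypothetical toy inputs chosen for illustration, not part of the original page; the commented results follow from the "duplicate of an element with a smaller subscript" rule above.

```r
# Hypothetical toy vector: 3 and 1 each reappear later
v <- c(3, 1, 3, 2, 1)

duplicated(v)       # FALSE FALSE  TRUE FALSE  TRUE
any(duplicated(v))  # TRUE
anyDuplicated(v)    # 3: index of the first duplicated element

# Hypothetical data frame: each row is compared with the rows before it
df <- data.frame(a = c(1, 2, 1), b = c("x", "y", "x"))
duplicated(df)      # FALSE FALSE  TRUE: row 3 repeats row 1
```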
```r
duplicated(x, incomparables = FALSE, ...)

# S3 method for default
duplicated(x, incomparables = FALSE, fromLast = FALSE, nmax = NA, ...)

# S3 method for array
duplicated(x, incomparables = FALSE, MARGIN = 1, fromLast = FALSE, ...)

anyDuplicated(x, incomparables = FALSE, ...)

# S3 method for default
anyDuplicated(x, incomparables = FALSE, fromLast = FALSE, ...)

# S3 method for array
anyDuplicated(x, incomparables = FALSE, MARGIN = 1, fromLast = FALSE, ...)
```
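incomparables: a vector of values that cannot be compared. FALSE is a special value, meaning that all values can be compared, and may be the only value accepted for methods other than the default. It will be coerced internally to the same type as x.

fromLast: logical indicating if duplication should be considered from the reverse side, i.e., the last (or rightmost) of identical elements would correspond to duplicated = FALSE.

MARGIN: the array margin to be held fixed: see apply, and note that MARGIN = 0 may be useful.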
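A minimal sketch of the incomparables, fromLast and MARGIN arguments. The vector y and matrix m are hypothetical toy inputs; incomparables is used only with the default (vector) method here, since the text above notes that FALSE may be the only value other methods accept.

```r
# Hypothetical toy vector containing repeated NA and 1 values
y <- c(NA, 1, NA, 1)

duplicated(y)                      # FALSE FALSE  TRUE  TRUE
# NA is declared incomparable, so it is never flagged as a duplicate
duplicated(y, incomparables = NA)  # FALSE FALSE FALSE  TRUE
# Scan from the end: the last copy of each value counts as the original
duplicated(y, fromLast = TRUE)     #  TRUE  TRUE FALSE FALSE

# Hypothetical matrix whose third column repeats the first
m <- cbind(c(1, 2), c(3, 4), c(1, 2))
duplicated(m, MARGIN = 2)          # FALSE FALSE  TRUE (compares columns)
```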
duplicated(): For a vector input, a logical vector of the same length as x. For a data frame, a logical vector with one element for each row. For a matrix or array, and when MARGIN = 0, a logical array with the same dimensions and dimnames.

anyDuplicated(): an integer or real vector of length one whose value is the 1-based index of the first duplicate if any, otherwise 0.
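A small sketch of the return shapes described above, using a hypothetical 2 x 2 matrix m. With MARGIN = 0 the individual elements are compared and the result keeps the matrix dimensions; anyDuplicated() returns 0 when there is nothing to report.

```r
# Hypothetical 2 x 2 matrix; the value 1 appears twice
m <- matrix(c(1, 2, 1, 3), nrow = 2)

r <- duplicated(m, MARGIN = 0)  # element-wise comparison
dim(r)    # 2 2: same dimensions as m
r[1, 2]   # TRUE: m[1, 2] repeats the value stored in m[1, 1]

# No duplicates, so the returned index is 0
anyDuplicated(c(5, 6, 7))       # 0
```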
Using this for lists is potentially slow, especially if the elements are not atomic vectors (see vector) or differ only in their attributes. In the worst case it is \(O(n^2)\).
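A short sketch of duplicated() on a list, using a hypothetical list lst. Elements are compared as whole objects, so an element with the same values as an earlier one but an extra attribute is not reported as a duplicate.

```r
# Hypothetical list: element 3 equals element 1,
# element 4 has the same values as element 1 plus an attribute
lst <- list(1:3, "a", 1:3, structure(1:3, note = "tagged"))

duplicated(lst)  # FALSE FALSE  TRUE FALSE
```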
anyDuplicated(x, ...) is a “generalized” shortcut for any(duplicated(x, ...)), in the sense that it returns the index i of the first duplicated entry x[i] if there is one, and 0 otherwise. Their behaviours may differ when at least one of duplicated() and anyDuplicated() has a relevant method.
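A minimal sketch of the relationship described above, on a hypothetical vector z: anyDuplicated() returns the same index as the first TRUE in duplicated(), and comparing it with 0 gives the same answer as any(duplicated()).

```r
# Hypothetical vector in which 8 and 4 each reappear later
z <- c(4, 8, 15, 16, 8, 4)

duplicated(z)            # FALSE FALSE FALSE FALSE  TRUE  TRUE
anyDuplicated(z)         # 5: index i of the first duplicated entry z[i]
which(duplicated(z))[1]  # 5: same index, derived from the full logical vector
anyDuplicated(z) > 0     # TRUE, matching any(duplicated(z))
```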
duplicated(x, fromLast = TRUE) is equivalent to, but faster than, rev(duplicated(rev(x))). The data frame method works by pasting together a character representation of the rows separated by \r, so it may be imperfect if the data frame has characters with embedded carriage returns or columns which do not reliably map to characters.

The array method calculates, for each element of the sub-array specified by MARGIN, whether the remaining dimensions are identical to those for an earlier (or later, when fromLast = TRUE) element (in row-major order). This would most commonly be used to find duplicated rows (the default) or columns (with MARGIN = 2). Note that MARGIN = 0 returns an array of the same dimensionality attributes as x.

Missing values ("NA") are regarded as equal, numeric and complex ones differing from NaN; character strings will be compared in a “common encoding”; for details, see identical (and unique), which use the same concept.

Values in incomparables will never be marked as duplicated. This is intended to be used for a fairly small set of values and will not be efficient for a very large set.

When used on a data frame with more than one column, or an array or matrix when comparing dimensions of length greater than one, this tests for identity of character representations. This will catch people who unwisely rely on exact equality of floating-point numbers!

Except for factors, logical and raw vectors, the default nmax = NA is equivalent to nmax = length(x). Since a hash table of size 8*nmax bytes is allocated, setting nmax suitably can save large amounts of memory. For factors it is automatically set to the smaller of length(x) and the number of levels plus one (for NA). If nmax is set too small there is liable to be an error: nmax = 1 is silently ignored.

Long vectors are supported for the default method of duplicated, but may only be usable if nmax is supplied.
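A minimal sketch of two rules stated above, using a hypothetical vector w: NA matches NA and NaN matches NaN while the two stay distinct, and fromLast = TRUE agrees with the rev(duplicated(rev(x))) formulation.

```r
# Hypothetical vector with repeated numbers, NA and NaN
w <- c(2, NA, NaN, 2, NA, NaN)

# NA matches NA and NaN matches NaN, but NA and NaN are kept apart
duplicated(w)  # FALSE FALSE FALSE  TRUE  TRUE  TRUE

# fromLast = TRUE gives the same answer as reversing twice
identical(duplicated(w, fromLast = TRUE),
          rev(duplicated(rev(w))))  # TRUE
```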
```r
x <- c(9:20, 1:5, 3:7, 0:8)
## extract unique elements
(xu <- x[!duplicated(x)])
## similar, same elements but different order:
(xu2 <- x[!duplicated(x, fromLast = TRUE)])

## xu == unique(x) but unique(x) is more efficient
stopifnot(identical(xu,  unique(x)),
          identical(xu2, unique(x, fromLast = TRUE)))

duplicated(iris)[140:143]

duplicated(iris3, MARGIN = c(1, 3))
anyDuplicated(iris) ## 143

anyDuplicated(x)
anyDuplicated(x, fromLast = TRUE)
```