# duplicated

##### Determine Duplicate Elements

`duplicated()` determines which elements of a vector or data frame are duplicates of elements with smaller subscripts, and returns a logical vector indicating which elements (rows) are duplicates.

`anyDuplicated(.)` is a “generalized” more efficient shortcut for `any(duplicated(.))`.

##### Usage

```
duplicated(x, incomparables = FALSE, ...)

## Default S3 method:
duplicated(x, incomparables = FALSE,
           fromLast = FALSE, nmax = NA, ...)

## S3 method for class 'array'
duplicated(x, incomparables = FALSE, MARGIN = 1,
           fromLast = FALSE, ...)

anyDuplicated(x, incomparables = FALSE, ...)

## Default S3 method:
anyDuplicated(x, incomparables = FALSE,
              fromLast = FALSE, ...)

## S3 method for class 'array'
anyDuplicated(x, incomparables = FALSE,
              MARGIN = 1, fromLast = FALSE, ...)
```

##### Arguments

- x
- a vector or a data frame or an array or `NULL`.
- incomparables
- a vector of values that cannot be compared. `FALSE` is a special value, meaning that all values can be compared, and may be the only value accepted for methods other than the default. It will be coerced internally to the same type as `x`.
- fromLast
- logical indicating if duplication should be considered from the reverse side, i.e., the last (or rightmost) of identical elements would correspond to `duplicated = FALSE`.
- nmax
- the maximum number of unique items expected (greater than one).
- ...
- arguments for particular methods.
- MARGIN
- the array margin to be held fixed: see `apply`, and note that `MARGIN = 0` may be useful.

##### Details

These are generic functions with methods for vectors (including lists), data frames and arrays (including matrices).

For the default methods, and whenever there are equivalent method definitions for `duplicated` and `anyDuplicated`, `anyDuplicated(x, ...)` is a “generalized” shortcut for `any(duplicated(x, ...))`, in the sense that it returns the *index* `i` of the first duplicated entry `x[i]` if there is one, and `0` otherwise. Their behaviours may be different when at least one of `duplicated` and `anyDuplicated` has a relevant method.
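
As a minimal sketch of the relationship described above (the vector `x` is made up for illustration), the following compares the logical vector from `duplicated()` with the single index returned by `anyDuplicated()`:

```r
# Illustrative data: "a" repeats at position 3, "b" repeats at position 5
x <- c("a", "b", "a", "c", "b")

duplicated(x)        # FALSE FALSE  TRUE FALSE  TRUE
any(duplicated(x))   # TRUE

# anyDuplicated() returns the index of the first duplicate instead of TRUE
anyDuplicated(x)     # 3

# With no duplicates it returns 0, which is falsy in an if() condition
anyDuplicated(c("a", "b", "c"))   # 0
```

Because `0` is treated as `FALSE` and any positive index as `TRUE`, `if (anyDuplicated(x))` can stand in for `if (any(duplicated(x)))` for the default methods.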

`duplicated(x, fromLast = TRUE)` is equivalent to but faster than `rev(duplicated(rev(x)))`.
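
The equivalence can be checked on a small vector. This is only an illustration with made-up data, not a benchmark of the speed difference:

```r
# Illustrative data: 1 appears at positions 1 and 3, 2 at positions 2 and 5
x <- c(1, 2, 1, 3, 2)

# Scanning from the end, the LAST occurrence of each value is the non-duplicate
from_last <- duplicated(x, fromLast = TRUE)
from_last                      # TRUE  TRUE FALSE FALSE FALSE

# Same result via double reversal
via_rev <- rev(duplicated(rev(x)))
identical(from_last, via_rev)  # TRUE
```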

The data frame method works by pasting together a character representation of the rows separated by `\r`, so may be imperfect if the data frame has characters with embedded carriage returns or columns which do not reliably map to characters.
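
A short sketch of the data frame method on a hypothetical data frame `df` (the column names and values are invented for illustration). Rows are compared as a whole, so a row counts as a duplicate only if every column matches an earlier row:

```r
# Hypothetical data: row 3 repeats row 1; row 4 shares `id` with row 2 only
df <- data.frame(id  = c(1, 2, 1, 2),
                 grp = c("a", "b", "a", "c"),
                 stringsAsFactors = FALSE)

duplicated(df)       # FALSE FALSE  TRUE FALSE
anyDuplicated(df)    # 3

# Keep only the first occurrence of each row
df[!duplicated(df), ]
```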

The array method calculates for each element of the sub-array specified by `MARGIN` if the remaining dimensions are identical to those for an earlier (or later, when `fromLast = TRUE`) element (in row-major order). This would most commonly be used to find duplicated rows (the default) or columns (with `MARGIN = 2`). Note that `MARGIN = 0` returns an array of the same dimensionality attributes as `x`.
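
The effect of `MARGIN` can be sketched on a small matrix `m` (made up for illustration) whose third row repeats the first:

```r
# Illustrative 3 x 2 matrix; rows are (1, 3), (2, 4), (1, 3)
m <- matrix(c(1, 2, 1,
              3, 4, 3), nrow = 3)

# Default MARGIN = 1: compare whole rows
duplicated(m)               # FALSE FALSE  TRUE

# MARGIN = 2: compare whole columns (the two columns differ)
duplicated(m, MARGIN = 2)   # FALSE FALSE

# MARGIN = 0: compare individual elements, keeping the matrix shape
dup0 <- duplicated(m, MARGIN = 0)
dim(dup0)                   # 3 2
```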

Missing values are regarded as equal, but `NaN` is not equal to `NA_real_`.
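
A minimal illustration of this rule on a made-up numeric vector: repeated `NA`s are duplicates of each other, repeated `NaN`s are duplicates of each other, but an `NaN` is not a duplicate of an `NA`:

```r
# Illustrative data mixing NA and NaN
x <- c(NA, 1, NA, NaN, NaN, NA_real_)

duplicated(x)
# FALSE FALSE  TRUE FALSE  TRUE  TRUE
#   position 3: NA repeats the NA at position 1
#   position 4: first NaN, not matched against the earlier NA
#   position 5: NaN repeats position 4
#   position 6: NA_real_ repeats the NA at position 1
```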

Values in `incomparables` will never be marked as duplicated. This is intended to be used for a fairly small set of values and will not be efficient for a very large set.
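
As a small sketch with made-up data, passing a value in `incomparables` exempts its repeats from being flagged, while other values are treated as usual:

```r
# Illustrative data: both 1 and 2 repeat
x <- c(1, 2, 1, 2, 3)

duplicated(x)                      # FALSE FALSE  TRUE  TRUE FALSE

# Treat 1 as incomparable: its repeat at position 3 is no longer flagged
duplicated(x, incomparables = 1)   # FALSE FALSE FALSE  TRUE FALSE
```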

When used on a data frame with more than one column, or an array or matrix when comparing dimensions of length greater than one, this tests for identity of character representations. This will catch people who unwisely rely on exact equality of floating-point numbers!

Character strings will be compared as byte sequences if any input is marked as `"bytes"` (see `Encoding`).

Except for factors, logical and raw vectors the default `nmax = NA` is equivalent to `nmax = length(x)`. Since a hash table of size `8*nmax` bytes is allocated, setting `nmax` suitably can save large amounts of memory. For factors it is automatically set to the smaller of `length(x)` and the number of levels plus one (for `NA`). If `nmax` is set too small there is liable to be an error: `nmax = 1` is silently ignored.
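
A hedged sketch of supplying `nmax`: the vector below is long but is known to contain only 10 distinct values, so a small bound (the value 16 here is an arbitrary illustrative choice comfortably above 10) should give the same answer as the default while allowing a smaller hash table. No memory measurement is shown.

```r
# Illustrative data: 100,000 elements but only 10 distinct values
x <- rep(1:10, times = 1e4)

d_default <- duplicated(x)             # nmax = NA, i.e. sized for length(x)
d_bounded <- duplicated(x, nmax = 16)  # assumed upper bound on unique values

identical(d_default, d_bounded)        # TRUE
sum(!d_bounded)                        # 10 non-duplicates, one per distinct value
```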

Long vectors are supported for the default method of `duplicated`, but may only be usable if `nmax` is supplied.

##### Value

`duplicated()`: For a vector input, a logical vector of the same length as `x`. For a data frame, a logical vector with one element for each row. For a matrix or array, and when `MARGIN = 0`, a logical array with the same dimensions and dimnames.

`anyDuplicated()`: an integer or real vector of length one with value the 1-based index of the first duplicate if any, otherwise `0`.

##### Warning

Using this for lists is potentially slow, especially if the elements are not atomic vectors (see `vector`) or differ only in their attributes. In the worst case it is $O(n^2)$.
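
For reference, a small sketch of the list case on made-up data. It only illustrates that list elements are compared as whole objects; it does not demonstrate the performance concern, which only matters for large lists:

```r
# Illustrative list mixing atomic vectors and a nested list
lst <- list(1:3, "a", 1:3, list(1, 2), "a")

duplicated(lst)      # FALSE FALSE  TRUE FALSE  TRUE
anyDuplicated(lst)   # 3
```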

##### References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988)
*The New S Language*.
Wadsworth & Brooks/Cole.

##### Examples

```
x <- c(9:20, 1:5, 3:7, 0:8)
## extract unique elements
(xu <- x[!duplicated(x)])
## similar, same elements but different order:
(xu2 <- x[!duplicated(x, fromLast = TRUE)])
## xu == unique(x) but unique(x) is more efficient
stopifnot(identical(xu,  unique(x)),
          identical(xu2, unique(x, fromLast = TRUE)))
duplicated(iris)[140:143]
duplicated(iris3, MARGIN = c(1, 3))
anyDuplicated(iris) ## 143
anyDuplicated(x)
anyDuplicated(x, fromLast = TRUE)
```

*Documentation reproduced from package base, version 3.2.4, License: Part of R 3.2.4*

### Community examples

**fabionatalini@gmail.com** at Jan 12, 2018 base v3.4.3

```
# take some data
data <- mtcars[, c(1:3)]
row.names(data) <- NULL
# add three new rows equal to the first three rows of data
# (disp for the third row is 108 in mtcars; 93 is that car's hp)
new.rows <- data.frame(mpg = c(21.0, 21.0, 22.8), cyl = c(6, 6, 4), disp = c(160, 160, 108))
new.data <- rbind(data, new.rows)
# check for duplicated rows
duplicated(new.data)
# select the duplicated rows
new.data[duplicated(new.data), ]
# remove the duplicated rows
new.data[!duplicated(new.data), ]
```