# duplicated

##### Determine Duplicate Rows

`duplicated` returns a logical vector indicating which rows of a `data.table` are duplicates of a row with a smaller subscript.

`unique` returns a `data.table` with duplicated rows removed, considering only the columns specified in the `by` argument. When `by` is not supplied, rows that are duplicated across all columns are removed.

`anyDuplicated` returns the *index* `i` of the first duplicated entry if there is one, and 0 otherwise.

`uniqueN` is equivalent to `length(unique(x))` when `x` is an atomic vector, and to `nrow(unique(x))` when `x` is a `data.frame` or `data.table`. The number of unique rows is computed directly, without materialising the intermediate unique data.table, and is therefore faster and more memory efficient.
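As a quick illustration of the `uniqueN` equivalences described above, the following is a minimal sketch. The vector `x` and the table `DT` are made-up inputs chosen for illustration, not taken from the package.

```r
library(data.table)

# Hypothetical inputs for illustration only
x  <- c(3L, 1L, 3L, 2L, 1L)
DT <- data.table(a = c(1L, 1L, 2L), b = c("x", "x", "y"))

# Atomic vector: same count as length(unique(x))
n_vec <- uniqueN(x)
vec_matches <- n_vec == length(unique(x))

# data.table: same count as nrow(unique(DT)), without building unique(DT)
n_rows <- uniqueN(DT)
rows_match <- n_rows == nrow(unique(DT))
```

Here `x` holds three distinct values and `DT` holds two distinct rows, so both comparisons should come out `TRUE`.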

- Keywords
- data

##### Usage

```
# S3 method for data.table
duplicated(x, incomparables=FALSE, fromLast=FALSE, by=seq_along(x), ...)

# S3 method for data.table
unique(x, incomparables=FALSE, fromLast=FALSE, by=seq_along(x), ...)

# S3 method for data.table
anyDuplicated(x, incomparables=FALSE, fromLast=FALSE, by=seq_along(x), ...)

uniqueN(x, by=if (is.list(x)) seq_along(x) else NULL, na.rm=FALSE)
```

##### Arguments

- x
A data.table. `uniqueN` accepts atomic vectors and data.frames as well.

- ...
Not used at this time.

- incomparables
Not used. Here for S3 method consistency.

- fromLast
Logical indicating if duplication should be considered from the reverse side, i.e., the last (or rightmost) of identical elements would correspond to `duplicated = FALSE`.

- by
`character` or `integer` vector indicating which combinations of columns from `x` to use for uniqueness checks. By default all columns are used. This default was changed for consistency with the data.frame methods; in versions `< 1.9.8` the default was `key(x)`.

- na.rm
Logical (default is `FALSE`). Should missing values (including `NaN`) be removed?
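
The effect of `by`, `fromLast` and `na.rm` can be sketched on a small table. The table `DT` and its column names `id` and `v` are hypothetical, chosen only to make the flagged rows easy to follow.

```r
library(data.table)

# Hypothetical three-row table; rows 1 and 2 share the same id
DT <- data.table(id = c(1L, 1L, 2L), v = c("a", "b", "a"))

# Restrict the check to the id column: row 2 repeats row 1
dup_first <- duplicated(DT, by = "id")

# Scan from the end instead: now row 1 is the one flagged
dup_last <- duplicated(DT, by = "id", fromLast = TRUE)

# unique() keeps the unflagged rows, so fromLast = TRUE keeps rows 2 and 3
kept_v <- unique(DT, by = "id", fromLast = TRUE)$v

# na.rm only applies to uniqueN: NA counts as a value unless removed
n_with_na    <- uniqueN(c(1, NA, 1))
n_without_na <- uniqueN(c(1, NA, 1), na.rm = TRUE)
```

With `fromLast = TRUE` the last row of each group is treated as the original, which is useful when later rows supersede earlier ones.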

##### Details

Because data.tables are usually sorted by key, tests for duplication are especially quick when only the keyed columns are considered. Unlike `unique.data.frame`, `paste` is not used to ensure equality of floating point data. Equality is instead checked directly, which is quite fast. data.table provides `setNumericRounding` to handle cases where limitations in floating point representation are undesirable.

`v1.9.4` introduces an `anyDuplicated` method for data.tables, similar to base in functionality. It also implements the logical argument `fromLast` for all three functions, with default value `FALSE`.
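
The role of `setNumericRounding` mentioned above can be sketched as follows. This is an illustration under assumptions, not part of the package examples: the table `DT` is made up, and the counts depend on the rounding setting in force when the functions are called.

```r
library(data.table)

# Remember the current setting so it can be restored afterwards
old_rounding <- getNumericRounding()

# 0.1 + 0.2 and 0.3 differ only in the last bits of their representation
DT <- data.table(a = c(0.1 + 0.2, 0.3), b = 1)

# No rounding: the two values are compared exactly
setNumericRounding(0L)
n_exact <- uniqueN(DT)

# Round off the last 2 bytes of the significand before comparing
setNumericRounding(2L)
n_rounded <- uniqueN(DT)

# Restore the previous setting
setNumericRounding(old_rounding)
```

With exact comparison the two rows should count as distinct (`n_exact` of 2), while with 2-byte rounding they should collapse into one (`n_rounded` of 1). Restoring the previous value matters because the setting is global and also affects ordering, joining and grouping.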

##### Value

`duplicated` returns a logical vector of length `nrow(x)` indicating which rows are duplicates.

`unique` returns a data.table with duplicated rows removed.

`anyDuplicated` returns an integer value with the index of the first duplicate. If none exists, 0L is returned.

`uniqueN` returns the number of unique elements in the vector, `data.frame` or `data.table`.
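
The return values above are related to one another, as the following sketch shows. The table `DT` is a made-up input in which row 3 repeats row 1.

```r
library(data.table)

# Hypothetical table in which row 3 repeats row 1
DT <- data.table(a = c(2L, 1L, 2L), b = c(1L, 2L, 1L))

# One logical per row
flags <- duplicated(DT)
same_length <- length(flags) == nrow(DT)

# unique() keeps exactly the rows that are not flagged
n_kept <- nrow(unique(DT))
kept_matches <- n_kept == sum(!flags)

# anyDuplicated() is the position of the first TRUE flag
first_dup <- anyDuplicated(DT)
first_matches <- first_dup == which(flags)[1]

# uniqueN() counts the rows that unique() would return
n_unique <- uniqueN(DT)
```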

##### See Also

`setNumericRounding`, `data.table`, `duplicated`, `unique`, `all.equal`, `fsetdiff`, `funion`, `fintersect`, `fsetequal`

##### Examples

```
library(data.table)
DT <- data.table(A = rep(1:3, each=4), B = rep(1:4, each=3),
C = rep(1:2, 6), key = "A,B")
duplicated(DT)
unique(DT)
duplicated(DT, by="B")
unique(DT, by="B")
duplicated(DT, by=c("A", "C"))
unique(DT, by=c("A", "C"))
DT = data.table(a=c(2L,1L,2L), b=c(1L,2L,1L)) # no key
unique(DT) # rows 1 and 2 (row 3 is a duplicate of row 1)
DT = data.table(a=c(3.142, 4.2, 4.2, 3.142, 1.223, 1.223), b=rep(1,6))
unique(DT) # rows 1,2 and 5
DT = data.table(a=tan(pi*(1/4 + 1:10)), b=rep(1,10)) # example from ?all.equal
length(unique(DT$a)) # 10 strictly unique floating point values
all.equal(DT$a,rep(1,10)) # TRUE, all within tolerance of 1.0
DT[,which.min(a)] # row 10, the strictly smallest floating point value
identical(unique(DT),DT[1]) # TRUE, stable within tolerance
identical(unique(DT),DT[10]) # FALSE
# fromLast=TRUE
DT <- data.table(A = rep(1:3, each=4), B = rep(1:4, each=3),
C = rep(1:2, 6), key = "A,B")
duplicated(DT, by="B", fromLast=TRUE)
unique(DT, by="B", fromLast=TRUE)
# anyDuplicated
anyDuplicated(DT, by=c("A", "B")) # 2L, row 2 repeats row 1 on (A, B)
any(duplicated(DT, by=c("A", "B"))) # TRUE
# uniqueN, unique rows on key columns
uniqueN(DT, by = key(DT))
# uniqueN, unique rows on all columns
uniqueN(DT)
# uniqueN while grouped by "A"
DT[, .(uN=uniqueN(.SD)), by=A]
# uniqueN's na.rm=TRUE
x = sample(c(NA, NaN, runif(3)), 10, TRUE)
uniqueN(x, na.rm = FALSE) # default; NA and NaN are counted, so at most 5
uniqueN(x, na.rm=TRUE) # at most 3
```

*Documentation reproduced from package data.table, version 1.13.2, License: MPL-2.0 | file LICENSE*