Find duplicate rows
f_duplicates(
data,
...,
.keep_all = FALSE,
.both_ways = FALSE,
.add_count = FALSE,
.drop_empty = FALSE,
sort = FALSE,
.by = NULL,
.cols = NULL
)
A data.frame of duplicate rows.
A data frame.
Variables used to find duplicate rows.
If .keep_all is TRUE, all columns of the data frame are kept; the default is FALSE.
If .both_ways is TRUE, both the duplicates and their non-duplicate first instances are retained. The default is FALSE, which returns only the duplicate rows. Setting this to TRUE can be particularly useful when examining the differences between duplicate rows.
If .add_count is TRUE, a count column is added giving the number of duplicates (including the first, non-duplicate instance). The naming convention of this column follows dplyr::add_count().
If .drop_empty is TRUE, empty rows (rows in which all values are NA) are removed. The default is FALSE.
Should the result be sorted? If sort is FALSE (the default), rows are returned in the exact same order as they appear in the data. If TRUE, the duplicate rows are sorted.
(Optional) A selection of columns to group by for this operation, passed as .by. Columns are specified using tidy-select.
(Optional) An alternative to ... that accepts a named character vector or numeric vector, passed as .cols. When speed is a priority, using this argument is recommended.
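The difference between the default behaviour and .both_ways = TRUE can be illustrated with base R's duplicated(). This is only an analogy for the behaviour described in the argument list, not a claim about how f_duplicates() is implemented. The data frame df is made up for the illustration, and the package name fastplyr is an assumption, which is why those calls are guarded.

```r
# Toy data: rows 1-2 are identical, and rows 4-6 are identical.
df <- data.frame(
  id  = c(1, 1, 2, 3, 3, 3),
  grp = c("a", "a", "b", "c", "c", "c")
)

# Base R analogue of the default (.both_ways = FALSE):
# keep only the second and later occurrences of each repeated row.
later_only <- df[duplicated(df), , drop = FALSE]

# Base R analogue of .both_ways = TRUE:
# also keep the first occurrence of each repeated row.
is_dup    <- duplicated(df) | duplicated(df, fromLast = TRUE)
both_ways <- df[is_dup, , drop = FALSE]

# Assumption: f_duplicates() is exported by a package named fastplyr.
# The calls follow the usage signature shown at the top of this page.
if (requireNamespace("fastplyr", quietly = TRUE)) {
  # Only the duplicate rows, using id and grp to define duplicates
  print(fastplyr::f_duplicates(df, id, grp))
  # Duplicates plus their first instances
  print(fastplyr::f_duplicates(df, id, grp, .both_ways = TRUE))
  # Duplicates defined by id alone, keeping all columns and adding a count
  print(fastplyr::f_duplicates(df, id, .keep_all = TRUE, .add_count = TRUE))
  # Column selection through .cols instead of ...
  print(fastplyr::f_duplicates(df, .cols = c("id", "grp")))
}
```

In the base R analogue, later_only holds three rows and both_ways holds five, since the first instance of each repeated row is added back.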
This function works like dplyr::distinct() in its handling of arguments and data-masking but returns duplicate rows.
In certain situations it can be much faster than data %>% group_by() %>% filter(n() > 1) when there are many groups.
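A minimal sketch of the dplyr idiom mentioned above, grouped on a single column. The tibble df and the grouping column id are made up for the illustration. Because filter(n() > 1) keeps every row of a repeated group, including the first, the closest counterpart going by the argument descriptions would be a call with .both_ways = TRUE and .keep_all = TRUE. That correspondence, and the package name fastplyr, are assumptions rather than something this page states, and no timing claim is made here.

```r
library(dplyr)

# Toy data: id 1 appears twice, id 2 once, id 3 three times.
df <- tibble(
  id    = c(1, 1, 2, 3, 3, 3),
  value = c("a", "b", "c", "d", "e", "f")
)

# The dplyr idiom: keep all rows belonging to groups with more than one row.
dup_dplyr <- df %>%
  group_by(id) %>%
  filter(n() > 1) %>%
  ungroup()

# Assumed counterpart using f_duplicates() (package name is an assumption).
if (requireNamespace("fastplyr", quietly = TRUE)) {
  dup_fast <- fastplyr::f_duplicates(
    df, id,
    .keep_all  = TRUE,  # keep the value column as well
    .both_ways = TRUE   # keep first instances, as the dplyr idiom does
  )
  print(dup_fast)
}
```

Here dup_dplyr contains the five rows with id 1 or 3; the single row with id 2 is dropped.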
See also f_count and f_distinct.