These are tidy-based functions for calculating group IDs and row IDs.
group_id() returns an integer vector of group IDs the same size as x.
row_id() returns an integer vector of row IDs.
f_consecutive_id() returns an integer vector of consecutive run IDs.
The add_ variants add a column of group IDs/row IDs.
Usage
group_id(x, order = TRUE, ascending = TRUE, as_qg = FALSE)
row_id(x, ascending = TRUE)
f_consecutive_id(x)
Value
An integer vector.
Arguments
x
A vector or data frame.
order
Should the groups be ordered? When order is TRUE (the default), the group IDs follow the sorted order of the groups, but the returned vector itself is not sorted. If FALSE, the group IDs are assigned in order of first appearance.
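The difference between the two order settings can be mimicked in base R. This is a minimal sketch of the behaviour described above, not the package's implementation; ordered_ids and appearance_ids are hypothetical names used only for illustration.

```r
x <- c("b", "a", "b", "c", "a")

# Mimics order = TRUE: IDs follow the sorted unique values,
# while the result keeps the original element order of x.
ordered_ids <- match(x, sort(unique(x)))

# Mimics order = FALSE: IDs follow the first appearance of each value.
appearance_ids <- match(x, unique(x))

print(ordered_ids)     # 2 1 2 3 1
print(appearance_ids)  # 1 2 1 3 2
```

In both cases the output has the same length as x; only the mapping from value to ID changes.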
ascending
Should the order be ascending or descending? The default is TRUE. For row_id(), this determines whether the row IDs are in increasing or decreasing order.
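As a rough illustration of the descending case, the following base R sketch emulates what ascending = FALSE is described as doing. It assumes that descending group IDs give the largest value ID 1, and that decreasing row IDs count down within each group; desc_ids and desc_row_ids are hypothetical names, and none of this is the package's own code.

```r
x <- c("b", "a", "b", "c", "a")

# Descending group IDs (assumed semantics): the largest value receives ID 1.
desc_ids <- match(x, sort(unique(x), decreasing = TRUE))

# Decreasing row IDs (assumed semantics): count down within each group
# of equal values.
desc_row_ids <- ave(seq_along(x), x, FUN = function(i) rev(seq_along(i)))

print(desc_ids)      # 2 3 2 1 3
print(desc_row_ids)  # 2 2 1 1 1
```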
as_qg
Should the group IDs be returned as a collapse "qG" class? The default (FALSE) always returns an integer vector.
Note: When working with data frames, it is highly recommended to use the add_ variants of these functions. Not only are they more intuitive to use, they also have optimisations for large numbers of groups.
group_id
This assigns an integer ID to each unique element of a vector or each unique row of a data frame. It is useful in analysis because the information in several grouping columns can be compressed into a single integer column, which can then be used in further operations.
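The idea of compressing several columns into one ID can be sketched in base R. The data frame, its column names, and the key-pasting approach below are all assumptions made for illustration; the package itself may compute the IDs quite differently.

```r
df <- data.frame(
  site  = c("A", "A", "B", "A", "B"),
  year  = c(2020, 2021, 2020, 2020, 2020),
  value = c(10, 20, 30, 40, 50)
)

# Build one key per row from the grouping columns, then number the unique
# keys by first appearance (comparable to order = FALSE).
key <- paste(df$site, df$year, sep = "\r")
df$id <- match(key, unique(key))

# The single ID column can now drive further operations, e.g. a grouped sum.
totals <- tapply(df$value, df$id, sum)

print(df$id)   # 1 2 3 1 3
print(totals)  # group 1: 50, group 2: 20, group 3: 80
```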
row_id
This assigns row numbers within each group. To assign plain row numbers to a data frame, one can use add_row_id().
This function can be used in rolling calculations, finding duplicates and more.
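To make the duplicate-finding use concrete, here is a base R sketch that emulates within-group row numbers for a vector and flags repeats. It is not the package's implementation; row_ids and is_dup are hypothetical names.

```r
x <- c("a", "b", "a", "a", "b")

# Emulated row IDs: the running count of each value as it appears.
row_ids <- ave(seq_along(x), x, FUN = seq_along)

# Any element whose row ID exceeds 1 has been seen before.
is_dup <- row_ids > 1L

print(row_ids)  # 1 1 2 3 2
print(is_dup)   # FALSE FALSE TRUE TRUE TRUE
```

For a plain vector this flag agrees with base R's duplicated(x); the row IDs carry the extra information of how many times each value has occurred so far.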
consecutive_id
An alternative to dplyr::consecutive_id(), f_consecutive_id() also creates an integer vector with values in the range [1, n], where n is the length of the vector or the number of rows of the data frame.
The ID increments every time x[i] != x[i - 1], thus indicating where the value changes.
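The increment rule can be written out directly in base R. This is a minimal sketch for a non-empty vector without missing values; changed and run_ids are hypothetical names, and the package function is likely implemented differently.

```r
x <- c(1, 1, 2, 2, 1, 3, 3)

# TRUE wherever an element differs from its predecessor; the first
# element always starts a new run.
changed <- c(TRUE, x[-1] != x[-length(x)])

# The cumulative count of changes is the consecutive run ID.
run_ids <- cumsum(changed)

print(run_ids)  # 1 1 2 2 3 4 4
```

Note that the value 1 receives two different IDs (1 and 3) because its two runs are not adjacent, which is what distinguishes a consecutive ID from a group ID.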
f_consecutive_id() has very little call overhead, making it suitable for repeated calls.
See also: add_group_id(), add_row_id(), add_consecutive_id()