group() scans the rows of a data frame (or atomic vector / list of atomic vectors), assigning to each unique row an integer id, starting with 1 and proceeding in first-appearance order of the rows. The function is written in C and optimized for R's data structures. It is the workhorse behind functions like GRP / fgroup_by, collap, qF, qG, finteraction and funique, when called with argument sort = FALSE.
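To make "first-appearance order" concrete, here is a minimal base R sketch (it does not call group() and is not how the C code works internally). The vector x and the variable names are made up for illustration. It contrasts first-appearance ids with the sorted ids a factor would give.

```r
# Hypothetical input vector, for illustration only
x <- c("b", "a", "b", "c", "a")

# First-appearance ids: "b" is seen first, so it gets id 1,
# "a" gets id 2 and "c" gets id 3
first_appearance_id <- match(x, unique(x))
print(first_appearance_id)  # 1 2 1 3 2

# Sorted ids for comparison: levels are ordered a, b, c
sorted_id <- as.integer(factor(x))
print(sorted_id)            # 2 1 2 3 1
```

The first-appearance variant avoids sorting the unique values, which is why unsorted grouping is typically the cheaper option.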
group(x, starts = FALSE, group.sizes = FALSE)
x: an atomic vector or data frame / list of equal-length atomic vectors.
starts: logical. If TRUE, an additional attribute 'starts' is attached giving a vector of group starts (index of first occurrence of unique rows).
group.sizes: logical. If TRUE, an additional attribute 'group.sizes' is attached giving the size of each group.
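As a rough illustration of what the 'starts' and 'group.sizes' attributes contain, the following base R sketch derives both from a vector of first-appearance ids. The id vector is hypothetical and the code only emulates the result; it is not the routine group() uses.

```r
# Hypothetical grouping vector with ids in first-appearance order
id <- c(1L, 2L, 1L, 3L, 2L)

# Group starts: position of the first occurrence of each id.
# Because ids are numbered in first-appearance order, these
# positions are already ordered by group id.
starts <- which(!duplicated(id))
print(starts)       # 1 2 4

# Group sizes: number of elements carrying each id
group_sizes <- tabulate(id, nbins = max(id))
print(group_sizes)  # 2 2 1
```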
An object of class 'qG'; see qG.
A data frame is grouped on a column-by-column basis, starting from the leftmost column. For each new column, the grouping vector obtained from the previous columns is fed into the hash function together with that column, so that unique rows are determined on a running basis. The algorithm terminates as soon as the number of unique rows reaches the number of rows in the data frame. Missing values are grouped just like any other value. Setting starts and/or group.sizes to TRUE requires an additional pass through the final grouping vector.
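The running, column-by-column scheme described above can be sketched in base R as follows. This is only an approximation under stated assumptions: group_sketch is a hypothetical helper, and match() on pasted keys stands in for the C hash function, so it mirrors the logic but not the performance of group().

```r
# Hypothetical helper: emulate column-by-column grouping in base R
group_sketch <- function(df) {
  n <- nrow(df)
  id <- rep.int(1L, n)            # before any column, all rows share one group
  for (col in df) {               # leftmost column first
    # ids of the current column alone; match() treats NA like any other value
    col_id <- match(col, unique(col))
    # combine the running ids with the current column's ids
    key <- paste(id, col_id, sep = ".")
    ukey <- unique(key)
    id <- match(key, ukey)        # renumber in first-appearance order
    if (length(ukey) == n) break  # every row is unique: stop early
  }
  id
}

# Hypothetical example data with missing values
df <- data.frame(a = c(1, 1, 2, 1, NA, NA),
                 b = c("x", "y", "x", "x", "z", "z"))
print(group_sketch(df))  # 1 2 3 1 4 4
```

Rows 1 and 4 share an id because they agree in both columns, and the two rows with a missing value in column a fall into one group.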
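# Replicate what funique does, using group()
library(collapse)  # provides group(), fnrow(), ss() and the wlddev data
g <- group(wlddev, starts = TRUE)
# If every row is unique, return the data as is; otherwise keep the first occurrence of each row
if (attr(g, "N.groups") == fnrow(wlddev)) wlddev else
  ss(wlddev, attr(g, "starts"))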