rank_stratified() computes a single, combined rank for each row of a
data frame using stratified hierarchical ranking.
The first variable is ranked globally; each subsequent variable is then
ranked within strata defined by all previous variables.
rank_stratified(
  data,
  cols = NULL,
  sort_by = "frequency",
  desc = FALSE,
  ties.method = "average",
  na.last = TRUE,
  freq_tiebreak = "match_desc",
  verbose = TRUE
)

A numeric vector of length nrow(data), containing stratified ranks.
Smaller values indicate "earlier" rows in the stratified hierarchy.
data: A data frame. Each selected column
represents one level of the stratified hierarchy, in the order given by
cols.
cols: Optional column specification indicating which variables in data
to use for ranking, and in what order. Can be:
NULL (default): use all columns of data in their existing order.
A character vector of column names.
An integer vector of column positions.
sort_by: Character scalar or vector specifying how to rank each
non-numeric column. Each element must be either "alphabetical" or
"frequency", matching the behaviour of smartrank(). If a single
value is supplied it is recycled for all columns. For numeric columns,
sort_by is ignored and ranking is always based on numeric order.
desc: Logical scalar or vector indicating whether to rank each column in descending order. If a single value is supplied, it is recycled for all columns.
ties.method: Passed to base::rank() when resolving ties at each
level; must be one of "average", "first", "last", "random",
"max", or "min". See base::rank() for details.
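To make the tie-handling options concrete, the following sketch calls base::rank() directly on a toy vector. The vector x is invented for illustration, and rank_stratified() itself is not called here; the sketch only shows what four of the ties.method values do to a pair of tied entries.

```r
# Toy vector with one tie (the two 20s); values are illustrative only
x <- c(10, 20, 20, 30)

avg   <- rank(x, ties.method = "average")  # tied values share the mean rank
lo    <- rank(x, ties.method = "min")      # tied values share the lowest rank
hi    <- rank(x, ties.method = "max")      # tied values share the highest rank
first <- rank(x, ties.method = "first")    # ties broken by position in x

print(rbind(average = avg, min = lo, max = hi, first = first))
```

Only "average" can produce fractional ranks; the other three shown here always return whole numbers.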
na.last: Logical, controlling the treatment of missing values,
as in base::rank(). If TRUE, NAs are given the largest ranks; if
FALSE, the smallest. Unlike base::rank() or smartrank(), na.last
cannot be set to NA in rank_stratified(), because dropping rows would
change group membership and break stratified ranking.
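The reason na.last = NA is disallowed can be seen with base::rank() alone. In this small sketch (the vector x is made up for illustration), TRUE and FALSE keep one rank per element, while NA drops the missing element and so returns a shorter vector that no longer lines up with the rows of the input.

```r
# Toy vector with one missing value; values are illustrative only
x <- c(3, NA, 1)

r_last  <- rank(x, na.last = TRUE)   # NA receives the largest rank
r_first <- rank(x, na.last = FALSE)  # NA receives the smallest rank
r_drop  <- rank(x, na.last = NA)     # NA is removed, so the result is shorter

print(r_last)
print(r_first)
print(length(r_drop))  # one element fewer than length(x)
```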
freq_tiebreak: Character scalar or vector controlling how
alphabetical tie-breaking works when sort_by = "frequency" and the
column is character/factor/logical. Each element must be one of:
"match_desc" (default): alphabetical tie-breaking follows
desc for that column (ascending when desc = FALSE, descending
when desc = TRUE).
"asc": ties are always broken by ascending alphabetical order.
"desc": ties are always broken by descending alphabetical order.
If a single value is supplied, it is recycled for all columns.
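As a rough illustration of frequency ranking with an alphabetical tie-break, the base R sketch below orders the distinct values of a toy vector. It is not the package's implementation: the vector x and the helper names are hypothetical, and the sketch assumes that ascending frequency order means the least frequent value comes first.

```r
# Toy character vector: "a" and "b" both occur twice, "c" once
x <- c("b", "a", "b", "a", "c")

tab    <- table(x)        # counts per distinct value
counts <- as.integer(tab)
labels <- names(tab)

# Ascending frequency, ties broken by ascending alphabetical order
asc_ties <- labels[order(counts, labels)]

# Ascending frequency, ties broken by descending alphabetical order
# (xtfrm() converts the labels to sortable numbers so they can be negated)
desc_ties <- labels[order(counts, -xtfrm(labels))]

print(asc_ties)
print(desc_ties)
```

The two results differ only in the relative order of the tied values "a" and "b", which is the part of the ordering that freq_tiebreak is described as controlling.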
verbose: Logical; if TRUE, emit messages when sort_by is ignored
(e.g. for numeric columns), mirroring the behaviour of smartrank().
rank_stratified() is useful when you want a "truly hierarchical" ordering where,
for example, rows are first grouped and ordered by the frequency of
gender, and then within each gender group, ordered by the frequency
of pet within that gender, rather than globally.
The result is a single rank vector that can be passed directly to
base::order() to obtain a stratified, multi-level
ordering.
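The difference between global and within-stratum frequency can be seen with base R alone. The sketch below counts each pet once across all rows and once within each gender, using a small gender/pet data frame; the variable names global_n and within_n are hypothetical, and this is an illustration of the idea rather than the package's internals.

```r
data <- data.frame(
  gender = c("male", "male", "male", "male", "female", "female", "male", "female"),
  pet    = c("cat", "cat", "magpie", "magpie", "giraffe", "cat", "giraffe", "cat")
)

# Frequency of each row's pet across the whole data frame
global_n <- as.integer(table(data$pet)[data$pet])

# Frequency of each row's pet within that row's gender
stratum  <- interaction(data$gender, data$pet, drop = TRUE)
within_n <- ave(seq_len(nrow(data)), stratum, FUN = length)

print(cbind(data, global_n, within_n))
```

Globally, "giraffe" and "magpie" are equally common, but among the "male" rows "magpie" occurs more often than "giraffe". A stratified ranking compares pets on the within-gender counts.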
Stratified ranking proceeds level by level:
The first selected column is ranked globally, using sort_by[1]
(for non-numeric) and desc[1].
For the second column, ranks are computed separately within each
distinct combination of values of all previous columns. Within each
stratum, the second column is ranked using sort_by[2] / desc[2].
This process continues for each subsequent column: at level k, ranking is done within strata defined by columns 1, 2, ..., k-1.
This yields a single composite rank per row that reflects a "true" hierarchical (i.e. stratified) ordering: earlier variables define strata, and later variables are only compared within those strata (for example, by within-stratum frequency).
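The level-by-level procedure can be approximated in base R for the two-column frequency case. The following is a minimal sketch under stated assumptions, not the package's algorithm: it ranks the most frequent values first at both levels, breaks frequency ties by ascending alphabetical order, and assigns distinct ranks to identical rows by their original position. All helper names (idx, n_gender, n_pet, ord) are hypothetical.

```r
data <- data.frame(
  gender = c("male", "male", "male", "male", "female", "female", "male", "female"),
  pet    = c("cat", "cat", "magpie", "magpie", "giraffe", "cat", "giraffe", "cat")
)
idx <- seq_len(nrow(data))

# Level 1: how often each gender occurs across all rows
n_gender <- ave(idx, data$gender, FUN = length)

# Level 2: how often each pet occurs within its gender stratum
n_pet <- ave(idx, interaction(data$gender, data$pet, drop = TRUE), FUN = length)

# Most frequent first at both levels; alphabetical order breaks frequency ties
ord <- order(-n_gender, data$gender, -n_pet, data$pet)

# Turn the ordering into one composite rank per row
r <- integer(nrow(data))
r[ord] <- idx

data[order(r), ]
```

Because the pet counts are computed inside each gender, the second-level keys never move a row out of its gender block; they only reorder rows within it.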
library(rank)

data <- data.frame(
  gender = c("male", "male", "male", "male", "female", "female", "male", "female"),
  pet    = c("cat", "cat", "magpie", "magpie", "giraffe", "cat", "giraffe", "cat")
)

# Stratified ranking: first by gender frequency, then within each gender
# by pet frequency *within that gender*
r <- rank_stratified(
  data,
  cols = c("gender", "pet"),
  sort_by = c("frequency", "frequency"),
  desc = TRUE
)

# Reorder the rows by their stratified rank
data[order(r), ]