summ_order: Summarize list of pdqr-functions with order

Description

Functions for ordering the set of pdqr-functions supplied in a list. This might be useful for doing comparative statistical inference for several groups of data.

Usage

summ_order(f_list, method = "compare", decreasing = FALSE)
summ_sort(f_list, method = "compare", decreasing = FALSE)
summ_rank(f_list, method = "compare")

Arguments

f_list

List of pdqr-functions.

method

Method to be used for ordering. Should be one of "compare", "mean", "median", "mode".

decreasing

If TRUE, ordering is done in decreasing order.

Value

summ_order() works essentially like order(). It returns an integer vector representing a permutation which rearranges f_list into the desired order.

summ_sort() returns a sorted (in desired order) variant of f_list.

summ_rank() returns a numeric vector representing ranks of f_list elements: 1 for the "smallest", length(f_list) for the "biggest".
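
A minimal sketch of how the three return values relate, assuming the pdqr package is installed and reusing the uniform distributions from the Examples section. The variable name ord is illustrative only.

```r
library(pdqr)

# Three uniform distributions: on [0, 1], [1, 2], and [-1, 0]
d_fun <- as_d(dunif)
f_list <- list(a = d_fun, b = d_fun + 1, c = d_fun - 1)

# Permutation that puts `f_list` into increasing order
ord <- summ_order(f_list)

# Indexing with that permutation lists elements from "smallest" to "biggest"
names(f_list)[ord]

# summ_sort() returns the list itself in that order
names(summ_sort(f_list))

# summ_rank() gives the position of every element in the sorted list
summ_rank(f_list)
```

Here the element shifted down by 1 should come first and the element shifted up by 1 should come last, whichever method is used.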

Details

Ties for all methods are handled so as to preserve the original order.

Method "compare" uses the following ordering relation: pdqr-function f is greater than g if and only if P(f >= g) > 0.5, or in code summ_prob_true(f >= g) > 0.5 (see pdqr methods for "Ops" group generic family for more details on comparing pdqr-functions). The input is ordered by applying order() with this relation. Notes:

This relation doesn't define a strict ordering because it is not transitive: there can be pdqr-functions f, g, and h for which f is greater than g, g is greater than h, and h is greater than f (whereas transitivity would require f to be greater than h). If not addressed, this might make the output depend on the order of the input. This is solved by first preordering f_list with method "mean" and then calling order().
Because comparing two pdqr-functions can be time-consuming, this method becomes rather slow as the number of f_list elements grows.
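
The relation behind method "compare" can be tried out directly. This is only a sketch, assuming the pdqr package is installed; is_greater() is a hypothetical helper written for illustration and is not part of pdqr.

```r
library(pdqr)

# Two uniform distributions: `f` on [0, 1] and `g` on [1, 2]
f <- as_d(dunif)
g <- f + 1

# Hypothetical helper (not a pdqr function): TRUE if `f` is "greater" than `g`
# in the sense of the relation described above
is_greater <- function(f, g) {
  summ_prob_true(f >= g) > 0.5
}

is_greater(g, f)
is_greater(f, g)
```

Since g lies entirely to the right of f, the first call should return TRUE and the second FALSE.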

Methods "mean", "median", and "mode" are based on summ_center(): ordering of f_list is defined as ordering of corresponding measures of distribution's center.
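
As a rough illustration of the center-based methods (a sketch, not necessarily how the package implements them), the same ordering can be obtained by computing each distribution's center with summ_center() and passing the resulting numbers to order(). The variable name means is illustrative only.

```r
library(pdqr)

d_fun <- as_d(dunif)
f_list <- list(a = d_fun, b = d_fun + 1, c = d_fun - 1)

# Mean of every distribution in the list
means <- vapply(f_list, summ_center, numeric(1), method = "mean")

# Ordering the means orders the distributions
order(means)

# Compare with the built-in method
summ_order(f_list, method = "mean")
```

Replacing "mean" with "median" or "mode" in the summ_center() call gives the analogous sketch for the other two methods.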

Examples


# NOT RUN {
d_fun <- as_d(dunif)
f_list <- list(a = d_fun, b = d_fun + 1, c = d_fun - 1)
summ_order(f_list)
summ_sort(f_list)
summ_rank(f_list)

# Methods might give different results on more elaborate pdqr-functions
# Methods "compare" and "mean" are not equivalent
non_mean_list <- list(
  new_d(data.frame(x = c(0.56, 0.815), y = c(1, 1)), "continuous"),
  new_d(data.frame(x = 0:1, y = c(0, 1)), "continuous")
)
summ_order(non_mean_list, method = "compare")
summ_order(non_mean_list, method = "mean")

# Methods powered by `summ_center()` are not equivalent
m <- c(0, 0.2, 0.1)
s <- c(1.1, 1.2, 1.3)
dlnorm_list <- lapply(seq_along(m), function(i) {
  as_d(dlnorm, meanlog = m[i], sdlog = s[i])
})
summ_order(dlnorm_list, method = "mean")
summ_order(dlnorm_list, method = "median")
summ_order(dlnorm_list, method = "mode")

# Method "compare" handles its inherent non-transitivity. Here the third
# element is "greater" than the second (`P(f >= g) > 0.5`), the second is
# "greater" than the first, and the first is "greater" than the third.
non_trans_list <- list(
  new_d(data.frame(x = c(0.39, 0.44, 0.46), y = c(17, 14, 0)), "continuous"),
  new_d(data.frame(x = c(0.05, 0.3, 0.70), y = c(4, 0, 4)), "continuous"),
  new_d(data.frame(x = c(0.03, 0.40, 0.80), y = c(1, 1, 1)), "continuous")
)
summ_sort(non_trans_list)
  # Output doesn't depend on initial order
summ_sort(non_trans_list[c(2, 3, 1)])

# }
