rdpmedian: Rank difference pseudomedian

Description

Computes the Hodges-Lehmann pseudomedian and bootstrap confidence interval for the differences of ranks.

Usage

rdpmedian(
  data,
  formula,
  conf_level = 0.95,
  conf_method = "percentile",
  n_resamples = 1000L,
  agg_fun = "error"
)

rdpmedian2(
  x,
  y,
  conf_level = 0.95,
  conf_method = "percentile",
  n_resamples = 1000L
)

Value

A list with the following elements:

Slot	Subslot	Name	Description
1		`pseudomedian`	Measure of centrality.
2		`lower`	Lower bound of confidence interval for the pseudomedian.
3		`upper`	Upper bound of confidence interval for the pseudomedian.
4		`method`	Estimate method.
5		`info`	Additional information.
5	1	`n_sample`	Number of observations in the original data.
5	2	`n_analytic`	Number of observations after removing missing values from the original data.
5	3	`data_type`	Data type.
5	4	`focal_name`	Name of the focal variable (differences are focal - reference).
5	5	`reference_name`	Name of the reference variable (differences are focal - reference).
6		`call`	A named list of the function's arguments (use `as.call()` to convert to a call).

Arguments

data

(data.frame)
The data frame of interest.

formula

(formula)
A formula of form:

y ~ group | block: Use when data is in tall format. y is the numeric outcome, group is the binary grouping variable, and block is the subject/item-level variable indicating pairs of observations. group will be converted to a factor, and its first level will be the reference level. For example, when levels(data$group) is c("pre", "post"), the focal level is "post", so differences are post - pre. Pairs with missing values are silently dropped. See agg_fun for handling duplicate cases of grouping/blocking combinations.

y ~ x: Use when data is in wide format. y and x must be numeric vectors. Differences of ranks correspond with data$y - data$x. Pairs with missing values are silently dropped.
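Because the first factor level is treated as the reference, it can help to set the level order explicitly before passing tall-format data. The base R sketch below uses a made-up vector named group (an illustration only, not package data) to show that factor() sorts levels alphabetically by default, which would make "post" the reference in this case.

```r
# Hypothetical grouping values, for illustration only.
group <- c("post", "pre", "post", "pre")

# Default: levels are sorted alphabetically, so "post" comes first
# and would be treated as the reference level.
f_default <- factor(group)
levels(f_default)
#> [1] "post" "pre"

# Explicit order: "pre" is the first (reference) level and "post" is focal,
# so differences would be post - pre.
f_explicit <- factor(group, levels = c("pre", "post"))
levels(f_explicit)
#> [1] "pre"  "post"
```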

conf_level

(Scalar numeric: 0.95; [0, 1))
The confidence level. If conf_level = 0, no confidence interval is calculated.

conf_method

(Scalar character: c("percentile", "bca"))
The type of bootstrap confidence interval.

n_resamples

(Scalar integer: 1000L; [10L, Inf))
The number of bootstrap resamples. If conf_level = 0, no resampling is performed.

agg_fun

(Scalar character or function: "error")
Used for aggregating duplicate cases of grouping/blocking combinations when data is in tall format and formula has structure y ~ group | block. "error" (default) will return an error if duplicate grouping/blocking combinations are encountered. Select one of "first", "last", "sum", "mean", "median", "min", or "max" for built in aggregation handling (each applies na.rm = TRUE). Or define your own function. For example, myfun <- function(x) {as.numeric(quantile(x, 0.75, na.rm = TRUE))}.
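To illustrate what aggregating duplicate grouping/blocking combinations means, here is a minimal base R sketch. It is not the package's internal implementation: the data frame dup and the helper q75 are hypothetical, and stats::aggregate() stands in for whatever the package does internally.

```r
# Hypothetical tall-format data: block 1 has two "pre" observations.
dup <- data.frame(
  block = c(1, 1, 1, 2, 2),
  group = c("pre", "pre", "post", "pre", "post"),
  y     = c(10, 14, 9, 7, 8)
)

# Detect duplicate group/block combinations (the case agg_fun addresses).
has_duplicates <- any(duplicated(dup[c("group", "block")]))

# A user-defined aggregation function: the 75th percentile.
q75 <- function(x) as.numeric(quantile(x, 0.75, na.rm = TRUE))

# Collapse to one value per group/block combination.
agg <- aggregate(y ~ group + block, data = dup, FUN = q75)
agg
```

After aggregation each group/block combination appears once, so every block contributes exactly one pair.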

x, y

(numeric)
Numeric vectors of data. Differences of ranks correspond with x - y. Pairs with missing values are silently dropped.

Details

This function generates a confidence interval for the pseudomedian based on the observed differences of ranks, not based on an inversion of the rank difference test rdt().
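As a rough illustration of "differences of ranks", the sketch below assumes they are formed as in Kornbrot's (1990) procedure: all observations from both conditions are ranked together, then the ranks are subtracted within pairs. The helper rank_diff() and the vectors x and y are hypothetical, and the package's own handling of ties and missing values may differ.

```r
# Hypothetical paired data, for illustration only (no ties, no NAs).
x <- c(5.1, 3.2, 8.4, 6.0)
y <- c(4.0, 3.9, 6.5, 2.2)

# Assumed procedure: rank the pooled values, then difference within pairs.
rank_diff <- function(x, y) {
  n <- length(x)
  r <- rank(c(x, y))
  r[seq_len(n)] - r[n + seq_len(n)]
}

d <- rank_diff(x, y)
d
#> [1]  1 -1  1  5

# A monotone decreasing transformation (here 60 / value) reverses the
# pooled ranks, so every rank difference changes sign.
d_rate <- rank_diff(60 / x, 60 / y)
d_rate
#> [1] -1  1 -1 -5
```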

The Hodges-Lehmann estimator is the median of all pairwise averages of the sample values:

$$\mathrm{HL} = \mathrm{median} \left\{ \frac{x_i + x_j}{2} \right\}_{i \le j}$$

This pseudomedian is a robust, distribution-free estimate of central tendency for a single sample, or a location-shift estimator for paired data. It is resistant to outliers and compatible with rank-based inference.
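The definition can be checked with a few lines of base R. This is a minimal sketch, not the package's code: the helper hl_pseudomedian() is hypothetical, and it builds the full matrix of pairwise sums, so it is only practical for small samples.

```r
# Hodges-Lehmann pseudomedian: median of all pairwise averages
# (x_i + x_j) / 2 with i <= j, including each value paired with itself.
hl_pseudomedian <- function(x) {
  x <- x[!is.na(x)]
  sums <- outer(x, x, "+")
  pairwise_avg <- sums[upper.tri(sums, diag = TRUE)] / 2
  median(pairwise_avg)
}

# Small made-up sample with one large value.
v <- c(1, 2, 3, 10)
hl_pseudomedian(v)
#> [1] 2.75
median(v)
#> [1] 2.5
mean(v)
#> [1] 4
```

For this sample the pseudomedian stays close to the ordinary median, while the mean is pulled toward the large value.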

The percentile and BCa bootstrap confidence interval methods are described in chapter 5.3 of Davison and Hinkley (1997).
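A percentile bootstrap interval can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the vector d is made-up data standing in for observed rank differences, the helper hl() is hypothetical, and the sketch resamples the differences directly, which may not match how the package resamples.

```r
set.seed(1)

# Made-up values standing in for observed differences of ranks.
d <- c(-3, -1, 2, 4, 5, 7, 8)

# Hodges-Lehmann pseudomedian: median of pairwise averages with i <= j.
hl <- function(x) {
  sums <- outer(x, x, "+")
  median(sums[upper.tri(sums, diag = TRUE)] / 2)
}

n_resamples <- 1000L
conf_level <- 0.95

# Resample with replacement and recompute the estimate each time.
boot_est <- replicate(n_resamples, hl(sample(d, replace = TRUE)))

# Percentile interval: empirical quantiles of the bootstrap estimates.
alpha <- 1 - conf_level
ci <- quantile(boot_est, c(alpha / 2, 1 - alpha / 2), names = FALSE)
ci
```

The BCa method additionally adjusts these quantile levels for bias and skewness, which is not shown here.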

This function is mainly a wrapper for the function Hmisc::pMedian().

References

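Davison, A. C. and Hinkley, D. V. (1997). Bootstrap Methods and Their Application. Cambridge University Press.

Harrell, F. E. Jr. Hmisc: Harrell Miscellaneous. R package.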

Examples


#----------------------------------------------------------------------------
# rdpmedian() example
#----------------------------------------------------------------------------
library(rankdifferencetest)

# Use example data from Kornbrot (1990)
data <- kornbrot_table1

# Create tall-format data for demonstration purposes
data_tall <- reshape(
  data = kornbrot_table1,
  direction = "long",
  varying = c("placebo", "drug"),
  v.names = c("time"),
  idvar = "subject",
  times = c("placebo", "drug"),
  timevar = "treatment",
  new.row.names = seq_len(prod(length(c("placebo", "drug")), nrow(kornbrot_table1)))
)

# Subject and treatment should be factors. The ordering of the treatment factor
# will determine the difference (placebo - drug).
data_tall$subject <- factor(data_tall$subject)
data_tall$treatment <- factor(data_tall$treatment, levels = c("drug", "placebo"))

# Rate transformation inverts the rank ordering.
data$placebo_rate <- 60 / data$placebo
data$drug_rate <- 60 / data$drug
data_tall$rate <- 60 / data_tall$time

# Estimates
rdpmedian(
  data = data,
  formula = placebo ~ drug
)

rdpmedian(
  data = data_tall,
  formula = time ~ treatment | subject
)

rdpmedian2(
  x = data$placebo_rate,
  y = data$drug_rate
)

rdpmedian(
  data = data_tall,
  formula = rate ~ treatment | subject
)
