Learn R Programming

eummd (version 0.2.0)

mediandiff: Compute the median difference between pairs of values

Description

Compute the median of all differences between distinct pairs in vectors or matrices.

Usage

mediandiff(X, Y = NULL, kernel = c("Laplacian", "Gaussian"), fast = FALSE)

Value

A scalar, the median of all pairwise differences.

Arguments

X

Numeric vector or matrix of length n.

Y

Numeric vector or matrix of length m, or NULL.

kernel

String, either "Laplacian" or "Gaussian".

fast

Boolean; if TRUE will run \(O(N \log N)\) algorithm, where N = n + m, but if FALSE (default) will run naive \(O(N^2 \log N)\) algorithm.

Details

The median difference is defined as follows:

\(Z\) is the combined \(X\) and \(Y\) values into a single vector or matrix. Number of columns is the dimension, and these need to be equal for \(X\) and \(Y\). Then, if the Laplacian kernel is used,

$$ m = \textnormal{median} \{ || x_i - x_j ||_1; \,\, i>j, \,\, i=1, 2,\dots, n+m,\,\,\textnormal{ and } j=1, 2,\dots, i-1 \}, $$

where \( || z_i - z_j ||_1\) is the 1-norm, and so if the data are d-dimensional then

$$ || z_i - z_j ||_1 = \sum_{k=1}^{d} |z_{i,k} - z_{j,k}|. $$

If the Gaussian kernel is specified, then the square of the two-norm is used.

The median heuristic is defined as beta = 1/m.

Naive method will compute all distinct pairs, of which there are \(N(N+1)/2\) differences. These are then sorted using a \(O(N \log N)\) algorithm, so overall \(O(N^2 \log N)\).

The fast method is \(O(N \log N)\) is from Croux and Rousseeuw (1992), which is based on Johnson and Mizoguchi (1978).

References

Croux, C. and Rousseeuw, P. J. (1992), "Time-Efficient Algorithms for Two Highly Robust Estimators of Scale" In Computational Statistics: Volume 1: Proceedings of the 10th Symposium on Computational Statistics (pp. 411-428).

Johnson, D.B., and Mizoguchi, T. (1978), "Selecting the Kth Element in X + Y and X_1 + X_2 + ... + X_m", SIAM Journal of Computing, 7, 147-153.

See Also

medianheuristic

Examples

Run this code

X <- c(7.1, 1.2, 4.3, 0.4)
Y <- c(5.5, 2.6, 8.7)
#using fast method, Laplacian kernel, loglinear in number of observations
md <- mediandiff(X, Y, fast=TRUE)

#using fast method, Gaussian kernel, loglinear in number of observations
md <- mediandiff(X, Y, fast=TRUE, kernel="Gaussian")

#using naive method (default), with Laplacian kernel
md <- mediandiff(X, Y)

Run the code above in your browser using DataLab