rRegMatch: Regular matching with minimum-cost spanning subgraphs

Description

This function matches each observation in X to r others so as to minimize the total distance across all matches. If y is supplied, it also computes the cross-count statistic -- the number of matches that pair two observations from different classes.

Usage

rRegMatch(X, r, y = NULL, dister = "daisy", dist.args = list(), keep.X = nrow(X) < 100,  keep.D = (dister == "treeClust.dist"), relax = (N >= 100), thresh = 1e-6)

Arguments

X

Matrix or data frame of data, or inter-point distances represented in an object inheriting from "dist".

r

Integer number of matches. The matching is "regular" in that every observation is matched to exactly r others (or, if relax=TRUE, every observation is matched to others with weights in [0, 1] that add up to r).

y

Vector of class membership indices. This is used to compute the cross-count statistic. Optional.

dister

Function to compute inter-point distances. This must take a matrix of data as its first argument, and that argument must be named x. Default: daisy. If all the columns are numeric, daisy produces unweighted Euclidean distance by default.
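As an illustration of the kind of function dister expects, here is a minimal sketch using only base R. The helper name manhattan_dist is hypothetical and not part of the package; the only requirement taken from the text above is that the first argument is a data matrix named x.

```r
# Hypothetical helper (not part of the package): Manhattan distance.
# Its first argument is a data matrix named x, as dister requires.
manhattan_dist <- function(x, ...) {
  stats::dist(x, method = "manhattan", ...)
}

set.seed(1)
X <- matrix(rnorm(20), 10, 2)   # 10 observations, 2 numeric columns
D <- manhattan_dist(x = X)

class(D)          # an object inheriting from "dist"
attr(D, "Size")   # number of observations the distances cover
```

Because X may itself be an object inheriting from "dist", a precomputed D like this one could also be supplied in place of the raw data.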

dist.args

List of additional arguments to pass to the dister function.

keep.X

If TRUE, and X was supplied, keep the X matrix in the output object. Default: TRUE if X was supplied and also nrow (X) < 100.

keep.D

If TRUE, keep the distance object in the output. Default: TRUE if the treeClust.dist function is being used to compute the distances (since in that case the distances are random).

relax

If FALSE, solve the exact problem where each observation gets exactly r non-zero pairings, each with weight 1. If TRUE, solve the relaxed problem, where each observation has at least r non-zero pairings, each with its own weight between 0 and 1, the weights adding up to r. The exact problem gets very slow with large samples.
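The difference between the two problems shows up in a tiny hand-built case. The sketch below does not call rRegMatch; the weight matrix W is constructed by hand, under the assumption of three observations and r = 1, to show a set of weights that satisfies the relaxed constraints where no exact matching exists.

```r
# Three observations, r = 1. An exact solution is impossible here:
# pairing any two points leaves the third with no match.
r <- 1

# Hand-built relaxed solution: every pair gets weight 0.5.
W <- matrix(0.5, 3, 3)
diag(W) <- 0   # an observation is never matched to itself

rowSums(W)       # each observation's weights add up to r
rowSums(W > 0)   # each observation has at least r non-zero pairings
range(W)         # every weight lies in [0, 1]
```

In this case the relaxed constraints force all three weights to 0.5, so each observation ends up with two non-zero pairings rather than one.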

thresh

Weights smaller than this are considered to be exactly zero. Default: 1e-6.
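The effect of thresh can be sketched on a made-up vector of weights. The values in w are hypothetical, chosen to mimic the small numerical noise a relaxed solution might contain; they are not output from rRegMatch.

```r
thresh <- 1e-6

# Hypothetical pair weights; the second entry is numerical noise.
w <- c(0.5, 3e-9, 0.5, 1)

# Weights smaller than thresh are treated as exactly zero.
w[w < thresh] <- 0

sum(w > 0)   # number of pairings that remain non-zero
```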

Value

A list of class AcrossTic, with elements:

Details

This function solves an optimization problem to extract the set of pairings which make the total weight (distance) associated with all pairings a minimum, subject to the constraint that every observation is paired to r others (or to enough others to have a total pair-weight of r).
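To make the objective concrete, the following sketch solves the smallest interesting case (four observations, r = 1) by brute force in base R and then computes the cross-count for the winning matching. It is only an illustration of the problem being solved, not the method the package uses; the names matchings, total_dist and cross_count are introduced here for the example.

```r
# Four points on a line; rows 1-2 are class 1, rows 3-4 are class 2.
X <- matrix(c(0, 1, 5, 6), ncol = 1)
y <- c(1, 1, 2, 2)
D <- as.matrix(dist(X))

# With r = 1 and four points there are exactly three ways to pair everyone.
matchings <- list(
  rbind(c(1, 2), c(3, 4)),
  rbind(c(1, 3), c(2, 4)),
  rbind(c(1, 4), c(2, 3))
)

# Total distance of a matching: sum of D over its (row, column) pairs.
total_dist <- function(m) sum(D[m])
costs <- vapply(matchings, total_dist, numeric(1))

# Keep the matching with the smallest total distance.
best <- matchings[[which.min(costs)]]

# Cross-count: number of matched pairs whose endpoints differ in class.
cross_count <- sum(y[best[, 1]] != y[best[, 2]])
```

Here the two classes are well separated, so the minimum-distance matching pairs each point with its own classmate and the cross-count is zero. Brute force is only feasible for toy inputs, since the number of candidate matchings grows very quickly with the sample size.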

References

David Ruth, "A new multivariate two-sample test using regular minimum-weight spanning subgraphs," J. Stat. Distributions and Applications (2014)

Examples

set.seed (123)
X <- matrix (rnorm (100), 50, 2) # Create data...
y <- rep (c (1, 2), each=25) # ...and class membership
rRegMatch (X, r = 3, y = y)
## Not run: plot (rRegMatch (X, r = 3, y = y)) # to see picture