fullmatch: Optimal full matching

Description

Given two groups, such as a treatment and a control group, and a treatment-by-control discrepancy matrix indicating desirability and permissibility of potential matches, create optimal full matches of members of the groups. Optionally, incorporate restrictions on matched sets' ratios of treatment to control units.

Usage

fullmatch(distance, min.controls = 0, max.controls = Inf,
          omit.fraction = NULL, tol = 0.001, subclass.indices = NULL)

Arguments

distance

A matrix of nonnegative discrepancies, each indicating the permissibility and desirability of matching the unit corresponding to its row (a 'treatment') to the unit corresponding to its column (a 'control'); or a list of such matrices. Finite discrepan

min.controls

The minimum ratio of controls to treatments that is to be permitted within a matched set: should be nonnegative and finite. If min.controls is not a whole number, the reciprocal of a whole number, or zero, then it is rounded down

max.controls

The maximum ratio of controls to treatments that is to be permitted within a matched set: should be positive and numeric. If max.controls is not a whole number, the reciprocal of a whole number, or Inf, then it is rounded <

omit.fraction

Optionally, specify what fraction of controls or treated subjects are to be rejected. If omit.fraction is a positive fraction less than one, then fullmatch leaves up to that fraction of the control reservoir unmatched.

tol

Because of internal rounding, fullmatch may solve a slightly different matching problem than the one specified, in which case the match generated by fullmatch may not coincide with an optimal solution of the specified problem.

subclass.indices

An old argument included for back-compatibility; no longer needed.

Value

Primarily, a named vector of class c('optmatch', 'factor'). Elements of this vector correspond to members of the treatment and control groups in reference to which the matching problem was posed, and are named accordingly; the names are taken from the row and column names of distance. Each element of the vector is the concatenation of: (i) a character abbreviation of subclass.indices, if that argument was given, or the string 'm' if it was not; (ii) the string '.'; and (iii) a nonnegative integer or the string NA. In this last place, positive whole numbers indicate placement of the unit into a matched set, zero indicates a unit that was not matched, and NA indicates that all or part of the matching problem given to fullmatch was found to be infeasible.

In some cases, only proper subsets of the initial treatment and/or control groups will be represented in the value of fullmatch. Whether this occurs is determined by the status of argument subclass.indices. If subclass.indices is null, then all elements of the treatment and control groups, i.e. the rows and columns of distance, are represented in the value of fullmatch. Otherwise, the vector has an element for each unit represented in subclass.indices, i.e. for each element of factor subclass.indices or for each row of data frame subclass.indices.

Secondarily, fullmatch returns various data about the matching process and its result, stored as attributes of the named vector which is its primary output. In particular, the exceedances attribute gives upper bounds, not necessarily sharp, for the amount by which the sum of distances between matched units in the result of fullmatch exceeds the least possible sum of distances between matched units in a feasible solution to the matching problem given to fullmatch. Such a bound is also printed by print.optmatch.
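The label format described above can be unpacked with base R alone. The following is a minimal sketch that does not call fullmatch: fm is a hand-built stand-in for its return value, with made-up unit names and set assignments, and it assumes the default prefix 'm' (that is, no subclass.indices).

```r
# Hypothetical stand-in for a fullmatch() result: a named factor whose
# values follow the "<prefix>.<set>" pattern described above.
fm <- factor(c(A = "m.1", B = "m.2", C = "m.1", D = "m.0", E = "m.2"))

# Drop everything up to and including the first "." to isolate the set id.
set_id <- sub("^[^.]*\\.", "", as.character(fm))
names(set_id) <- names(fm)

# Zero marks a unit that was not matched.
unmatched <- names(set_id)[set_id == "0"]

# Infeasibility is coded as the string "NA", not as a missing value.
infeasible <- names(set_id)[set_id == "NA"]

# Group the remaining unit names by matched set.
real_sets <- setdiff(unique(set_id), c("0", "NA"))
matched <- split(names(set_id), set_id)[real_sets]

print(unmatched)
print(matched)
```

Comparing against the string "NA" rather than using is.na() follows the description above, where the third component of the label is "the string NA".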

Details

Consider using makedist to generate the distances, particularly on large problems. If distance is a list of matrices, each matrix is treated as a separate matching problem unto itself. In this case, one has to give names to the matrices in order to specify arguments min.controls or max.controls.
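When makedist is not used, a named list of distance matrices can be assembled by hand. The sketch below uses only base R and stops short of calling fullmatch. The covariate values, the absolute-difference discrepancy, the subproblem names low and high, and the idea of passing restrictions as a vector named like the list are all illustrative assumptions; the text above says only that names are required.

```r
# Hypothetical covariate values; the names become row and column names.
treated_x <- c(t1 = 1.0, t2 = 2.5, t3 = 7.0, t4 = 8.0)
control_x <- c(c1 = 0.5, c2 = 2.0, c3 = 3.0, c4 = 6.5, c5 = 9.0)

# Treatment-by-control matrix of absolute differences (nonnegative).
abs_dist <- function(tr, ct) abs(outer(tr, ct, "-"))

# One matrix per separate matching problem, each given a name.
dist_list <- list(
  low  = abs_dist(treated_x[1:2], control_x[1:3]),
  high = abs_dist(treated_x[3:4], control_x[4:5])
)

# Assumed form for per-problem restrictions: a vector named like the list.
min_ctrl <- c(low = 1, high = 1)

# With optmatch loaded, the call might then look like this (not run here):
# fm <- fullmatch(dist_list, min.controls = min_ctrl)

print(dist_list$low)
```

Rows are treatments and columns are controls, matching the orientation required of distance.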

fullmatch tries to guess the order in which units would have been given in a data frame, and to order the factor that it returns accordingly. If the dimnames of distance, or the matrices it lists, are not simply row numbers of the data frame you're working with, then you should compare the names of fullmatch's output to your row names in order to be sure things are in the proper order. You can relieve yourself of these worries by using makedist to produce the distances, as it passes the ordering of units to fullmatch, which then uses it to order its outputs.

The value of tol can have a substantial effect on computation time; with smaller values, computation takes longer. Not every tolerance can be met, and how small a tolerance is too small varies with the machine and with the details of the problem. If fullmatch can't guarantee that the tolerance is as small as the given value of argument tol, then matching proceeds but a warning is issued.
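The comparison of output names with data frame row names recommended in these details can be sketched in base R. Here dat and fm are hypothetical stand-ins (a toy data frame and a hand-built factor in place of a real fullmatch result), so the snippet shows only the check and the reordering, not the matching itself.

```r
# Hypothetical data frame whose row names identify the units.
dat <- data.frame(x = c(1.0, 0.5, 2.5, 2.0),
                  row.names = c("t1", "c1", "t2", "c2"))

# Stand-in result, deliberately ordered differently from the rows of dat.
fm <- factor(c(t1 = "m.1", t2 = "m.2", c1 = "m.1", c2 = "m.2"))

# Are the units in the same order as the data frame rows?
same_order <- identical(names(fm), rownames(dat))

if (!same_order) {
  # Reorder by name, but only if both refer to exactly the same units.
  stopifnot(setequal(names(fm), rownames(dat)))
  fm <- fm[rownames(dat)]
}

# Now it is safe to attach the matched-set labels to the data.
dat$matched_set <- fm
print(dat)
```

Indexing by name rather than by position means the result does not depend on whichever order the matching routine happened to return.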

References

Hansen, B.B. (2004), Full Matching in an Observational Study of Coaching for the SAT, Journal of the American Statistical Association, 99, 609-618. (Cf. especially the Appendix.)

Rosenbaum, P. (1991), A Characterization of Optimal Designs for Observational Studies, Journal of the Royal Statistical Society, Series B, 53, 597-610.

Examples

library(optmatch)
data(plantdist)

# A full match with unrestricted treatment-control balance
plantsfm <- fullmatch(plantdist)

# Flag treated units: those whose names are row names of plantdist
pr <- logical(26)
pr[match(dimnames(plantdist)[[1]], names(plantsfm))] <- TRUE

# Treatment-control balance, by matched set
table(plantsfm, ifelse(pr, 'treated', 'control'))

# Largest treatment-control distances, by matched set
tapply(names(plantsfm), plantsfm, FUN = function(x, dmat) {
  max(dmat[match(x, dimnames(dmat)[[1]]),
           match(x, dimnames(dmat)[[2]])],
      na.rm = TRUE)
}, dmat = plantdist)

# A full match with restrictions on matched sets'
# treatment-control balance
plantsfm1 <- fullmatch(plantdist, min.controls = 2, max.controls = 3)

# Treatment-control balance is improved by restrictions
table(plantsfm1, ifelse(pr, 'treated', 'control'))

# but distances between matched units increase slightly
tapply(names(plantsfm1), plantsfm1, FUN = function(x, dmat) {
  max(dmat[match(x, dimnames(dmat)[[1]]),
           match(x, dimnames(dmat)[[2]])],
      na.rm = TRUE)
}, dmat = plantdist)
