
Matching (version 0.96)

GenMatch: Genetic Matching

Description

This function finds optimal balance using multivariate matching, where a genetic search algorithm determines the weight each covariate should be given by Match so as to achieve balance. Balance is determined by a variety of univariate tests, mainly paired t-tests for dichotomous variables and an adjusted univariate Kolmogorov-Smirnov (KS) test for multinomial and continuous variables. The object returned by this function can be supplied to the Weight.matrix option of the Match function to obtain estimates.

Usage

GenMatch(Tr, X, BalanceMatrix=X, estimand="ATT", M=1,
         weights=rep(1,length(Tr)),
         pop.size = 50, max.generations=100,
         wait.generations=4, hard.generation.limit=FALSE,
         starting.values=rep(1,ncol(X)),
         data.type.integer=TRUE,
         MemoryMatrix=TRUE,
         exact=NULL, caliper=NULL, 
         nboots=0, ks=TRUE, verbose=FALSE,
         tolerance = 1e-05,
         distance.tolerance=tolerance,
         min.weight=0, max.weight=1000,
         Domains=NULL, print.level=2,
         project.path=NULL,
         paired=TRUE, ...)

Arguments

Tr
A vector indicating the observations which are in the treatment regime and those which are not. This can either be a logical vector or a real vector where 0 denotes control and 1 denotes treatment.
X
A matrix containing the variables we wish to match on. This matrix may contain the actual observed covariates or the propensity score or a combination of both.
BalanceMatrix
A matrix containing the variables we wish to achieve balance on. This is by default equal to X, but it can in principle be a matrix which contains more or fewer variables than X or variables which are transformed in vari
estimand
A character string for the estimand. The default estimand is "ATT", the sample average treatment effect for the treated. "ATE" is the sample average treatment effect (for all), and "ATC" is the sample average treatment effect for the controls.
M
A scalar for the number of matches which should be found (with replacement). The default is one-to-one matching.
weights
A vector the same length as Tr which provides observation-specific weights.
pop.size
Population Size. This is the number of individuals genoud uses to solve the optimization problem. See genoud for more details.
max.generations
Maximum Generations. This is the maximum number of generations that genoud will run when attempting to optimize a function. This is a soft limit. The maximum generation limit w
wait.generations
If there is no improvement in the objective function in this number of generations, genoud will think that it has found the optimum. The other variables controlling termination are
hard.generation.limit
This logical variable determines if the max.generations variable is a binding constraint for genoud. If hard.generation.limit is FALSE, then
starting.values
A vector whose length equals the number of variables in X. It contains the starting weight given to each variable. The starting.values vector is a way for the user to insert one individual into the
data.type.integer
By default only integer weights are considered. If this option is set to FALSE, the search is done over floating point weights. This is usually an unnecessary degree of precision.
MemoryMatrix
This variable controls if genoud sets up a memory matrix. Such a matrix ensures that genoud will request the fitness evaluation of a giv
exact
A logical scalar or vector for whether exact matching should be done. If a logical scalar is provided, that logical value is applied to all covariates of X. If a logical vector is provided, a logical value should be provided
caliper
A scalar or vector denoting the caliper(s) which should be used when matching. A caliper is the distance which is acceptable for any match. Observations which are outside of the caliper are dropped. If a scalar caliper is provided, this cali
nboots
The number of bootstrap samples to be run for the KS test.
ks
A logical flag for if the univariate bootstrap Kolmogorov-Smirnov (KS) test should be calculated. If the ks option is set to true, the univariate KS test is calculated for all non-dichotomous variables. The bootstrap KS test is consistent ev
verbose
A logical flag for whether details should be printed for each fitness evaluation done by the genetic algorithm.
tolerance
This is a scalar which is used to determine numerical tolerances. This option is used by numerical routines such as those used to determine if a matrix is singular.
distance.tolerance
This is a scalar which is used to determine if distances between two observations are different from zero. Values less than distance.tolerance are deemed to be equal to zero. This option can be used to perform a type of optimal
min.weight
This is the minimum weight any variable may be given.
max.weight
This is the maximum weight any variable may be given.
Domains
This is an ncol(X) $\times$ 2 matrix. The first column is the lower bound, and the second column is the upper bound for each variable over which genoud will search for weights.
print.level
This option controls the level of printing. There are four possible levels: 0 (minimal printing), 1 (normal), 2 (detailed), and 3 (debug). If level 2 is selected, GenMatch will print details about the population at each generati
project.path
This is the path of the genoud project file. By default no file is produced unless print.level=3. In that case, genoud
paired
A logical flag for whether the paired t-test should be used when determining balance.
...
Other options which are passed on to genoud.
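
As a minimal sketch of the shape the Domains argument expects, the following base R code builds an ncol(X) by 2 matrix of lower and upper bounds. The covariate matrix is hypothetical, and the bounds 0 and 1000 simply mirror the min.weight and max.weight defaults shown under Usage; whether GenMatch constructs its own default bounds exactly this way is an assumption.

```r
# Hypothetical covariate matrix with three columns (illustration only).
X <- cbind(age     = c(25, 32, 41, 29),
           educ    = c(12, 16, 10, 14),
           married = c(0, 1, 1, 0))

# One row per covariate: column 1 is the lower bound, column 2 the upper bound.
# The values 0 and 1000 mirror the min.weight and max.weight defaults.
Domains <- cbind(rep(0, ncol(X)), rep(1000, ncol(X)))

# Narrow the search range for the second covariate only (arbitrary values).
Domains[2, ] <- c(1, 10)

dim(Domains)
```

A matrix built this way could then be passed as Domains=Domains in a GenMatch call.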

Value

  • value: The lowest p-value of the matched dataset.
  • par: A vector of the weights given to each variable in X.
  • Weight.matrix: A matrix whose diagonal corresponds to the weight given to each variable in X. This object corresponds to the Weight.matrix in the Match function.
  • matches: A matrix with three columns. The first column contains the row numbers of the treated observations in the matched dataset. This column corresponds to the index.treated object which is returned by Match. The second column gives the row numbers of the control observations. This column corresponds to the index.control object which is returned by Match. The last column gives the weight that each matched pair is given. This column corresponds to the weights object which is returned by Match.
  • ecaliper: The size of the enforced caliper on the scale of the X variables. This object has the same length as the number of covariates in X.
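
To make the layout of these components concrete, here is a small base R sketch that uses made-up values rather than real GenMatch output. It assumes the off-diagonal entries of Weight.matrix are zero, which the description above does not state explicitly, and the numbers in par and matches are purely hypothetical.

```r
# Hypothetical 'par' vector for three covariates (not real GenMatch output).
par <- c(age = 12, educ = 3, married = 450)

# Diagonal matrix of weights; the zero off-diagonals are an assumption here.
Weight.matrix <- diag(par)

# Hypothetical 'matches' matrix: treated row, control row, pair weight.
matches <- cbind(c(1, 1, 2), c(5, 7, 6), c(0.5, 0.5, 1))
index.treated <- matches[, 1]
index.control <- matches[, 2]
pair.weights  <- matches[, 3]

# Total weight attached to each treated observation.
tapply(pair.weights, index.treated, sum)
```

In this toy example, treated observation 1 is matched to two controls that share its weight equally, while treated observation 2 has a single match.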

Details

This function maximizes the smallest p-value that is observed in any of the univariate tests of balance. During optimization, the smallest observed p-value is printed.
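
The "smallest p-value" criterion can be illustrated with a short base R sketch on simulated data. This is a simplification, not GenMatch's internal code: it compares unmatched groups, uses an unpaired t-test and the plain ks.test from the stats package rather than the paired and bootstrap variants described above, and the helper balance_p is a hypothetical name introduced only for this example.

```r
set.seed(1)  # simulated data, for illustration only

n  <- 200
Tr <- rep(c(1, 0), each = n / 2)
covs <- data.frame(
  married = rbinom(n, 1, 0.4),           # dichotomous covariate
  age     = rnorm(n, mean = 30, sd = 8)  # continuous covariate
)

# p-value for one covariate: t-test if dichotomous, KS test otherwise.
# (Hypothetical helper; a simplified stand-in for the tests named above.)
balance_p <- function(x, Tr) {
  if (length(unique(x)) == 2) {
    t.test(x[Tr == 1], x[Tr == 0])$p.value
  } else {
    ks.test(x[Tr == 1], x[Tr == 0])$p.value
  }
}

p_values <- sapply(covs, balance_p, Tr = Tr)

# The quantity a search would try to maximize: the smallest p-value.
fitness <- min(p_values)
```

Maximizing the minimum, rather than an average, means the search focuses on whichever covariate is currently least balanced.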

References

Diamond, Alexis and Jasjeet S. Sekhon. 2005. "Genetic Matching for Estimating Causal Effects: A New Method of Achieving Balance in Observational Studies." Working Paper. http://jsekhon.fas.harvard.edu/papers/GenMatch.pdf

See Also

Also see Match, summary.Match, MatchBalance, genoud, balanceMV, balanceUV, ks.boot, GerberGreenImai, lalonde

Examples

library(Matching)
set.seed(38913)

data(lalonde)
attach(lalonde)

# The covariates we want to match on
X <- cbind(age, educ, black, hisp, married, nodegr, u74, u75, re75, re74)

# The covariates we want to obtain balance on
BalanceMat <- cbind(age, educ, black, hisp, married, nodegr, u74, u75, re75, re74,
                    I(re74*re75))

# Let's call GenMatch() to find the optimal weight to give each
# covariate in 'X' so that we achieve balance on the covariates in
# 'BalanceMat'. This is only an example, so we want GenMatch to be quick:
# the population size has been set to only 16 via the 'pop.size'
# option.
genout <- GenMatch(Tr=treat, X=X, BalanceMatrix=BalanceMat, estimand="ATE", M=1,
                   pop.size=16, max.generations=10, wait.generations=1)

# The outcome variable
Y <- re78/1000

# Now that GenMatch() has found the optimal weights, let's estimate
# our causal effect of interest using those weights
mout <- Match(Y=Y, Tr=treat, X=X, estimand="ATE", Weight.matrix=genout)
summary(mout)

# Let's determine if balance has actually been obtained on the variables of interest
mb <- MatchBalance(treat~age +educ+black+ hisp+ married+ nodegr+ u74+ u75+
                   re75+ re74+ I(re74*re75),
                   match.out=mout, nboots=500, ks=TRUE, mv=FALSE)
