Matchby: Grouped Multivariate and Propensity Score Matching

Description

This function is a wrapper for the Match function which separates the matching problem into subgroups defined by a factor. This is equivalent to conducting exact matching on each level of a factor. Matches within each level are found as determined by the usual matching options. This function is much faster for large datasets than the Match function itself. For additional speed, consider doing matching without replacement---see the replace option. This function is more limited than the Match function. For example, Matchby cannot be used if the user wishes to provide observation specific weights.

Usage

Matchby(Y, Tr, X, by, estimand = "ATT", M = 1, ties=FALSE, replace=TRUE,
        exact = NULL, caliper = NULL, AI=FALSE, Var.calc=0,
        Weight = 1, Weight.matrix = NULL, distance.tolerance = 1e-05,
        tolerance = sqrt(.Machine$double.eps), print.level=1, version="Matchby", ...)

Arguments

A vector containing the outcome of interest. Missing values are not allowed.

A vector indicating the observations which are in the treatment regime and those which are not. This can either be a logical vector or a real vector where 0 denotes control and 1 denotes treatment.

A matrix containing the variables we wish to match on. This matrix may contain the actual observed covariates or the propensity score or a combination of both.

A "factor" in the sense that as.factor(by) defines the grouping, or a list of such factors in which case their interaction is used for the grouping.

estimand

A character string for the estimand. The default estimand is "ATT", the sample average treatment effect for the treated. "ATE" is the sample average treatment effect (for all), and "ATC" is the sample average treatment effect for the controls

A scalar for the number of matches which should be found. The default is one-to-one matching. Also see the ties option.

ties

A logical flag for whether ties should be handled deterministically. By default ties==TRUE. If, for example, one treated observation matches more than one control observation, the matched dataset will include the multiple matched

replace

Whether matching should be done with replacement. Note that if FALSE, the order of matches generally matters. Matches will be found in the same order as the data is sorted. Thus, the match(es) for the first observation will be

exact

A logical scalar or vector for whether exact matching should be done. If a logical scalar is provided, that logical value is applied to all covariates of X. If a logical vector is provided, a logical value should be provided

caliper

A scalar or vector denoting the caliper(s) which should be used when matching. A caliper is the distance which is acceptable for any match. Observations which are outside of the caliper are dropped. If a scalar caliper is provided, this cali

A logical flag for if the Abadie-Imbens standard error should be calculated. It is computationally expensive to calculate with large datasets. Matchby can only calculate AI SEs for ATT. To calculate AI errors with other estimands,

Var.calc

A scalar for the variance estimate that should be used. By default Var.calc=0 which means that homoscedasticity is assumed. For values of Var.calc > 0, robust variances are calculated using Var.calc ma

Weight

A scalar for the type of weighting scheme the matching algorithm should use when weighting each of the covariates in X. The default value of 1 denotes that weights are equal to the inverse of the variances. 2 denotes the Maha

Weight.matrix

This matrix denotes the weights the matching algorithm uses when weighting each of the covariates in X---see the Weight option. This square matrix should have as many columns as the number of columns of the X

distance.tolerance

This is a scalar which is used to determine if distances between two observations are different from zero. Values less than distance.tolerance are deemed to be equal to zero. This option can be used to perform a type of optimal

tolerance

This is a scalar which is used to determine numerical tolerances. This option is used by numerical routines such as those used to determine if a matrix is singular.

print.level

The level of printing. Set to '0' to turn off printing.

version

The version of the code to be used. The "Matchby" C/C++ version of the code is the fastest, and the end-user should not change this option.

...

Additional arguments passed on to Match.

Value

estThe estimated average causal effect.
se.standardThe usual standard error. This is the standard error calculated on the matched data using the usual method of calculating the difference of means (between treated and control) weighted so that ties are taken into account.
seThe Abadie-Imbens standard error. This is only calculated if the AI option is TRUE. This standard error has correct coverage if X consists of either covariates or a known propensity score because it takes into account the uncertainty of the matching procedure. If an estimated propensity score is used, the uncertainty involved in its estimation is not accounted for although the uncertainty of the matching procedure itself still is.
index.treatedA vector containing the observation numbers from the original dataset for the treated observations in the matched dataset. This index in conjunction with index.control can be used to recover the matched dataset produced by Matchby. For example, the X matrix used by Matchby can be recovered by rbind(X[index.treated,],X[index.control,]).
index.controlA vector containing the observation numbers from the original data for the control observations in the matched data. This index in conjunction with index.treated can be used to recover the matched dataset produced by Matchby. For example, the Y matrix for the matched dataset can be recovered by c(Y[index.treated],Y[index.control]).
weightsThe weights for each observation in the matched dataset.
orig.nobsThe original number of observations in the dataset.
nobsThe number of observations in the matched dataset.
wnobsThe number of weighted observations in the matched dataset.
orig.treated.nobsThe original number of treated observations.
ndropsThe number of matches which were dropped because there were not enough observations in a given group and because of caliper and exact matching.
estimandThe estimand which was estimated.
versionThe version of Match which was used.

Details

Matchby is much faster for large datasets than Match. But Matchby only implements a subset of the functionality of Match. For example, the restrict option cannot be used, Abadie-Imbens standard errors are not provided and bias adjustment cannot be requested. Matchby is a wrapper for the Match function which separates the matching problem into subgroups defined by a factor. This is the equivalent to doing exact matching on each factor, and the way in which matches are found within each factor is determined by the usual matching options. Note that by default ties=FALSE although the default for the Match in GenMatch functions is TRUE. This is done because randomly breaking ties in large datasets often results in a great speedup. For additional speed, consider doing matching without replacement which is often much faster when the dataset is large---see the replace option. There will be slight differences in the matches produced by Matchby and Match because of how the covariates are weighted. When the data is broken up into separate groups (via the by option), Mahalanobis distance and inverse variance will imply different weights than when the data is taken as whole.

References

Sekhon, Jasjeet S. 2007. ``Multivariate and Propensity Score Matching Software with Automated Balance Optimization.'' Journal of Statistical Software. http://sekhon.berkeley.edu/papers/MatchingJSS.pdf

Sekhon, Jasjeet S. 2006. ``Alternative Balance Metrics for Bias Reduction in Matching Methods for Causal Inference.'' Working Paper. http://sekhon.berkeley.edu/papers/SekhonBalanceMetrics.pdf

Abadie, Alberto and Guido Imbens. 2006. ``Large Sample Properties of Matching Estimators for Average Treatment Effects.'' Econometrica 74(1): 235-267. http://ksghome.harvard.edu/~.aabadie.academic.ksg/sme.pdf

Diamond, Alexis and Jasjeet S. Sekhon. 2005. ``Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies.'' Working Paper. http://sekhon.berkeley.edu/papers/GenMatch.pdf

Imbens, Guido. 2004. Matching Software for Matlab and Stata. http://elsa.berkeley.edu/~imbens/estimators.shtml

Examples

Run this code

#
# Match exactly by racial groups and then match using the propensity score within racial groups
#

data(lalonde)

#
# Estimate the Propensity Score
#
glm1  <- glm(treat~age + I(age^2) + educ + I(educ^2) + 
             hisp + married + nodegr + re74  + I(re74^2) + re75 + I(re75^2) +
             u74 + u75, family=binomial, data=lalonde)


#save data objects
#
X  <- glm1$fitted
Y  <- lalonde$re78
Tr <- lalonde$treat

# one-to-one matching with replacement (the "M=1" option) after exactly
# matching on race using the 'by' option.  Estimating the treatment
# effect on the treated (the "estimand" option defaults to ATT).
rr  <- Matchby(Y=Y, Tr=Tr, X=X, by=lalonde$black, M=1);
summary(rr)

# Let's check the covariate balance
# 'nboots' is set to small values in the interest of speed.
# Please increase to at least 500 each for publication quality p-values.  
mb  <- MatchBalance(treat~age + I(age^2) + educ + I(educ^2) + black +
                    hisp + married + nodegr + re74  + I(re74^2) + re75 + I(re75^2) +
                    u74 + u75, data=lalonde, match.out=rr, nboots=10)

Run the code above in your browser using DataLab