matchmahal(form,data=NULL,discard="none", distance="logit",m.order="none",nclose=0,ytreat=1)
Creates a matched sample data set by matching each treatment observation to the closest control observation based on Mahalanobis distances. Like matchprop, matchmahal is particularly useful for creating a series of matched sample data sets over time relative to a base time period.
Let X1 be the matrix of explanatory variables for the treatment observations and let X2 be the comparable matrix for the control observations. The Mahalanobis measure of distance between the ith row of X1 and all control observations is $d_i = mahalanobis(X2, X1[i,], cov(rbind(X2,X1)))$. The first observation of X1 is matched with the closest observation in X2 based on this distance measure. The matched row is then removed from X2, and the second observation of X1 is matched with the closest of the remaining control observations. The process is repeated until there are no more observations left in one of the matrices.
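The matching loop described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's own code: the function name greedy_mahal_match and the simulated matrices are hypothetical, and note that stats::mahalanobis returns the squared distance, which gives the same ranking of control observations.

```r
# Illustrative sketch of greedy one-to-one Mahalanobis matching.
# greedy_mahal_match is a hypothetical helper, not part of the package.
set.seed(1)
X1 <- matrix(rnorm(10), ncol = 2)   # 5 simulated treatment observations
X2 <- matrix(rnorm(40), ncol = 2)   # 20 simulated control observations
S  <- cov(rbind(X2, X1))            # pooled covariance, as in the text

greedy_mahal_match <- function(X1, X2, S) {
  available <- seq_len(nrow(X2))            # control rows not yet matched
  n_match   <- min(nrow(X1), nrow(X2))      # stop when one matrix runs out
  matched   <- integer(n_match)
  for (i in seq_len(n_match)) {
    # squared Mahalanobis distance from treatment row i to remaining controls
    d <- mahalanobis(X2[available, , drop = FALSE], X1[i, ], S)
    j <- which.min(d)
    matched[i] <- available[j]              # record the original row of X2
    available  <- available[-j]             # remove it from the control pool
  }
  matched
}

m <- greedy_mahal_match(X1, X2, S)
```

Here m[i] holds the row of X2 matched to the ith row of X1. Because each control row is removed once it is used, the result depends on the order in which the treatment rows are processed.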
By default, matchmahal matches every treatment observation with a control observation. If the number of treatment observations (n1) is greater than the number of control observations (n2), then only the first n2 treatment observations will be in the final matched sample data set. By default, the observations are matched in the order in which they appear in the original data set. Alternatively, the observations can be matched in random order by specifying m.order = "random".
The distance option allows the user to specify a metric by which observations are determined to be outside the probability support. The same options are available as in the matchprop command. The natural choice is distance = "mahal" combined with discard = "control", "treat" or "both" and m.order = "increasing", "decreasing", or "random". Other options are listed in the documentation for the matchprop command, e.g., distance = "logit" or "probit". Any of these distance options produces a propensity score, p. When distance = "mahal", the propensity score is the Mahalanobis distance of each observation from the vector of means. The discard option determines how observations that are outside the probability support are handled. For example, if the treatment is set to ytreat = 1 and the alternative value of the dependent variable is y = 2, then:
discard = "control": observations with p[y==2]<min(p[y==1]) are discarded from the y==2 sample
discard = "treat": observations with p[y==1]>max(p[y==2]) are discarded from the y==1 sample
discard = "both": both sets of observations are deleted
If discard = "treat" or "both" and the dependent variable has more than two values, a different set of treatment observations may be discarded as being outside the support of the two propensity measures. Only treatment observations that are rejected by both models will end up being omitted from the final data set.
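The discard rules can be illustrated with a short base R sketch for the two-value case. Several things here are assumptions for illustration only: the data are simulated, the score p is computed as the squared Mahalanobis distance from the overall column means using the overall covariance matrix, and the control rule is taken to mirror the treatment rule (controls with p below the smallest treatment score are dropped). The function itself may compute these quantities differently.

```r
# Illustrative sketch of the discard rules; not the package's own code.
set.seed(2)
y <- rep(c(1, 2), times = c(30, 50))       # 1 = treatment, 2 = control
X <- cbind(rnorm(80, mean = ifelse(y == 1, 0.5, 0)), rnorm(80))

# Assumed form of the distance = "mahal" score: distance from the vector of means
p <- mahalanobis(X, colMeans(X), cov(X))

# discard = "control" (assumed mirror of the treatment rule)
drop_control <- y == 2 & p < min(p[y == 1])
# discard = "treat"
drop_treat   <- y == 1 & p > max(p[y == 2])
# discard = "both": apply both rules
keep_both    <- !(drop_control | drop_treat)

table(y[keep_both])
```

The logical vector keep_both marks the observations that would remain in the pool before matching under discard = "both"; using !drop_control or !drop_treat alone corresponds to the one-sided options.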
Ho, D., Imai, K., King, G., Stuart, E., "Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference," Political Analysis 15 (2007), 199-236.
Ho, D., Imai, K., King, G., Stuart, E., "MatchIt: Nonparametric Preprocessing for Parametric Causal Inference," Journal of Statistical Software 42 (2011), 1-28.
McMillen, Daniel P., "Repeat Sales as a Matching Estimator," Real Estate Economics 40 (2012), 743-771.
matchprop
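library(McSpatial)   # package that provides matchmahal
set.seed(189)
n <- 1000
# simulate a sorted explanatory variable and a binary outcome
x <- rnorm(n)
x <- sort(x)
y <- x*1 + rnorm(n, 0, sd(x)/2)
y <- ifelse(y>0,1,0)
table(y)
# match each treatment observation (y = 1) to its closest control
fit <- matchmahal(y~x,ytreat=1)
# the matched sample has equal numbers of treatment and control observations
table(fit$y)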