matchprop(form,data=NULL,distance="logit",discard="both", reestimate="FALSE",m.order="none",nclose=0,ytreat=1)
Unless distance = "mahal", the glm command is used to estimate the propensity scores using a series of discrete choice models for the probability, p, that the dependent variable equals ytreat rather than each alternative value of the dependent variable. The default link function is distance = "logit". Alternative link functions are specified using the distance option. Links include the standard ones for a glm model with family = binomial, e.g., "probit", "cauchit", "log", and "cloglog".
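As a rough illustration of the propensity-score step, the base R sketch below fits a binomial glm under several of the links named above and collects the fitted probabilities. It is not matchprop's own code: the simulated data and the names treat, links, and scores are assumptions made for the example, and the "log" link is left out because it often needs starting values to converge.

```r
# Simulated data (hypothetical): one covariate and a binary treatment indicator
set.seed(1)
n <- 500
x <- rnorm(n)
treat <- rbinom(n, 1, plogis(0.8 * x))

# Fit one binomial glm per link; fitted() returns the estimated probabilities
links <- c("logit", "probit", "cauchit", "cloglog")
scores <- sapply(links, function(l) {
  fitted(glm(treat ~ x, family = binomial(link = l)))
})

# Each column holds the propensity scores under one link
round(cor(scores), 3)
```

In this kind of setup the scores from different links tend to rank observations similarly, so the choice of link mostly matters in the tails.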
If distance = "mahal", matchprop implements MatchIt's version of Mahalanobis matching. Letting X be the matrix of explanatory variables specified in form, the Mahalanobis measure of distance from the vector of mean values is $p = mahalanobis(X, colMeans(X), cov(X))$. Although this version of Mahalanobis matching is fast, it may not be the best way to construct matches because it treats observations that are above and below the mean symmetrically. For example, if X is a single variable with $mean(X)$ = .5 and $var(X)$ = 1, Mahalanobis matching treats X = .3 and X = .7 the same: mahalanobis(.3,.5,1) = .04 and mahalanobis(.7,.5,1) = .04. The function matchmahal is slower but generally preferable for Mahalanobis matching because it pairs each treatment observation with the closest control observation, i.e., $min(mahalanobis(X0, X1[i,], cov(X)))$, where X0 is the matrix of explanatory variables for the control observations, X1 is the matrix for the treatment observations, X is the pooled explanatory variable matrix, and i is the target treatment observation.
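The two distance calculations described above can be checked directly with base R's mahalanobis function, which returns the squared distance. The first part reproduces the symmetric example; the second part sketches the pairwise search for the closest control. The matrices X0 and X1 are simulated here purely for illustration and are not produced by matchprop or matchmahal.

```r
# Distance from the mean: .3 and .7 are both .2 away from .5, so they tie
d_low  <- mahalanobis(0.3, center = 0.5, cov = matrix(1))
d_high <- mahalanobis(0.7, center = 0.5, cov = matrix(1))
c(d_low, d_high)

# Pairwise version: distance from every control to one treatment observation
set.seed(1)
X1 <- matrix(rnorm(5 * 2, mean = 0.5), ncol = 2)  # hypothetical treatment rows
X0 <- matrix(rnorm(20 * 2), ncol = 2)             # hypothetical control rows
X  <- rbind(X0, X1)                               # pooled matrix

i <- 1                                            # target treatment observation
d <- mahalanobis(X0, X1[i, ], cov(X))
closest <- which.min(d)                           # row of X0 nearest to X1[i, ]
closest
```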
To illustrate how matchprop constructs matched samples, suppose that the dependent variable takes on three values,
y = 1, 2, 3, and assume that y = 1 is the treatment group. First, the y = 1 and y = 2 observations are pooled
and a propensity score p is constructed by, e.g., estimating a logit model for the probability that y = 1 rather than 2.
Unless m.order = "none", the data frame is then sorted by p -- from largest to smallest if m.order = "largest",
from smallest to largest if m.order = "smallest", and randomly if m.order = "random". The first treatment observation is then paired with
the closest control observation, the second treatment observation is paired with the closest of the remaining control observations, and so on
until the last observation is reached for one of the groups. No control observation is matched to more than one treatment observation,
and only pairwise matching is supported using matchprop. The process is then repeated using the y = 1 and y = 3 observations.
If the number of treatment observations is n1, the final data set will have roughly 3*n1 observations -- the n1 treatment observations and n1 observations
each from the y = 2 and y = 3 observations. The exact number of observations will differ depending on how observations are treated by the
discard option, and there will be fewer than n1 observations for, e.g., group 2 if n2 < n1.
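The pairing procedure described above can be sketched in base R. This is an illustration under stated assumptions, not matchprop's actual source: the helper greedy_pairs, the simulated data, and the use of the absolute difference in scores as the measure of "closest" are all hypothetical choices, and the ordering corresponds to m.order = "largest".

```r
# Simulated three-valued outcome (hypothetical); y = 1 is the treatment group
set.seed(42)
n <- 300
x <- rnorm(n)
y <- sample(1:3, n, replace = TRUE, prob = c(0.2, 0.4, 0.4))

# Greedy 1:1 matching without replacement on a score p
greedy_pairs <- function(p, treat) {
  t_idx <- which(treat)
  c_idx <- which(!treat)
  t_idx <- t_idx[order(p[t_idx], decreasing = TRUE)]  # largest scores first
  pairs <- matrix(integer(0), ncol = 2)
  for (i in t_idx) {
    if (length(c_idx) == 0) break                     # controls exhausted
    j <- c_idx[which.min(abs(p[c_idx] - p[i]))]       # closest remaining control
    pairs <- rbind(pairs, c(i, j))
    c_idx <- setdiff(c_idx, j)                        # each control used once
  }
  pairs
}

# Repeat the procedure for y = 1 vs. 2, then y = 1 vs. 3
matches <- list()
for (alt in c(2, 3)) {
  keep <- which(y %in% c(1, alt))
  d <- data.frame(treat = as.numeric(y[keep] == 1), x = x[keep])
  p <- fitted(glm(treat ~ x, family = binomial(link = "logit"), data = d))
  pairs <- greedy_pairs(p, d$treat == 1)
  matches[[as.character(alt)]] <- keep[pairs[, 2]]    # original row numbers
}

# Number of matched controls drawn from each alternative group
sapply(matches, length)
```

Stacking the treatment rows with the rows in matches gives a data set of roughly three times the treatment-group size, as described in the text.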
The discard option determines how observations are handled that are outside the probability support.
In the above example, let p be the propensity score for the logit model for the probability that y = 1 rather than 2.
If discard = "control", observations with p[y==2] ...
If reestimate = T, the propensity scores are reestimated after any observations are discarded. Otherwise, matches are based on the original propensity scores.
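A common-support trim followed by re-estimation can be sketched in base R as follows. The exact rule matchprop applies is not reproduced here; the sketch assumes the convention used by MatchIt's discard option, in which control observations with scores outside the range of the treatment scores are dropped, and vice versa. All data and variable names are hypothetical.

```r
# Simulated binary treatment (hypothetical)
set.seed(7)
n <- 500
d <- data.frame(x = rnorm(n))
d$treat <- rbinom(n, 1, plogis(1.5 * d$x))

# Original propensity scores
p <- fitted(glm(treat ~ x, family = binomial(link = "logit"), data = d))

# Score range within each group
range_t <- range(p[d$treat == 1])
range_c <- range(p[d$treat == 0])

# Assumed rule: drop observations outside the other group's score range
drop_control <- d$treat == 0 & (p < range_t[1] | p > range_t[2])
drop_treat   <- d$treat == 1 & (p < range_c[1] | p > range_c[2])
keep <- !(drop_control | drop_treat)   # analogue of discard = "both"

# Analogue of reestimate = TRUE: refit the model on the retained rows
d_keep <- d[keep, ]
p_new <- fitted(glm(treat ~ x, family = binomial(link = "logit"), data = d_keep))

c(original = n, retained = sum(keep))
```

Using only drop_control or only drop_treat in the definition of keep gives the one-sided analogues of the "control" and "treat" settings under the same assumption.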
Ho, D., Imai, K., King, G., Stuart, E., "Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference," Political Analysis 15 (2007), 199-236.
Ho, D., Imai, K., King, G., Stuart, E., "MatchIt: Nonparametric Preprocessing for Parametric Causal Inference," Journal of Statistical Software 42 (2011), 1-28.
McMillen, Daniel P., "Repeat Sales as a Matching Estimator," Real Estate Economics 40 (2012), 743-771.
matchmahal
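library(McSpatial)  # package that provides matchprop
set.seed(189)
n <- 1000
x <- rnorm(n)
x <- sort(x)
# Binary treatment indicator: y = 1 when a noisy linear index of x is positive
y <- x*1 + rnorm(n, 0, sd(x)/2)
y <- ifelse(y>0,1,0)
table(y)  # group sizes before matching
# Pairwise matching on logit propensity scores, starting with the largest scores
fit <- matchprop(y~x,m.order="largest",ytreat=1)
table(fit$y)  # group sizes in the matched sample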