matchmahal: Matched sample data frame based on Mahalanobis distances

Description

Creates a matched sample data frame based on Mahalanobis distances.

Usage

matchmahal(form,data=NULL,discard="none", distance="logit",m.order="none",nclose=0,ytreat=1)

Arguments

form

Model formula

data

A data frame containing the data. Default: use variables in the current R workspace.

discard

Observations to be discarded based on the propensity score. If discard = "control", only control observations are discarded. If discard = "treat", only treatment observations are discarded. If discard = "both", both control and treatment observations are discarded. Default: discard = "none"; no observations are discarded and propensity scores are not estimated.

distance

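The measure used to estimate propensity scores when discard = "control", "treat", or "both". For options such as "logit" and "probit", this is the link function passed to the glm command. Default: distance = "logit".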
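The distance argument determines how the score p used for discarding observations is computed. The sketch below assumes a single explanatory variable x and simulated data, and shows the two kinds of score mentioned under Details: fitted probabilities from a binary glm (distance = "logit" or "probit") and the Mahalanobis distance of each observation from the vector of means (distance = "mahal"). It uses only base R and is not the McSpatial implementation; note that stats::mahalanobis returns the squared distance.

```r
# Illustrative sketch (not McSpatial code): two ways to build the score p.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- as.numeric(x + rnorm(n) > 0)   # simulated binary treatment indicator
X <- cbind(x)                       # matrix of explanatory variables

# distance = "logit" / "probit": fitted probabilities from a binary glm
p_logit  <- fitted(glm(y ~ x, family = binomial(link = "logit")))
p_probit <- fitted(glm(y ~ x, family = binomial(link = "probit")))

# distance = "mahal": squared Mahalanobis distance from the vector of means
p_mahal <- mahalanobis(X, colMeans(X), cov(X))

summary(p_logit)
summary(p_mahal)
```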

m.order

Order by which estimated distances are sorted before starting the matching process. Options: "decreasing", "increasing", "random", and "none". As the "decreasing" and "increasing" options are based on propensity scores, they are only applicable when discard = "control", "treat", or "both".
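The m.order options amount to different permutations of the treatment observations before matching starts. A minimal sketch, using a hypothetical helper match_order() that is not part of McSpatial and assuming p holds the propensity scores of the treatment observations:

```r
# Hypothetical helper (not part of McSpatial): the order in which treatment
# observations are processed under each m.order option.
match_order <- function(n1, m.order = "none", p = NULL) {
  if (m.order %in% c("increasing", "decreasing") && is.null(p))
    stop("m.order = '", m.order, "' requires propensity scores")
  switch(m.order,
    none       = seq_len(n1),                   # original data-set order
    random     = sample.int(n1),                # random permutation
    increasing = order(p),                      # smallest score first
    decreasing = order(p, decreasing = TRUE),   # largest score first
    stop("unknown m.order: ", m.order))
}

p <- c(0.7, 0.2, 0.9, 0.4)
match_order(4, "increasing", p)   # 2 4 1 3
match_order(4, "decreasing", p)   # 3 1 4 2
```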

nclose

If nclose > 0, the matched observations are sorted by the distance measure and only the nclose matches with the smallest distances are kept. Default: nclose = 0, in which case all matches are kept.
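The nclose rule is a simple "keep the k best pairs" filter. A minimal sketch with a hypothetical helper keep_closest() (not part of McSpatial), assuming pair_dist holds the distance of each matched pair:

```r
# Hypothetical sketch of the nclose rule: keep the nclose matched pairs with
# the smallest distances; nclose = 0 keeps every pair.
keep_closest <- function(pair_dist, nclose = 0) {
  if (nclose <= 0) return(seq_along(pair_dist))
  # indices of the nclose smallest distances, in increasing order of distance
  head(order(pair_dist), nclose)
}

d <- c(2.5, 0.3, 1.1, 4.0, 0.8)
keep_closest(d, nclose = 3)   # 2 5 3
keep_closest(d)               # 1 2 3 4 5
```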

ytreat

The value of the dependent variable for the treatment group. Default: ytreat = 1. Constructs matched samples for all other values of the dependent variable. If discard="treat" or discard="both", only treatment observations that were discarded for every control value of the dependent variable are omitted from the final data set.

Value

Returns the matched sample data frame. Adds the following variables to the data set:

origobs: The observation number in the original data set.

matchobs: The number of the observation in the matched data set to which this observation is matched. matchobs refers to the observation's number in the original data set, i.e., to the variable origobs.

Note: If the original data set includes variables named origobs and matchobs, they will be overwritten by the variables produced by matchmahal.
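Because matchobs stores origobs values, each treatment row can be paired with its matched control by a lookup on origobs. The sketch below uses a small invented data frame standing in for matchmahal output (the values, and the assumption that control rows point back to their treatment partners, are for illustration only):

```r
# Toy stand-in for matchmahal output (invented values): origobs is the row
# number in the original data; matchobs is the origobs of the matched partner.
fit <- data.frame(
  y        = c(1, 1, 0, 0),
  x        = c(0.9, 1.4, 0.8, 1.5),
  origobs  = c(3, 7, 12, 20),
  matchobs = c(12, 20, 3, 7)
)

treat <- fit[fit$y == 1, ]
# Find each treatment row's partner by matching matchobs against origobs
partner <- fit[match(treat$matchobs, fit$origobs), ]

pairs <- data.frame(treat_obs = treat$origobs, control_obs = partner$origobs,
                    x_treat = treat$x, x_control = partner$x)
pairs
```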

Details

Creates a matched sample data set by matching each treatment observation to the closest control observation based on Mahalanobis distances. Like matchprop, matchmahal is particularly useful for creating a series of matched sample data sets over time relative to a base time period.

Let X1 be the matrix of explanatory variables for the treatment observations and let X2 be the comparable matrix for the control observations. The Mahalanobis measure of distance between the i-th row of X1 and all control observations is $d_i = mahalanobis(X2, X1[i,], cov(rbind(X2,X1)))$. Writing $x_{1i}$ for the i-th row of X1, $x_{2j}$ for the j-th row of X2, and $S$ for the covariance matrix of the pooled rows of X1 and X2, the j-th element of $d_i$ is $(x_{2j} - x_{1i})' S^{-1} (x_{2j} - x_{1i})$ (R's mahalanobis function returns this squared distance). The first observation of X1 is matched with the closest observation in X2 based on this distance measure. That row is then removed from X2, and the second observation of X1 is matched with the closest of the remaining control observations. The process is repeated until one of the matrices has no observations left.
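The greedy procedure just described can be sketched in a few lines of base R. This is an illustrative reimplementation under the stated steps, not the McSpatial source; the function name greedy_mahal_match() is hypothetical, and rows of X1 are processed in their current order (reorder them first to mimic other m.order options).

```r
# Hypothetical sketch: greedy nearest-neighbour matching without replacement
# on Mahalanobis distance, using the pooled covariance matrix.
greedy_mahal_match <- function(X1, X2) {
  X1 <- as.matrix(X1); X2 <- as.matrix(X2)
  S <- cov(rbind(X2, X1))            # pooled covariance matrix
  available <- seq_len(nrow(X2))     # control rows not yet used
  n <- min(nrow(X1), nrow(X2))       # stop when either matrix runs out
  matched <- integer(n)
  for (i in seq_len(n)) {
    # squared Mahalanobis distance from treatment row i to remaining controls
    d <- mahalanobis(X2[available, , drop = FALSE], X1[i, ], S)
    best <- available[which.min(d)]
    matched[i] <- best
    available <- setdiff(available, best)   # remove the matched control
  }
  matched   # matched[i] = row of X2 matched to row i of X1
}

X1 <- matrix(c(0, 5), ncol = 1)          # two treatment observations
X2 <- matrix(c(4.8, 0.1, 10), ncol = 1)  # three control observations
greedy_mahal_match(X1, X2)               # 2 1
```

When X1 has more rows than X2, only the first nrow(X2) treatment rows receive a match, which mirrors the behavior described in the next paragraph.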

By default, matchmahal attempts to match every treatment observation with a control observation. If the number of treatment observations (n1) is greater than the number of control observations (n2), only the first n2 treatment observations will be in the final matched sample data set. By default, the observations are matched in the order in which they appear in the original data set. Alternatively, the observations can be matched in random order by specifying m.order = "random".

The distance option specifies the measure used to determine which observations lie outside the common support. The same options are available as in the matchprop command. The natural choice here is distance = "mahal" combined with discard = "control", "treat", or "both" and m.order = "increasing", "decreasing", or "random". Other options, e.g., distance = "logit" or "probit", are listed in the documentation for the matchprop command. Each of these distance options produces a propensity score, p. When distance = "mahal", the propensity score is the Mahalanobis distance of each observation from the vector of means. The discard option determines how observations outside the common support are handled. For example, if the treatment value is ytreat = 1 and the other value of the dependent variable is y = 2, then:

discard = "control": observations with p[y==2] < min(p[y==1]) are discarded from the y==2 sample

discard = "treat": observations with p[y==1] > max(p[y==2]) are discarded from the y==1 sample

discard = "both": both rules are applied, so both sets of observations are discarded
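The discard rules can be written as a mask over the observations. A minimal sketch using a hypothetical helper keep_support() (not part of McSpatial); the treatment rule follows the text above, and the control rule (control scores below the smallest treatment score) is an assumption that mirrors it:

```r
# Hypothetical helper: TRUE for observations kept under each discard option,
# for treatment value ytreat and a single control value ycontrol.
keep_support <- function(p, y, ytreat = 1, ycontrol = 2, discard = "none") {
  keep <- rep(TRUE, length(p))
  tr <- y == ytreat
  co <- y == ycontrol
  if (discard %in% c("control", "both"))      # assumed control rule
    keep[co & p < min(p[tr])] <- FALSE
  if (discard %in% c("treat", "both"))        # treatment rule from the text
    keep[tr & p > max(p[co])] <- FALSE
  keep
}

p <- c(0.10, 0.35, 0.60, 0.95, 0.05, 0.30, 0.70)
y <- c(1,    1,    1,    1,    2,    2,    2)
keep_support(p, y, discard = "both")
# obs 4 (treatment, 0.95 > 0.70) and obs 5 (control, 0.05 < 0.10) are dropped
```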

If discard = "treat" or "both" and the dependent variable takes more than two values, a separate propensity score is estimated for each comparison between the treatment group and one of the control values, so a different set of treatment observations may be discarded as lying outside the support in each comparison. Only treatment observations that are rejected by every model are omitted from the final data set.
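The "rejected by every model" rule is an element-wise AND across comparisons. A minimal sketch with invented scores, assuming ytreat = 1, two control values (2 and 3), and the discard = "treat" rule applied separately in each comparison:

```r
# Invented scores for four treatment observations, one vector per comparison
# (treatment y = 1 versus control y = 2, and versus control y = 3)
p_treat <- list(`2` = c(0.20, 0.80, 0.90, 0.40),
                `3` = c(0.30, 0.50, 0.95, 0.85))
# Largest control score in each comparison (invented values)
max_control <- c(`2` = 0.70, `3` = 0.80)

# rejected[i, k] is TRUE if treatment obs i lies above the control support
# in comparison k
rejected <- mapply(function(p, m) p > m, p_treat, max_control)

# Omit only the observations rejected in every comparison
dropped <- apply(rejected, 1, all)
which(dropped)   # 3
```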

References

Deng, Yongheng, Sing Tien Foo, and Daniel P. McMillen, "Private Residential Price Indices in Singapore," Regional Science and Urban Economics 42 (2012), 485-494.

Ho, Daniel E., Kosuke Imai, Gary King, and Elizabeth A. Stuart, "Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference," Political Analysis 15 (2007), 199-236.

Ho, Daniel E., Kosuke Imai, Gary King, and Elizabeth A. Stuart, "MatchIt: Nonparametric Preprocessing for Parametric Causal Inference," Journal of Statistical Software 42 (2011), 1-28.

McMillen, Daniel P., "Repeat Sales as a Matching Estimator," Real Estate Economics 40 (2012), 743-771.

Examples



set.seed(189)
n = 1000
x <- rnorm(n)
x <- sort(x)
y <- x*1 + rnorm(n, 0, sd(x)/2)
y <- ifelse(y>0,1,0)
table(y)
fit <- matchmahal(y~x,ytreat=1)
table(fit$y)
