The matchthem() function enables parametric models for causal inference to work better by selecting matched subsets of the control and treatment groups of imputed datasets of a mids or amelia class object.
matchthem(formula, datasets, approach = "within", method = "nearest",
distance = "logit", distance.options = list(), discard = "none",
reestimate = FALSE, ...)
formula: This argument takes the usual R formula syntax, z ~ x1 + x2, where z is a binary treatment indicator and x1 and x2 are the potential confounders. Both the treatment indicator and the potential confounders must be contained in the imputed datasets, which are specified through datasets (see below). All of the usual R formula syntax works. For example, x1:x2 represents the first-order interaction term between x1 and x2, and I(x1^2) represents the squared term of x1. See help(formula) for details.
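As a minimal sketch of how such a formula expands, the base R snippet below builds a design matrix from a toy data frame. The data values are made up for illustration, and only base R is used; nothing here calls MatchThem itself.

```r
# Toy data; z, x1 and x2 are the illustrative names used in the text above.
toy <- data.frame(
  z  = c(0, 1, 0, 1, 1, 0),
  x1 = c(1, 2, 3, 4, 5, 6),
  x2 = c(0.5, 1.5, 1.0, 2.0, 0.0, 2.5)
)

# Main effects, their first-order interaction, and a squared term.
f <- z ~ x1 + x2 + x1:x2 + I(x1^2)

# model.matrix() shows the columns R builds from the right-hand side.
design <- model.matrix(f, data = toy)
colnames(design)
```

The x1:x2 column holds the product of x1 and x2, and the I(x1^2) column holds the square of x1, which is what a model fitted with this formula would receive as covariates.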
datasets: This argument specifies the datasets containing the treatment indicator and the potential confounders named in the formula. It must be an object of the mids or amelia class, which is typically produced by a previous call to the mice() or mice.mids() functions from the mice package or to the amelia() function from the Amelia package. (The Amelia package is designed to impute missing data in either a single cross-sectional dataset or a time-series dataset; currently, the MatchThem package supports only cross-sectional datasets.)
approach: This argument specifies a matching approach. Currently, the "within" approach (calculating distance measures within each imputed dataset and matching observations based on them) and the "across" approach (calculating distance measures within each imputed dataset, averaging the distance measure for each observation across the imputed datasets, and matching based on the averaged measures) are available. The default is "within", which has been shown previously to produce unbiased results.
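The difference between the two approaches can be sketched in base R. The snippet below is an illustration under assumptions, not the package's internal code: the three "imputed" datasets are simulated, and the distance measure is taken to be the fitted probability from a logistic regression.

```r
set.seed(1)
n <- 40

# Three hypothetical "imputed" datasets: same z and x1, different draws of x2.
z  <- rbinom(n, 1, 0.5)
x1 <- rnorm(n)
imputed <- lapply(1:3, function(i) {
  data.frame(z = z, x1 = x1, x2 = rnorm(n, mean = 0.5 * z))
})

# "within": one vector of distances (propensity scores) per imputed dataset.
within_dist <- sapply(imputed, function(d) {
  fit <- glm(z ~ x1 + x2, data = d, family = binomial(link = "logit"))
  fitted(fit)
})

# "across": average each observation's distance over the imputed datasets.
across_dist <- rowMeans(within_dist)
```

Under "within", matching would be run separately on each column of within_dist; under "across", the single vector across_dist would be used for matching in every imputed dataset.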
method: This argument specifies a matching method. Currently, "nearest" (nearest neighbor matching) and "exact" (exact matching) methods are available. The default is "nearest". Note that within each of these matching methods, MatchThem offers a variety of options.
distance: This argument specifies the method used to estimate the distance measure. The default is logistic regression, "logit". A variety of other methods are available.
distance.options: This optional argument specifies the arguments that are passed to the model that estimates the distance measure. The input must be a list.
discard: This argument specifies whether to discard observations that fall outside some measure of support of the distance score before matching, so that they are not used at all in the matching procedure. Note that discarding observations may change the quantity of interest being estimated. The current options are "none" (discarding no observations before matching), "both" (discarding all observations, both control and treatment, that are outside the support of the distance measure), "control" (discarding only control observations outside the support of the distance measure of the treatment observations), and "treat" (discarding only treatment observations outside the support of the distance measure of the control observations). The default is "none".
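The four options can be illustrated with a small base R sketch. It assumes that "support" means the minimum-to-maximum range of the other group's distance scores; the package's exact rule may differ. The helper names outside() and flag_discard() are hypothetical and are not part of MatchThem.

```r
# Hypothetical distance scores and treatment indicator (1 = treated).
distance <- c(0.05, 0.20, 0.35, 0.50, 0.65, 0.30, 0.45, 0.60, 0.80, 0.95)
treat    <- c(0,    0,    0,    0,    0,    1,    1,    1,    1,    1)

# TRUE where x lies outside the range of the reference scores (assumed rule).
outside <- function(x, ref) x < min(ref) | x > max(ref)

# Returns a logical vector: TRUE marks an observation that would be discarded.
flag_discard <- function(distance, treat, discard = "none") {
  d_t <- distance[treat == 1]
  d_c <- distance[treat == 0]
  out_c <- treat == 0 & outside(distance, d_t)  # controls outside treated range
  out_t <- treat == 1 & outside(distance, d_c)  # treated outside control range
  switch(discard,
    none    = rep(FALSE, length(distance)),
    control = out_c,
    treat   = out_t,
    both    = out_c | out_t,
    stop("unknown discard option")
  )
}

which(flag_discard(distance, treat, "both"))
```

With these toy scores, the two lowest-scoring controls and the two highest-scoring treated observations fall outside the other group's range, so they are the ones flagged under "both".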
reestimate: This argument specifies whether the model for estimating the distance measure should be reestimated after observations are discarded. The input must be a logical value. The default is FALSE.
...: Additional arguments to be passed to the matching method.
This function returns an object of the mimids (matched multiply imputed datasets) class, which includes matched subsets of the imputed datasets originally passed to the function through the datasets argument.
The matching is done using the matchthem(z ~ x1, ...) command, where z is the treatment indicator and x1 represents the potential confounder to be used in the matching model. There are a number of matching options. The default syntax is matchthem(formula, datasets = NULL, method = "nearest", model = "logit", ratio = 1, caliper = 0, ...). The results can be summarized graphically using the plot() function or numerically using the summary() function. The print() function also prints the output.
Daniel Ho, Kosuke Imai, Gary King, and Elizabeth Stuart (2007). Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference. Political Analysis, 15(3): 199-236. http://gking.harvard.edu/files/abs/matchp-abs.shtml
Stef van Buuren and Karin Groothuis-Oudshoorn (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3): 1-67. https://www.jstatsoft.org/v45/i03/
Gary King, James Honaker, Anne Joseph, and Kenneth Scheve (2001). Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation. American Political Science Review, 95: 49-69. http://j.mp/2oOrtGs
# NOT RUN {
#Loading the packages
library(mice)
library(MatchThem)

#Loading the dataset
data(osteoarthritis)

#Multiply imputing the missing values
imputed.datasets <- mice(osteoarthritis, m = 5, maxit = 10,
                         method = c("", "", "mean", "polyreg", "logreg", "logreg", "logreg"))

#Matching the multiply imputed datasets
matched.datasets <- matchthem(OSP ~ AGE + SEX + BMI + RAC + SMK, imputed.datasets,
                              approach = 'within', method = 'nearest')
# }