MatchThem
Matching Multiply Imputed Datasets
Introduction
One of the major issues in the propensity score matching procedures is the presence of missing data on the treatment indicator or covariates since matching relies on the predictions from a logistic regression model and with missing information in the variables within the model, the predictions cannot be made for that observation. There are a couple of solutions to address this problem (including the complete-case analysis and the missingness pattern approach) and despite these standard approaches, adopting algorithms to multiply impute the missing data is growing as a popular alternative.
Matching of control and treatment observations based on the propensity score in multiply imputed datasets can be achieved through different approaches:
- The within (match-then-pool) approach: In this approach, matching is done on each imputed dataset, using the observed and imputed covariate values, and the treatment effect sizes obtained from analyzing the matched datasets are pooled together (please see the article by Leyrat et al. for more details).
- The across (pool-then-match) approach: In this approach, the calculated propensity scores for each observation across the imputed datasets are pooled and using this pooled measure, matching is done on the imputed datasets. The matched datasets are analyzed and the treatment effect sizes obtained from these analyses are pooled together (this method is shown to produce inconsistent estimates of the treatment effect size, please see the article by Mitra et al. for more details).
The mice
and Amelia
packages are widely accepted statistical tools for imputing the ignorable missing data in the R platform. The MatchThem
package simplifies the process of matching the imputed datasets obtained using these packages and enables credible adoption of the two matching approaches and several matching methods in practice. This package enables parametric models for causal inference to provide unbiased estimates through selecting matched observations from the control and treatment groups, analyzing the matched datasets, and pooling the obtained results on imputed datasets using Rubin’s rules.
Installation
The MatchThem
package can be installed from the Comprehensive R Archive Network (CRAN) repository as follows:
install.packages("MatchThem")
The latest (though unstable) version of the package can be installed from GitHub as follows:
devtools::install_github(repo = "FarhadPishgar/MatchThem")
Suggested Workflow
Overview
Adopting algorithms to multiply impute the missing data, before the matching procedure, and the matching procedure itself may seem to be complicated tasks. This suggested workflow tries to simplify this process into a few steps:
- Imputing the Missing Data in the Dataset: Functions of the
mice
andAmelia
packages can be used to multiply impute the missing data in the dataset (theAmelia
package is designed to impute missing data in a single cross-sectional dataset or in a time-series dataset, although it may work with the latter, theMatchThem
package only supports the former data type). - Matching the Imputed Datasets: The
matchthem()
function from theMatchThem
package should be used to select matched observations from control and treatment groups of each imputed dataset (don't forget to check the extent of the balance in covariates after matching. You can use thecobalt
package for this purpose, which is now compatible with themimids
andwimids
objects, as well as, the tools provided in theMatchThem
package, itself). - Analyzing the Matched Datasets: The
with()
function from theMatchThem
package should be used to estimate treatment effect size by analyzing the matched datasets. - Pooling the Treatment Effect Size Estimates: The
pool()
function from theMatchThem
package should be used to pool the obtained treatment effect estimates from the previous step using Rubin’s rules.
Imputing the Missing Data in the Dataset
The mice
and Amelia
packages and their main functions provide the necessary tools for multiply imputing ignorable missing data in a dataset (several points should be considered before choosing the appropriate method for the imputation procedure, please see these packages reference manuals for details).
Matching the Imputed Datasets
The MatchThem
package and its main function, matchthem()
, provides the essential tools for selecting matched observations from control and treatment groups of imputed datasets. Currently, two matching approaches (match-then-pool and pool-then-match matching approaches) and two matching methods (nearest neighbor and exact matching methods) are available (within each of these approaches and methods, the MatchThem
package offers a variety of options).
The matchthem()
function requires the imputed datasets not to have any missing data (even in variables not included in the matching model).
The output of the matchthem()
function will be saved in an object of the mimids
class. The plot()
, print()
, and summary()
functions can be used to review detailed descriptions of these objects. Moreover, matchthem.data()
function can be used to extract the matched datasets in these objects.
Analyzing the Matched Datasets
The MatchThem
package and one of its functions, with()
, provides an easy way to analyze each matched imputed dataset.
The output of the with()
function will be saved in an object of the mira
class. The print()
and summary()
functions can be used to review detailed descriptions of these objects.
Pooling the Treatment Effect Size Estimates
The MatchThem
package and one of its functions, pool()
, provides the tools to easily pool the obtained treatment effect estimates from complete data analyses according to Rubin’s rules.
The output of the pool()
function will be saved in an object of the mipo
class. The print()
and summary()
functions can be used to review detailed descriptions of these objects.
Acknowledgments
The logo for this package, a trip to the Arctic, was designed and kindly provided by Max Josino (check his website and Dribble to see his beautiful works).
I would like to thank the CRAN team members for their comments and technical support. This package relies on the Amelia
, MatchIt
, mice
, and WeightIt
packages. Please cite their reference manuals and vignettes in your work besides citing reference manual and vignette of this package.
Authors
Farhad Pishgar
Clémence Leyrat