artlessV2: Artless Automatic Matching, Version 2

Description

Implements a simple version of multivariate matching using a propensity score, near-exact matching, near-fine balance, and robust Mahalanobis distance matching. You specify the variables, and the program does everything else. Should you be artful, not artless? See the notes.

Usage

artlessV2(dat, z, x = NULL, pr = NULL, xm = NULL, near = NULL,
    fine = NULL, ncontrols = 1, rnd = 2, solver = "rlemon")

Value

match: A dataframe containing the matched data set. match contains the rows of dat in a different order. match adds two columns to dat, called mset and matched, which identify matched pairs or matched sets. Specifically, matched is TRUE if a row is in the matched sample and is FALSE otherwise. Rows of dat that are in the same matched set have the same value of mset. The rows of match are sorted by mset with the treated individual before the matched controls. The unmatched controls with matched=FALSE appear as the last rows of match. When you analyze the matched data to estimate treatment effects, please be careful to remove rows of match with matched==FALSE. The rows with matched==FALSE are only useful in understanding what matching has accomplished (or failed to accomplish); see, for instance, Figure 4.2 in Rosenbaum (2025).
balance: A matrix called the balance table. The matrix has one row for each covariate in x. It also has a first row for the propensity score. There are five columns. Column 1 is the mean of the covariate in the treated group. Column 2 is the mean of the covariate in the matched control group. Column 3 is the mean of the covariate among all controls prior to matching. Column 4 is the difference between columns 1 and 2 divided by a pooled estimate of the standard deviation of the covariate before matching. Column 5 is the difference between columns 1 and 3 divided by a pooled estimate of the standard deviation of the covariate before matching. Notice that columns 4 and 5 have the same denominator, but different numerators. Tom Love (2002) suggests a graphical display of this information.
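
The arithmetic behind columns 4 and 5 of the balance table can be sketched in base R. This is a minimal illustration on made-up toy data, not the package's internal code. The helper name std_diff is hypothetical, and the pooling rule (the square root of the average of the two pre-matching group variances) is an assumption, because the text above says only "a pooled estimate".

```r
# Hypothetical helper: one row of a balance table for a single covariate v.
# z is the 0/1 treatment indicator; matched flags rows in the matched sample.
std_diff <- function(v, z, matched) {
  treated <- v[z == 1]
  all_controls <- v[z == 0]
  matched_controls <- v[z == 0 & matched]
  # Assumed pooling rule: equal weight to the two pre-matching variances.
  pooled_sd <- sqrt((var(treated) + var(all_controls)) / 2)
  c(treated = mean(treated),
    matched_control = mean(matched_controls),
    all_control = mean(all_controls),
    # Same denominator in both standardized differences, different numerators.
    sd_after = (mean(treated) - mean(matched_controls)) / pooled_sd,
    sd_before = (mean(treated) - mean(all_controls)) / pooled_sd)
}

# Toy data (invented): 3 treated, 6 controls, of which 3 are matched.
v <- c(5, 6, 7, 1, 2, 3, 5, 6, 7)
z <- c(1, 1, 1, 0, 0, 0, 0, 0, 0)
matched <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)
out <- std_diff(v, z, matched)
round(out, 2)
```

Because the denominator is fixed at its pre-matching value, a smaller value in column 4 than in column 5 reflects a change in means, not a change in spread.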

Arguments

dat: A dataframe containing the data set that will be matched. Let N be the number of rows of dat.
z: A binary vector with N coordinates where z[i]=1 if the ith row of dat describes a treated individual and z[i]=0 if the ith row of dat describes a control.
x: x is a numeric matrix with N rows. If pr is NULL, then the covariates in x are used to estimate a propensity score using a linear logit model that predicts z from x. An error will stop the program if pr and x are both NULL. If neither pr nor x is NULL, then a harmless warning message will remind you that your propensity score, pr, was used in matching and x was not used to estimate the propensity score. The balance table describes the covariates in x; so, those covariates should be continuous variables or binary variables that can be described by a mean or a proportion, not nominal categories.
pr: A vector with N coordinates containing an estimated propensity or similar quantity. If pr is NULL, then the program estimates the propensity score; see the discussion of x above. An error will stop the program if both pr and x are NULL.
xm: xm is a numeric matrix with N rows. The covariates in xm are used to define a robust Mahalanobis distance between treated and control individuals. Use of a matrix xm is optional.
near: A numeric vector of length N or a numeric matrix with N rows. Each column of near should represent levels of a nominal covariate with two or a few levels. The variables in near are used in near-exact matching. Use of a matrix near is optional.
fine: A numeric vector of length N or a numeric matrix with N rows. Each column of fine should represent levels of a nominal covariate with two or a few levels. The variables in fine are used in near-fine balancing. Use of a matrix fine is optional.
ncontrols: A positive integer. ncontrols is the number of controls to be matched to each treated individual. The default is matched pairs, i.e., one control.
rnd: A nonnegative integer. The balance table is rounded for display to rnd digits.
solver: Either "rlemon" or "rrelaxiv". The rlemon solver is automatically available without special installation. The rrelaxiv solver requires a special installation. See the note.
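
If you prefer to supply your own propensity score through pr, a linear logit model of the kind described for x can be fit with glm() in base R. The sketch below uses simulated data. The variable names and the choice to pass fitted probabilities (rather than, say, the linear predictor) are assumptions for illustration, not a statement of what artlessV2() does internally.

```r
set.seed(2)
N <- 300
# Simulated covariates (hypothetical): one continuous, one binary.
x <- cbind(age = rnorm(N, 50, 10), female = rbinom(N, 1, 0.5))
# In this simulation, treatment becomes more likely at older ages.
z <- rbinom(N, 1, plogis(-5 + 0.1 * x[, "age"]))

# Linear logit model predicting z from the columns of x.
fit <- glm(z ~ x, family = binomial)

# Estimated propensity scores: one value per row, in the same order as x.
pr <- fitted(fit)
summary(pr)
```

A vector built this way has N coordinates, so it has the shape that the pr argument expects.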

Author

Paul R. Rosenbaum

Details

This function builds a matched treated-control sample from an unmatched data set. It asks you to designate roles for specific covariates, and it does the rest. It is described as "artless automatic matching" because it makes decisions by default. Perhaps you could make better decisions; if so, try alittleArt() in this package, which gives you much more control over decisions. For even more control over matching decisions, try the iTOS package. artlessV2() will often create a reasonable matched sample with little effort; however, it can also be used as a first step in learning the art of constructing a matched sample. Wittgenstein spoke of the "ladder you throw away after you have climbed it," and artlessV2() can also serve that function.
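
To give a sense of what a robust Mahalanobis distance on the covariates in xm can look like, here is a minimal base R sketch of one common rank-based variant. The function name rank_mahal is hypothetical, the data are simulated, and it is an assumption that this construction resembles what artlessV2() computes; treat it as an illustration of the idea only.

```r
# Hypothetical helper: rank-based Mahalanobis distances between
# treated units (rows of the result) and controls (columns).
rank_mahal <- function(z, X) {
  X <- as.matrix(X)
  n <- nrow(X)
  # Replace each covariate by its ranks to limit the influence of outliers.
  R <- apply(X, 2, rank)
  cv <- cov(R)
  # Rescale so each covariate has the variance of untied ranks, which
  # keeps heavily tied covariates (e.g., binary ones) from dominating.
  rat <- sqrt(var(seq_len(n)) / diag(cv))
  D <- diag(rat, nrow = length(rat))
  icov <- solve(D %*% cv %*% D)
  Rt <- R[z == 1, , drop = FALSE]
  Rc <- R[z == 0, , drop = FALSE]
  out <- matrix(NA_real_, nrow(Rt), nrow(Rc))
  for (i in seq_len(nrow(Rt))) {
    out[i, ] <- mahalanobis(Rc, Rt[i, ], icov, inverted = TRUE)
  }
  out
}

set.seed(3)
X <- cbind(age = rnorm(20, 50, 10), bmi = rnorm(20, 27, 4))
z <- rep(c(1, 0), c(5, 15))
d <- rank_mahal(z, X)
dim(d)

# Ranks are unchanged by increasing transformations, so the distances are too.
d2 <- rank_mahal(z, exp(X / 10))
isTRUE(all.equal(d, d2))
```

The last two lines show the sense in which the distance is robust: stretching the scale of a covariate, as an outlier would, does not alter the distances.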

References

Bertsekas, D. P., Tseng, P. (1988) <doi:10.1007/BF02288322> The Relax codes for linear minimum cost network flow problems. Annals of Operations Research, 13, 125-190.

Bertsekas, D. P. (1990) <doi:10.1287/inte.20.4.133> The auction algorithm for assignment and other network flow problems: A tutorial. Interfaces, 20(4), 133-149.

Bertsekas, D. P., Tseng, P. (1994) <http://web.mit.edu/dimitrib/www/Bertsekas_Tseng_RELAX4_!994.pdf> RELAX-IV: A Faster Version of the RELAX Code for Solving Minimum Cost Flow Problems.

Greifer, N. and Stuart, E.A., (2021). <doi:10.1093/epirev/mxab003> Matching methods for confounder adjustment: an addition to the epidemiologist’s toolbox. Epidemiologic Reviews, 43(1), pp.118-129.

Hansen, B. B. and Klopfer, S. O. (2006) <doi:10.1198/106186006X137047> Optimal full matching and related designs via network flows. Journal of Computational and Graphical Statistics, 15(3), 609-627. ('optmatch' package)

Hansen, B. B. (2007) <https://www.r-project.org/conferences/useR-2007/program/presentations/hansen.pdf> Flexible, optimal matching for observational studies. R News, 7, 18-24. ('optmatch' package)

Love, Thomas E. (2002) Displaying covariate balance after adjustment for selection bias. Joint Statistical Meetings. Vol. 11. https://chrp.org/love/JSM_Aug11_TLove.pdf

Niknam, B. A. and Zubizarreta, J. R. (2022) <doi:10.1001/jama.2021.20555> Using cardinality matching to design balanced and representative samples for observational studies. JAMA, 327(2), 173-174.

Pimentel, S. D., Yoon, F., & Keele, L. (2015) <doi:10.1002/sim.6593> Variable‐ratio matching with fine balance in a study of the Peer Health Exchange. Statistics in Medicine, 34(30), 4070-4082.

Pimentel, S. D., Kelz, R. R., Silber, J. H. and Rosenbaum, P. R. (2015) <doi:10.1080/01621459.2014.997879> Large, sparse optimal matching with refined covariate balance in an observational study of the health outcomes produced by new surgeons. Journal of the American Statistical Association, 110, 515-527.

Rosenbaum, P. R. and Rubin, D. B. (1985) <doi:10.1080/00031305.1985.10479383> Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. The American Statistician, 39, 33-38.

Rosenbaum, P. R. (1989) <doi:10.1080/01621459.1989.10478868> Optimal matching for observational studies. Journal of the American Statistical Association, 84(408), 1024-1032.

Rosenbaum, P. R., Ross, R. N. and Silber, J. H. (2007) <doi:10.1198/016214506000001059> Minimum distance matched sampling with fine balance in an observational study of treatment for ovarian cancer. Journal of the American Statistical Association, 102, 75-83.

Rosenbaum, P. R. (2020a) <doi:10.1007/978-3-030-46405-9> Design of Observational Studies (2nd Edition). New York: Springer.

Rosenbaum, P. R. (2020b). <doi:10.1146/annurev-statistics-031219-041058> Modern algorithms for matching in observational studies. Annual Review of Statistics and Its Application, 7(1), 143-176.

Rosenbaum, P. R. and Zubizarreta, J. R. (2023) <doi:10.1201/9781003102670> Optimization Techniques in Multivariate Matching. Handbook of Matching and Weighting Adjustments for Causal Inference, pp. 63-86. Boca Raton, FL: Chapman and Hall/CRC Press.

Rosenbaum, P. R. (2025) <doi:10.1007/978-3-031-90494-3> Introduction to the Theory of Observational Studies. New York: Springer.

Rubin, D. B. (1980) <doi:10.2307/2529981> Bias reduction using Mahalanobis-metric matching. Biometrics, 36, 293-298.

Stuart, E.A., (2010). <doi:10.1214/09-STS313> Matching methods for causal inference: A review and a look forward. Statistical Science, 25(1), 1-21.

Yang, D., Small, D. S., Silber, J. H. and Rosenbaum, P. R. (2012) <doi:10.1111/j.1541-0420.2011.01691.x> Optimal matching with minimal deviation from fine balance in a study of obesity and surgical outcomes. Biometrics, 68, 628-636.

Yu, R. and Rosenbaum, P. R. (2019) <doi:10.1111/biom.13098> Directional penalties for optimal matching in observational studies. Biometrics, 75(4), 1380-1390.

Yu, R., Silber, J. H., & Rosenbaum, P. R. (2020) <doi:10.1214/19-STS699> Matching methods for observational studies derived from large administrative databases. Statistical Science, 35(3), 338-355.

Yu, R. (2021) <doi:10.1111/biom.13374> Evaluating and improving a matched comparison of antidepressants and bone density. Biometrics, 77(4), 1276-1288.

Yu R. & Rosenbaum, P. R. (2022) <doi:10.1080/10618600.2022.2058001> Graded matching for large observational studies. Journal of Computational and Graphical Statistics, 31(4):1406-1415.

Yu, R. (2023) <doi:10.1111/biom.13771> How well can fine balance work for covariate balancing? Biometrics. 79(3), 2346-2356.

Zhang, B., D. S. Small, K. B. Lasater, M. McHugh, J. H. Silber, and P. R. Rosenbaum (2023) <doi:10.1080/01621459.2021.1981337> Matching one sample according to two criteria in observational studies. Journal of the American Statistical Association, 118, 1140-1151.

Zubizarreta, J. R. (2012) <doi:10.1080/01621459.2012.703874> Using mixed integer programming for matching in an observational study of kidney failure after surgery. Journal of the American Statistical Association, 107(500), 1360-1371.

Zubizarreta, J. R., Reinke, C. E., Kelz, R. R., Silber, J. H. and Rosenbaum, P. R. (2011) <doi:10.1198/tas.2011.11072> Matching for several sparse nominal variables in a case control study of readmission following surgery. The American Statistician, 65(4), 229-238.

Zubizarreta, J. R., Stuart, E. A., Small, D. S. and Rosenbaum, P. R., eds. (2023) <doi:10.1201/9781003102670> Handbook of Matching and Weighting Adjustments for Causal Inference. Boca Raton, FL: Chapman and Hall/CRC Press.

Examples

# \donttest{
# The example below uses the binge data from the iTOS package.
# See the documentation for binge in the iTOS package for more information.
#
library(iTOS)
data(binge)
b2<-binge[binge$AlcGroup!="P",] # Match binge drinkers to nondrinkers
z<-1*(b2$AlcGroup=="B") # Treatment/control indicator
b2<-cbind(b2,z)
rm(z)
rownames(b2)<-b2$SEQN
attach(b2)
#
agec<-as.integer(ageC)
#
# x contains the variables in the propensity score
#
x<-data.frame(age,female,education,bmi,vigor,smokenow,smokeQuit,bpRX)
#
#  Create nominal covariates to include in near or fine
#
smoke<-1*(smokenow==1)
dontSmoke<-1*(smokenow==3)
age50<-1*(age>=50)
bmi30<-1*(bmi>=30)
ed2<-1*(education<=2)
#
#  near contains covariates to be matched as exactly as possible
#
near<-cbind(female,dontSmoke)
#
# xm contains covariates in the robust Mahalanobis distance
# Includes some continuous covariates.
#
xm<-cbind(age,bmi,vigor,smokenow,education)
#
# fine contains covariates that will be balanced, but not matched
#
fine<-cbind(ageC,ed2,smoke,dontSmoke)
rm(agec,bmi30,smoke,ed2,age50)
detach(b2)

mc<-artlessV2(b2,b2$z,x,xm=xm,near=near,fine=fine,ncontrols=3)
#
#  Here are the first two 1-to-3 matched sets.
#
mc$match[1:8,]
#
#  You can check that every matched set is exactly matched for
#  female and nonsmoking.  This is from near-exact matching.
#  In some other data set, the number of mismatches might be
#  minimized, not driven to zero.
#
#  The balance table shows that large imbalances in covariates
#  existed before matching, but are much smaller after matching.
#  Look, for example, at the propensity score, female, and
#  the several versions of the smoking variable.
#
mc$balance
m<-mc$match
m<-m[m$matched,] # Remove the unmatched controls
table(m$z) # 1-to-3 matching: three controls per treated individual
boxplot(m$age~m$z)
# }
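
The checks described in the comments above can also be written as code. The sketch below does not call artlessV2(); it uses a small hand-made data frame as a stand-in for mc$match so that it runs on its own. The column names mset and matched follow the Value section, while the covariate values and the use of NA in mset for unmatched controls are assumptions made for illustration.

```r
# Hypothetical stand-in for mc$match: two 1-to-2 matched sets,
# followed by two unmatched controls in the last rows.
m_all <- data.frame(
  z       = c(1, 0, 0, 1, 0, 0, 0, 0),
  female  = c(1, 1, 1, 0, 0, 0, 1, 0),
  mset    = c(1, 1, 1, 2, 2, 2, NA, NA),  # NA for unmatched is an assumption
  matched = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE)
)

# Drop the unmatched controls before any outcome analysis.
m <- m_all[m_all$matched, ]

# Every matched set should have the same size and exactly one treated unit.
set_sizes <- table(m$mset)
treated_per_set <- tapply(m$z, m$mset, sum)

# A covariate is exactly matched if it takes one value within each set.
exact_female <- tapply(m$female, m$mset, function(v) length(unique(v)) == 1)

set_sizes
treated_per_set
all(exact_female)
```

With a real result, replacing m_all by mc$match and female by any column listed in near gives the same check; a FALSE in exact_female would mark a set where near-exact matching minimized, but did not eliminate, a mismatch.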
