CMatch: Within and preferential-within cluster matching.

Description

This function implements multivariate and propensity score matching in clusters defined by the Group variable. It returns an object of class ''CMatch'' which can be be summarized and used as input of the CMatchBalance function to examine how much the procedure resulted in improved covariate balance.

Usage

CMatch(type, Y = NULL, Tr, X, Group = NULL, estimand = "ATT", M = 1, 
exact = NULL, caliper = 0.25, weights = NULL, replace = TRUE, ties = TRUE, ...)

Arguments

type

The type of matching desired. "within" for a pure within-cluster matching and "pwithin" for matching preferentially within. The preferential approach first searches for matchable units within the same cluster. If no match was found the algorithm searches in other clusters.

A vector containing the outcome of interest.

A vector indicating the treated and control units.

A matrix of covariates we wish to match on. This matrix should contain all confounders or the propensity score or a combination of both.

Group

A vector describing the clustering structure (typically the cluster ID). This can be any numeric vector of the same length of Tr and X containing integer numbers in ascending order otherwise an error message will be returned. Default is NULL, however if Group is missing, NULL or it contains only one value the output of the Match function is returned with a warning.

estimand

The causal estimand desired, one of "ATE", "ATT" and "ATC", which stand for Average Treatment Effect, Average Treatment effect on the Treated and on the Controls, respectively. Default is "ATT".

The number of matches which are sought for each unit. Default is 1 ("one-to-one matching").

exact

An indicator for whether exact matching on the variables contained in X is desired. Default is FALSE. This option has precedence over the caliper option.

caliper

A maximum allowed distance for matching units. Units for which no match was found within caliper distance are discarded. Default is 0.25. The caliper is interpreted in standard deviation units of the unclustered data for each variable. For example, if caliper=0.25 all matches at distance bigger than 0.25 times the standard deviation for any of the variables in X are discarded.

weights

A vector of specific observation weights.

replace

Matching can be with or without replacement depending on whether matches can be re-used or not. Default is TRUE.

ties

An indicator for dealing with multiple matches. If more than M matches are found for each unit the additional matches are a) wholly retained with equal weights if ties=TRUE; b) a random one is chosen if ties=FALSE. Default is TRUE.

…

Additional arguments to be passed to the Match function (not all of them can be used).

Value

index.control

The index of control observations in the matched dataset.

index.treated

The index of control observations in the matched dataset.

index.dropped

The index of dropped observations due to the exact or caliper option. Note that these observations are treated if estimand is "ATT", controls if "ATC".

est

The causal estimate. This is provided only if Y is not null. If estimand is "ATT" it is the (weighted) mean of Y in matched treated units minus the (weighted) mean of Y in matched controls. Equivalently, it is the weighted average of the within-cluster ATTs, with weights given by cluster sizes in the matched dataset.

A model-based standard error for the causal estimand. This is a cluster robust estimator of the standard error for the linear model: Y ~ constant+Tr, run on the matched dataset (see cluster.vcov for details on how this estimator is obtained). Note that these standard errors differ from a weighted average of cluster specific standard errors provided by the Matchby function, which are generally larger. Estimating standard errors for causal parameters with clustered data is an active field of research and there is no perfect solution to date.

mdata

A list containing the matched datasets produced by CMatch. Three datasets are included in this list: Y, Tr and X. The matched dataset for Group can be recovered by rbind(Group[index.treated],Group[index.control]).

orig.treated.nobs.by.group

The original number of treated observations by group in the dataset.

orig.control.nobs.by.group

The original number of control observations by group in the dataset.

orig.dropped.nobs.by.group

The number of dropped observations by group after within cluster matching.

orig.nobs

The original number of observations in the dataset.

orig.wnobs

The original number of weighted observations in the dataset.

orig.treated.nobs

The original number of treated observations in the dataset.

orig.control.nobs

The original number of control observations in the dataset.

wnobs

the number of weighted observations in the matched dataset.

caliper

The caliper used.

intcaliper

The internal caliper used.

exact

The value of the exact argument.

ndrops.matches

The number of matches dropped either because of the caliper or exact option (or because of forcing the match within-clusters).

estimand

The estimand required.

Details

This function is meant to be a natural extension of the Match function to clustered data. It retains the main arguments of Match but it has additional output showing matching results cluster by cluster. It differs from wrapper Matchby in package Matching in the way standard errors are calculated and because the caliper is in standard deviation units of the covariates on the overall dataset (so the caliper is the same for all clusters). Moreover, observation weights are available.

References

Sekhon, Jasjeet S. 2011. Multivariate and Propensity Score Matching Software with Automated Balance Optimization. Journal of Statistical Software 42(7): 1-52. http://www.jstatsoft.org/v42/i07/

Arpino, B., and Cannas, M. (2016) Propensity score matching with clustered data. An application to the estimation of the impact of caesarean section on the Apgar score. Statistics in Medicine, 35: 2074<U+2013>2091. doi: 10.1002/sim.6880.

Examples

Run this code

# NOT RUN {
data(schools)
	
# Kreft and De Leeuw, Introducing Multilevel Modeling, Sage (1988).   
# The data set is the subsample of NELS-88 data consisting of 10 handpicked schools
# from the 1003 schools in the full data set.
 
# Suppose that the effect of homeworks on math score is unconfounded conditional on X
# and unobserved school features (we assume this only for illustrative purpouse).

# Let us consider the following variables:

X<-schools$ses # or X<-as.matrix(schools[,c("ses","white","public")])
Y<-schools$math
Tr<-ifelse(schools$homework>1,1,0)
Group<-schools$schid
# When Group is missing or there is only one Group CMatch returns 
# the output of the Match function with a warning.

# Let us assume that the effect of homeworks (Tr) on math score (Y)
# is unconfounded conditional on X and other unobserved school features.
# Several strategies to handle unobserved group characteristics
# are described in Arpino & Cannas, 2016 (see References). 


# Multivariate Matching on covariates in X 
# default parameters: one-to-one matching on X with replacement with a caliper of 0.25

### Matching within schools
 mw<-CMatch(type="within",Y=Y, Tr=Tr, X=X, Group=Group, caliper=0.1)
 
 # compare balance before and after matching
 bmw  <- CMatchBalance(Tr~X,data=schools,match.out=mw)
 
 # calculate proportion of matched observations
  (mw$orig.treated.nobs-mw$ndrops)/mw$orig.treated.nobs 
  
 # check number of drops by school
 mw$orig.dropped.nobs.by.group
 
 # examine output
 mw                   # complete list of results                 
 summary(mw)  # basic statistics
 
### Match preferentially within school 
# i.e. first match within schools
# then (try to) match remaining units between schools
 mpw <- CMatch(type="pwithin",Y=schools$math, Tr=Tr, X=schools$ses, 
 Group=schools$schid, caliper=0.1)

# examine covariate balance
  bmpw<- CMatchBalance(Tr~ses,data=schools,match.out=mpw)
  # equivalent to MatchBalance(...) with mpw coerced to class "Match"

# proportion of matched observations
  (mpw$orig.treated.nobs-mpw$ndrops) / mpw$orig.treated.nobs 
# check drops by school
  mpw$orig.dropped.nobs.by.group.after.pref.within
# proportion of matched observations after match-within only
 (mpw$orig.treated.nobs-sum(mpw$orig.dropped.nobs.by.group.after.within)) / mpw$orig.treated.nobs

# see complete output
   mpw
# or use summary method for main results
   summary(mpw) 

#### Propensity score matching

# estimate the ps model

mod <- glm(Tr~ses+parented+public+sex+race+urban,
family=binomial(link="logit"),data=schools)
eps <- fitted(mod)

# eg 1: within school propensity score matching
psmw <- CMatch(type="within",Y=schools$math, Tr=Tr, X=eps, 
Group=schools$schid, caliper=0.1)
# equivalent to direct call at MatchW(Y=schools$math, Tr=Tr, X=eps,
# Group=schools$schid, caliper=0.1)

# eg 2: preferential within school propensity score matching
psmw <- CMatch(type="pwithin",Y=schools$math, Tr=Tr, X=eps, Group=schools$schid, caliper=0.1)

# Other strategies for controlling unobserved cluster covariates 
# via different specifications of propensity score (see Arpino and Mealli):

# eg 3: propensity score matching using ps estimated from a logit model with dummies for hospitals

mod <- glm(Tr ~ ses + parented + public + sex + race + urban 
+schid - 1,family=binomial(link="logit"),data=schools)
eps <- fitted(mod)

dpsm <- CMatch(type="within",Y=schools$math, Tr=Tr, X=eps, Group=NULL, caliper=0.1)
# this is equivalent to run Match with X=eps

# eg4: propensity score matching using ps estimated from multilevel logit model 
# (random intercept at the hospital level)

require(lme4)
mod<-glmer(Tr ~ ses + parented + public + sex + race + urban + (1 | schid),
family=binomial(link="logit"), data=schools)
eps <- fitted(mod)

mpsm<-CMatch(type="within",Y=schools$math, Tr=Tr, X=eps, Group=NULL, caliper=0.1)
# this is equivalent to run Match with X=eps

 

# }

Run the code above in your browser using DataLab