att: Example of ATT estimation from CEM output

Description

An example of ATT estimation from CEM output

Usage

att(obj, formula, data, model="linear", extrapolate=FALSE, ntree=2000)
# S3 method for cem.att
plot(x, obj, data, vars=NULL, plot=TRUE, ecolors, ...)
# S3 method for cem.att
summary(object, ...)

Arguments

obj

a cem.match or cem.match.list object

formula

a model formula. See Details.

data

a single data.frame, or a list of data.frames when obj is a cem.match.list

model

the model used to estimate the ATT. See Details.

extrapolate

extrapolate the CEM-restricted estimate to the whole data. Default = FALSE.

ntree

number of trees to generate in random forest model. Default = 2000.

x

the output from the att function

vars

a vector of variable names to be used in the parallel plots. By default all variables involved in data matching are used.

object

an object of class cem.att, as returned by the att function

plot

if TRUE the plot is produced, otherwise only calculations are made.

ecolors

a vector of three colors respectively for positive, zero and negative treatment effect. Default c("blue","black","red").

...

further arguments passed to the plot function or, for the summary method, to printCoefmat

Value

A matrix of estimates with their standard errors, or a list in the case of cem.match.list. For the plot method, a list containing the estimated treatment effect and group ("positive", "negative", "zero") for each stratum, and the treatment effect and group for each individual. An individual's treatment effect and group are those of the stratum to which the individual belongs. The colors associated with the estimated treatment effects are also returned for easy subsequent plotting.

Details

Argument model can be lm or linear for the linear regression model; logit for the logistic model; lme or linear-RE for the linear model with random effects; and rf or forest for the random forest algorithm.

If the outcome is y and the treatment variable is T, then a formula like y ~ T will produce the simplest estimate of the ATT: with lm, it is just the coefficient on T, which is the same as the difference in means, weighted by CEM stratum size. Users can add covariates to span any remaining imbalance after the match, for example y ~ T + age + sex to adjust for the variables age and sex.
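The equivalence between the lm coefficient and the weighted difference in means can be checked with a minimal sketch in base R. The data are simulated, and the names d, treated, w and wm are illustrative only; the weights are arbitrary positive numbers standing in for CEM weights, not the output of cem.

```r
# Simulated data; every name here (d, treated, w, wm) is hypothetical.
set.seed(1)
n <- 200
d <- data.frame(
  treated = rbinom(n, 1, 0.4),   # binary treatment indicator
  w       = runif(n, 0.5, 2)     # arbitrary weights standing in for CEM weights
)
d$y <- 1 + 2 * d$treated + rnorm(n)

# Weighted least squares of the outcome on the treatment indicator
fit <- lm(y ~ treated, data = d, weights = w)
coef_T <- unname(coef(fit)["treated"])

# Weighted difference in means between treated and control units
wm <- function(v, w) sum(v * w) / sum(w)
diff_means <- wm(d$y[d$treated == 1], d$w[d$treated == 1]) -
              wm(d$y[d$treated == 0], d$w[d$treated == 0])

# The two quantities agree up to numerical tolerance
print(c(coefficient = coef_T, difference = diff_means))
print(all.equal(coef_T, diff_means))
```

Once covariates are added to the formula, the coefficient on the treatment variable is no longer a plain difference in means, so this check applies only to the simplest y ~ T case.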

In the case of multiply imputed datasets, the model is applied to each matched dataset, and the ATT and its standard error are estimated using the standard formulas for combining results from multiply imputed data.
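For reference, the combining formulas most commonly meant by this phrase are Rubin's rules. That att uses exactly this form is an assumption here, not something this page states. With m imputed datasets, and with \hat{\tau}_j and s_j denoting the ATT estimate and its standard error from dataset j, they read:

```latex
\bar{\tau} = \frac{1}{m} \sum_{j=1}^{m} \hat{\tau}_j ,
\qquad
W = \frac{1}{m} \sum_{j=1}^{m} s_j^{2} ,
\qquad
B = \frac{1}{m-1} \sum_{j=1}^{m} \left( \hat{\tau}_j - \bar{\tau} \right)^{2} ,
\qquad
\mathrm{SE}(\bar{\tau}) = \sqrt{ W + \left( 1 + \frac{1}{m} \right) B }
```

Here W is the average within-imputation variance and B is the between-imputation variance of the point estimates.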

When extrapolate = TRUE, the estimated model is extrapolated to the whole set of data.

There is a print method for the output of att. Specifying the option TRUE in a print command gives complete output from the estimated model when available.

References

Stefano Iacus, Gary King, Giuseppe Porro, "Matching for Causal Inference Without Balance Checking: Coarsened Exact Matching," http://gking.harvard.edu/files/abs/cem-abs.shtml

Examples

# NOT RUN {
data(LL)

# cem match: automatic bin choice
mat <- cem(treatment="treated",data=LL, drop="re78", keep.all=TRUE)
mat
mat$k2k

# ATT estimate
homo1 <- att(mat, re78~treated,  data=LL)
rand1 <- att(mat, re78~treated,  data=LL, model="linear-RE")
rf1 <- att(mat, re78~treated,  data=LL, model="rf")

# same models, extrapolated to the whole data
homo2 <- att(mat, re78~treated,  data=LL, extrapolate=TRUE)
rand2 <- att(mat, re78~treated,  data=LL, model="linear-RE", extrapolate=TRUE)
rf2 <- att(mat, re78~treated,  data=LL, model="rf", extrapolate=TRUE)

homo1
summary(homo1)

rand1
rf1

homo2
rand2
rf2

plot( homo1, mat, LL, vars=c("age","education","re74","re75"))
plot( rand1, mat, LL, vars=c("age","education","re74","re75"))
plot( rf1, mat, LL, vars=c("age","education","re74","re75"))

plot( homo2, mat, LL, vars=c("age","education","re74","re75"))
plot( rand2, mat, LL, vars=c("age","education","re74","re75"))
plot( rf2, mat, LL, vars=c("age","education","re74","re75"))


# reduce the match into k2k using euclidean distance within cem strata
mat2 <- k2k(mat, LL, "euclidean", 1)
mat2
mat2$k2k

# ATT estimate after k2k
att(mat2, re78~treated, data=LL)

# example with missing data
# using multiply imputed data
# we use Amelia for multiple imputation
# }
# NOT RUN {
 if(require(Amelia)){
  data(LL)
  n <- dim(LL)[1]
  k <- dim(LL)[2]

# we generate missing values in 30% of the rows of LL data
# randomly in one column per row
  LL1 <- LL
  idx <- sample(1:n, .3*n)
  invisible(sapply(idx, function(x) LL1[x,sample(2:k,1)] <<- NA))


  imputed <- amelia(LL1)
  imputed <- imputed$imputations[1:5]

  mat <- cem("treated", datalist=imputed, data=LL1, drop="re78")

  print(mat)
  
  att(mat, re78 ~ treated, data=imputed)
 }
# }