ps: Propensity score estimation

Description

ps calculates propensity scores and diagnoses them using a variety of methods, centered on boosted logistic regression as implemented in gbm.

Usage

ps(formula = formula(data),
   data,
   sampw = rep(1, nrow(data)),
   title=NULL,
   stop.method = stop.methods[1:2],
   plots="all",
   pdf.plots=FALSE,
   n.trees = 10000,
   interaction.depth = 3,
   shrinkage = 0.01,
   perm.test.iters=0,
   print.level = 2,
   iterlim = 1000,
   verbose = TRUE)

Arguments

formula

a formula for the propensity score model with the treatment indicator on the left side of the formula and the potential confounding variables on the right side.

title

a short text title; it is used in plots and saved files

data

the dataset, including the treatment assignment as well as the covariates

sampw

optional sampling weights

stop.method

a stop.methods object, or a list of such objects, containing the metrics and rules for evaluating the quality of the propensity scores

plots

a character vector indicating which plots to create. The options are all (the default), optimize, ps boxplot, weight histogram, t pvalues, ks pvalues, es. Any other options (such as "none") will produce no plots. See the help for

pdf.plots

if TRUE then all plots are dumped to a pdf file with the name specified in title

n.trees

number of gbm iterations passed on to gbm

interaction.depth

interaction.depth passed on to gbm

shrinkage

shrinkage passed on to gbm

perm.test.iters

a non-negative integer giving the number of iterations of the permutation test for the KS statistic. If perm.test.iters=0 then the function returns an analytic approximation to the p-value. Setting perm.test.i

print.level

the amount of detail to print to the screen

iterlim

maximum number of iterations for the direct optimization

verbose

if TRUE, detailed information is printed to monitor the progress of the fitting
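
As a rough illustration of the permutation test that perm.test.iters refers to, the following base R sketch computes a weighted two-sample KS statistic and a permutation p-value for it. The helpers ks_stat and perm_ks_pvalue are hypothetical names written for this example; they are not functions of the package, and the way ps carries out its own test internally may differ.

```r
# Illustrative only: permutation p-value for a (weighted) two-sample KS statistic.
# ks_stat() and perm_ks_pvalue() are hypothetical helpers, not package functions.

ks_stat <- function(x, treat, w = rep(1, length(x))) {
  # Weighted empirical CDF of x within a group, evaluated at each observed value
  grid <- sort(unique(x))
  ecdf_w <- function(idx) {
    sapply(grid, function(g) sum(w[idx] * (x[idx] <= g)) / sum(w[idx]))
  }
  # KS statistic: largest vertical gap between the two group CDFs
  max(abs(ecdf_w(treat == 1) - ecdf_w(treat == 0)))
}

perm_ks_pvalue <- function(x, treat, w = rep(1, length(x)), iters = 200) {
  observed <- ks_stat(x, treat, w)
  permuted <- replicate(iters, {
    shuffled <- sample(treat)  # break any link between x and treatment
    ks_stat(x, shuffled, w)
  })
  # Share of permuted statistics at least as large as the observed one
  mean(permuted >= observed)
}

set.seed(1)
treat <- rep(c(0, 1), each = 50)
x <- rnorm(100, mean = 2 * treat)  # simulated covariate that differs by group
p_value <- perm_ks_pvalue(x, treat, iters = 200)
```

More iterations reduce the Monte Carlo noise in the p-value at the cost of run time, which is the trade-off the perm.test.iters argument exposes.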

Value

Returns an object of class ps, a list containing

gbm.obj

the returned gbm object

ps

a data frame containing the estimated propensity scores. Each column is associated with one of the methods selected in stop.method

w

a data frame containing the propensity score weights. Each column is associated with one of the methods selected in stop.method. If sampling weights were given then these are incorporated into these weights

plot.info

a list containing the raw data used to generate the plots

desc

a list containing balance tables for each method selected in stop.method. Includes a component for the unweighted analysis named unw. Each desc component is itself a list with several components

datestamp

records the date of the analysis

parameters

saves the ps call

alerts

text containing any warnings accumulated during the estimation
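
To make the relationship between the ps and w components concrete, here is a minimal base R sketch using made-up numbers. It mirrors the weighting shown in the Examples (controls weighted by ps/(1-ps), treated cases given weight 1). The helper att_weights is hypothetical, and the multiplicative treatment of sampling weights is an assumption made for illustration rather than a statement of how ps combines them.

```r
# Illustrative sketch with made-up numbers; att_weights() is a hypothetical helper.
att_weights <- function(p, treat, sampw = rep(1, length(p))) {
  # Controls get the odds p / (1 - p); treated cases keep weight 1
  w <- ifelse(treat == 1, 1, p / (1 - p))
  # Assumption: sampling weights enter multiplicatively
  w * sampw
}

p     <- c(0.20, 0.50, 0.80, 0.60)  # estimated propensity scores
treat <- c(0,    0,    1,    1)     # 0/1 treatment indicator
sampw <- c(1,    2,    1,    3)     # optional sampling weights

w <- att_weights(p, treat, sampw)
```

Controls with a high estimated propensity score receive large weights, so a few such cases can dominate the weighted comparison group.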

Details

formula should be something like "treatment ~ X1 + X2 + X3". The treatment variable should be a 0/1 indicator. There is no need to specify interaction terms in the formula. interaction.depth controls the level of interactions to allow in the propensity score model. If pdf.plots=TRUE then ps causes plots to be saved as a single pdf file with the name "[title].pdf" in the working directory. See diag.plot for details of the plots.
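
The points above can be checked before fitting with a few lines of base R. This is only a sketch on a toy data frame: the column names, the title, and the variables treat_name, covariates, is_binary and pdf_name are made up for illustration.

```r
# Toy data; column names are made up for illustration
toy <- data.frame(treatment = c(0, 1, 1, 0),
                  X1 = c(23, 35, 41, 29),
                  X2 = c(1, 0, 0, 1))

# Main effects only: no interaction terms are written out
f <- treatment ~ X1 + X2

treat_name <- all.vars(f)[1]   # variable on the left-hand side
covariates <- all.vars(f)[-1]  # variables on the right-hand side

# The treatment column should contain only 0 and 1
is_binary <- all(toy[[treat_name]] %in% c(0, 1))

# File name implied by the "[title].pdf" rule when pdf.plots = TRUE
title <- "toy example"
pdf_name <- paste0(title, ".pdf")
```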

References

McCaffrey, D., Ridgeway, G., and Morral, A. (2004). Propensity Score Estimation with Boosted Regression for Evaluating Adolescent Substance Abuse Treatment. Psychological Methods 9(4):403-425.

Examples


# load the package that provides ps(), dx.wts(), bal.table() and the lalonde data
library(twang)
data(lalonde)
print(nrow(lalonde))

ps.lalonde <- ps(treat ~ age + educ + black + hispan + nodegree + 
                         married + re74 + re75, 
                 data = lalonde,
                 title="Lalonde example",
                 stop.method=stop.methods[c("ks.stat.mean","ks.stat.max")],
                 # generate plots?
                 plots="all",
                 pdf.plots=FALSE,
                 # gbm options
                 n.trees=2000,
                 interaction.depth=3,
                 shrinkage=0.005,
                 perm.test.iters=0,
                 verbose=TRUE)
                 
# get the balance tables
bal.table(ps.lalonde)

# diagnose the weights using a ps object 
a <- dx.wts(ps.lalonde,data=lalonde,treat.var="treat")
print(a)
bal.table(a)

# diagnose the weights as propensity score weights
#    will be the same as before, except for MC variation in the KS p-values
#    when perm.test.iters is greater than 0
w <- with(ps.lalonde, ps/(1-ps))
w[lalonde$treat==1,] <- 1
dx.wts(w,data=lalonde,treat.var="treat",
       perm.test.iters=0)

# diagnose the weights as propensity scores
p <- ps.lalonde$ps
dx.wts(p,data=lalonde,treat.var="treat",x.as.weights=FALSE)

# look at propensity scores
names(ps.lalonde$ps)
hist(ps.lalonde$ps$ks.stat.max)
boxplot(split(ps.lalonde$ps$ks.stat.max,ps.lalonde$treat),
        ylab="estimated propensity scores",
        names=c("control","treatment"))

# check out the balance
names(ps.lalonde$desc)
# unweighted
ps.lalonde$desc$unw
# optimized for ks.stat.max
ps.lalonde$desc$ks.stat.max

# check out the gbm object, indicates which variables are most influential in 
#    estimating the propensity score
summary(ps.lalonde$gbm.obj, n.trees=ps.lalonde$desc$ks.stat.max$n.trees)

# bal.stat() can use an arbitrary set of weights
bal.stat(data=lalonde,
         w.all=w[,1],
         vars=names(lalonde),
         treat.var="treat",
         get.means=TRUE,
         get.ks=TRUE,
         na.action="level")
         
# sensitivity analysis
sensitivity(ps.lalonde,lalonde,"re78")
