twang (version 1.0-3)

ps: Propensity score estimation

Description

ps calculates propensity scores and diagnoses them using a variety of methods, centered on boosted logistic regression as implemented in gbm.

Usage

ps(formula = formula(data),
   data,
   sampw = rep(1, nrow(data)),
   title=NULL,
   stop.method = stop.methods[1:2],
   plots="all",
   pdf.plots=FALSE,
   n.trees = 10000,
   interaction.depth = 3,
   shrinkage = 0.01,
   perm.test.iters=0,
   print.level = 2,
   iterlim = 1000,
   verbose = TRUE)

Arguments

formula
a formula for the propensity score model with the treatment indicator on the left side of the formula and the potential confounding variables on the right side.
title
a short text title, used in plots and saved files
data
the dataset, which includes the treatment assignment as well as the covariates
sampw
optional sampling weights
stop.method
a stop.methods object, or a list of such objects, containing the metrics and rules for evaluating the quality of the propensity scores
plots
a character vector indicating which plots to create. The options are all (the default), optimize, ps boxplot, weight histogram, t pvalues, ks pvalues, es. Any other options (such as "none") will produce no plots. See the help for
pdf.plots
if TRUE then all plots are dumped to a pdf file with the name specified in title
n.trees
number of gbm iterations passed on to gbm
interaction.depth
interaction.depth passed on to gbm
shrinkage
shrinkage passed on to gbm
perm.test.iters
a non-negative integer giving the number of iterations of the permutation test for the KS statistic. If perm.test.iters=0 then the function returns an analytic approximation to the p-value. Setting perm.test.i
print.level
the amount of detail to print to the screen
iterlim
maximum number of iterations for the direct optimization
verbose
if TRUE, detailed information is printed to monitor the progress of the fitting
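
The perm.test.iters argument refers to a permutation test for the KS statistic. The idea can be sketched in base R as follows. This is a rough, unweighted illustration only, not the code twang runs: the helper name perm_ks_pvalue, its arguments, and the simulated data are all assumptions made for this example.

```r
# Hypothetical helper (not part of twang): permutation p-value for the
# two-sample KS statistic of a covariate x between treatment groups.
perm_ks_pvalue <- function(x, treat, iters = 200) {
  ks_stat <- function(x, treat) {
    # Largest gap between the two empirical CDFs, evaluated at the data points
    f1 <- ecdf(x[treat == 1])
    f0 <- ecdf(x[treat == 0])
    max(abs(f1(x) - f0(x)))
  }
  observed <- ks_stat(x, treat)
  # Shuffle the treatment labels to approximate the null distribution
  permuted <- replicate(iters, ks_stat(x, sample(treat)))
  mean(permuted >= observed)
}

set.seed(1)
treat <- rep(c(0, 1), each = 50)
x_shifted <- rnorm(100, mean = 2 * treat)  # covariate strongly related to treatment
x_null    <- rnorm(100)                    # covariate unrelated to treatment

p_shifted <- perm_ks_pvalue(x_shifted, treat)
p_null    <- perm_ks_pvalue(x_null, treat)
```

More iterations reduce the Monte Carlo variation in the p-value at the cost of run time, which is the trade-off that a setting of 0 (analytic approximation) avoids.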

Value

Returns an object of class ps, a list containing
  • gbm.obj: the returned gbm object
  • ps: a data frame containing the estimated propensity scores. Each column is associated with one of the methods selected in stop.methods
  • w: a data frame containing the propensity score weights. Each column is associated with one of the methods selected in stop.methods. If sampling weights were given, they are incorporated into these weights
  • plot.info: a list containing the raw data used to generate the plots
  • desc: a list containing balance tables for each method selected in stop.methods. Includes a component for the unweighted analysis named unw. Each desc component is itself a list with several components
  • datestamp: records the date of the analysis
  • parameters: saves the ps call
  • alerts: text containing any warnings accumulated during the estimation
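
To make the relationship between the propensity scores and the weights concrete, here is a minimal base R sketch. It mirrors the w = ps/(1-ps) construction used in the Examples section, with treated cases given weight 1. The toy vectors are made up, and combining sampling weights by multiplication is an assumption for illustration rather than a statement of how ps does it internally.

```r
# Toy inputs, invented for illustration
treat <- c(1, 1, 0, 0, 0)             # 0/1 treatment indicator
p     <- c(0.8, 0.6, 0.5, 0.2, 0.1)   # estimated propensity scores
sampw <- c(1, 2, 1, 1, 2)             # optional sampling weights

# Treated cases get weight 1; controls get the odds p / (1 - p)
w <- ifelse(treat == 1, 1, p / (1 - p))

# Assumption: sampling weights enter as a simple multiplicative factor
w <- w * sampw
print(w)
```

Controls whose propensity scores are close to those of treated cases receive larger weights, so the weighted control group resembles the treatment group.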

Details

formula should be something like "treatment ~ X1 + X2 + X3". The treatment variable should be a 0/1 indicator. There is no need to specify interaction terms in the formula. interaction.depth controls the level of interactions to allow in the propensity score model. If pdf.plots=TRUE then ps causes plots to be saved as a single pdf file with the name "[title].pdf" in the working directory. See diag.plot for details of the plots.
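
As a small sketch of the points above, the following base R code builds a main-effects formula from column names, checks that the treatment variable is a 0/1 indicator, and forms the file name that pdf.plots=TRUE is described as using. The data frame, its column names, and the title are hypothetical, and ps itself is not called here.

```r
# Toy data frame; the column names are hypothetical
d <- data.frame(treatment = c(0, 1, 1, 0),
                X1 = c(2.1, 3.4, 1.9, 2.8),
                X2 = c("a", "b", "a", "b"),
                X3 = c(10, 12, 9, 11))

# Build treatment ~ X1 + X2 + X3: main effects only, no interaction terms
covariates <- setdiff(names(d), "treatment")
f <- reformulate(covariates, response = "treatment")
print(f)

# The treatment variable should be a 0/1 indicator
is_binary <- all(d$treatment %in% c(0, 1))

# File name pattern described for pdf.plots=TRUE: "[title].pdf"
title <- "Toy example"
pdf_name <- paste0(title, ".pdf")
```

Checking the treatment coding before fitting is cheap and avoids a long gbm run on a mis-coded indicator.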

References

Dan McCaffrey, G. Ridgeway, Andrew Morral (2004). Propensity Score Estimation with Boosted Regression for Evaluating Adolescent Substance Abuse Treatment, Psychological Methods 9(4):403-425.

See Also

gbm

Examples

data(lalonde)
print(nrow(lalonde))

ps.lalonde <- ps(treat ~ age + educ + black + hispan + nodegree + 
                         married + re74 + re75, 
                 data = lalonde,
                 title="Lalonde example",
                 stop.method=stop.methods[c("ks.stat.mean","ks.stat.max")],
                 # generate plots?
                 plots="all",
                 pdf.plots=FALSE,
                 # gbm options
                 n.trees=2000,
                 interaction.depth=3,
                 shrinkage=0.005,
                 perm.test.iters=0,
                 verbose=TRUE)
                 
# get the balance tables
bal.table(ps.lalonde)

# diagnose the weights using a ps object 
a <- dx.wts(ps.lalonde,data=lalonde,treat.var="treat")
print(a)
bal.table(a)

# diagnose the weights as propensity score weights
#    will be the same as before, except for MC variation in the KS p-values
#    when perm.test.iters is greater than 0
w <- with(ps.lalonde, ps/(1-ps))
w[lalonde$treat==1,] <- 1
dx.wts(w,data=lalonde,treat.var="treat",
       perm.test.iters=0)

# diagnose the weights as propensity scores
p <- ps.lalonde$ps
dx.wts(p,data=lalonde,treat.var="treat",x.as.weights=FALSE)

# look at propensity scores
names(ps.lalonde$ps)
hist(ps.lalonde$ps$ks.stat.max)
boxplot(split(ps.lalonde$ps$ks.stat.max,ps.lalonde$treat),
        ylab="estimated propensity scores",
        names=c("control","treatment"))

# check out the balance
names(ps.lalonde$desc)
# unweighted
ps.lalonde$desc$unw
# optimized for ks.stat.max
ps.lalonde$desc$ks.stat.max

# check out the gbm object, indicates which variables are most influential in 
#    estimating the propensity score
summary(ps.lalonde$gbm.obj, n.trees=ps.lalonde$desc$ks.stat.max$n.trees)

# bal.stat() can use an arbitrary set of weights
bal.stat(data=lalonde,
         w.all=w[,1],
         vars=names(lalonde),
         treat.var="treat",
         get.means=TRUE,
         get.ks=TRUE,
         na.action="level")
         
# sensitivity analysis
sensitivity(ps.lalonde,lalonde,"re78")
