ps: Propensity score estimation

Description

ps calculates propensity scores and produces diagnostic plots and summary information for a data frame. The data frame must include a treatment assignment variable (the left side of formula) coded 0 for comparison group cases and 1 for treatment group cases.

Usage

ps(formula = formula(data),
   data,
   sampw = rep(1, nrow(data)),
   title = NULL,
   stop.method = stop.methods[1:2],
   plots = TRUE,
   n.trees = 10000,
   interaction.depth = 3,
   shrinkage = 0.01,
   perm.test.iters = 0,
   print.level = 2,
   iterlim = 1000,
   verbose = TRUE)

Arguments

formula

a formula for the propensity score model with the treatment indicator on the left side of the formula and the potential confounding variables on the right side.

title

a short text title; it is used in plots and in the names of saved files

data

the dataset, which includes the treatment assignment as well as the covariates

sampw

optional sampling weights

stop.method

a stop.methods object, or a list of such objects, containing the metrics and rules for evaluating the quality of the propensity scores

plots

a logical value for determining whether or not to plot balance diagnostics; default is to plot

n.trees

number of gbm iterations passed on to gbm

interaction.depth

interaction.depth passed on to gbm

shrinkage

shrinkage passed on to gbm

perm.test.iters

a non-negative integer giving the number of iterations of the permutation test for the KS statistic. If perm.test.iters=0 then the function returns an analytic approximation to the p-value. This argument is ignored

print.level

the amount of detail to print to the screen

iterlim

maximum number of iterations for the direct optimization

verbose

if TRUE, detailed information will be printed to monitor the progress of the fitting
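
The permutation test controlled by perm.test.iters can be pictured with the sketch below. It is not the twang implementation: weighted_ks and perm_ks_pvalue are hypothetical helper names, the data are simulated, and keeping each unit's weight fixed while shuffling the treatment labels is a simplifying assumption. The sketch computes a (weighted) two-sample KS statistic and estimates a p-value as the share of label shuffles whose statistic is at least as large as the observed one.

```r
# Hypothetical helper (not part of twang): weighted two-sample KS statistic.
# x: covariate, treat: 0/1 indicator, w: non-negative weights.
weighted_ks <- function(x, treat, w = rep(1, length(x))) {
  ord <- order(x)
  x <- x[ord]; treat <- treat[ord]; w <- w[ord]
  # Weighted empirical CDFs of each group, evaluated at the sorted x values
  cdf1 <- cumsum(w * (treat == 1)) / sum(w[treat == 1])
  cdf0 <- cumsum(w * (treat == 0)) / sum(w[treat == 0])
  # Compare only at the last occurrence of each distinct x so ties are handled
  last <- !duplicated(x, fromLast = TRUE)
  max(abs(cdf1[last] - cdf0[last]))
}

# Hypothetical helper: permutation p-value for the KS statistic.
# Assumption: weights stay attached to units while labels are shuffled.
perm_ks_pvalue <- function(x, treat, w = rep(1, length(x)), iters = 200) {
  observed <- weighted_ks(x, treat, w)
  permuted <- replicate(iters, weighted_ks(x, sample(treat), w))
  mean(permuted >= observed)
}

# Simulated data: the treated group is shifted by two standard deviations
set.seed(1)
treat <- rep(c(1, 0), each = 100)
x <- rnorm(200, mean = 2 * treat)
p_shift <- perm_ks_pvalue(x, treat, iters = 200)
print(p_shift)
```

More iterations give a less noisy p-value at the cost of run time, which is the trade-off this argument exposes.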

Value

Returns an object of class ps, a list containing

gbm.obj

the returned gbm object

ps

a data frame containing the estimated propensity scores. Each column is associated with one of the methods selected in stop.methods

w

a data frame containing the propensity score weights. Each column is associated with one of the methods selected in stop.methods. If sampling weights were given then these are incorporated into these weights

plot.info

a list containing the raw data used to generate the plots

desc

a list containing balance tables for each method selected in stop.methods. Includes a component for the unweighted analysis. See below for a list of the components of desc

datestamp

records the date of the analysis

parameters

saves the ps call

alerts

text containing any warnings accumulated during the estimation

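
As an illustration of how the ps and w components relate, the sketch below builds weights from propensity scores in the same way the Examples section does: treated cases get weight 1 and comparison cases get ps/(1-ps). The helper name att_weights and the toy values are hypothetical, and multiplying by the sampling weights is an assumption about how they are incorporated, not something this page specifies.

```r
# Hypothetical helper (not part of twang): weights from propensity scores.
att_weights <- function(ps_hat, treat, sampw = rep(1, length(ps_hat))) {
  # Treated cases keep weight 1; comparison cases get the odds ps/(1-ps)
  w <- ifelse(treat == 1, 1, ps_hat / (1 - ps_hat))
  # Assumption: sampling weights enter multiplicatively
  w * sampw
}

# Made-up treatment indicators and propensity scores
treat  <- c(1, 1, 0, 0, 0)
ps_hat <- c(0.8, 0.6, 0.5, 0.2, 0.75)
w_toy  <- att_weights(ps_hat, treat)
print(w_toy)
```
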
The desc component of the ps object contains detailed information on the model fit and diagnostics of the propensity score weights.

ess

the effective sample size of the control group

n.treat

the number of subjects in the treatment group

n.ctrl

the number of subjects in the control group

max.es

the largest effect size across the covariates

mean.es

the mean absolute effect size

max.ks

the largest KS statistic across the covariates

mean.ks

the average KS statistic across the covariates

bal.tab

a (potentially large) table summarizing the quality of the weights for equalizing the distribution of features across the two groups. This table is best extracted using the bal.table method. See the help for that function for details on the table's contents

n.trees

the estimated optimal number of gbm iterations to optimize the loss function for the associated stop.methods
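
As a rough illustration of two of these summaries, the sketch below computes an effective sample size and a standardized effect size from a set of weights. The formulas (Kish's effective sample size, and a weighted mean difference scaled by the treatment-group standard deviation) are common choices assumed here rather than taken from the ps source, and kish_ess and std_effect_size are hypothetical helper names.

```r
# Hypothetical helper: Kish effective sample size, (sum w)^2 / sum(w^2).
kish_ess <- function(w) sum(w)^2 / sum(w^2)

# Hypothetical helper: weighted mean difference between groups,
# scaled by the unweighted standard deviation of the treatment group.
std_effect_size <- function(x, treat, w = rep(1, length(x))) {
  m1 <- weighted.mean(x[treat == 1], w[treat == 1])
  m0 <- weighted.mean(x[treat == 0], w[treat == 0])
  (m1 - m0) / sd(x[treat == 1])
}

# Equal weights leave the sample size unchanged; uneven weights shrink it
ess_equal  <- kish_ess(rep(1, 10))
ess_uneven <- kish_ess(c(1, 1, 1, 3))

# Toy covariate: treated values 1..3, comparison values 4..6
x_toy     <- c(1, 2, 3, 4, 5, 6)
treat_toy <- c(1, 1, 1, 0, 0, 0)
es_toy    <- std_effect_size(x_toy, treat_toy)
print(c(ess_equal, ess_uneven, es_toy))
```

A small effective sample size relative to n.ctrl suggests that a few comparison cases carry most of the weight.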

Details

formula should be something like "treatment ~ X1 + X2 + X3". The treatment variable should be a 0/1 indicator. There is no need to specify interaction terms in the formula; interaction.depth controls the level of interactions to allow in the propensity score model.

The function ps saves the plots as a single pdf file named "[title].pdf" in the working directory. The plots include

Boxplot of propensity scores
Histogram of comparison weights
P-value plots for unweighted and weighted T, KS, and Std effect size statistics
Change in effect size plot
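
Because the treatment variable must be a 0/1 indicator, it can be worth checking the left side of the formula before fitting. The sketch below uses base R only; check_treatment is a hypothetical helper, not a function of the package, and the toy data frame is made up for illustration.

```r
# Hypothetical helper: TRUE if the left side of a two-sided formula names
# a column of data that contains only 0s and 1s (no missing values).
check_treatment <- function(formula, data) {
  stopifnot(inherits(formula, "formula"), length(formula) == 3)
  treat_name <- all.vars(formula[[2]])   # variable(s) on the left side
  stopifnot(length(treat_name) == 1)
  treat <- data[[treat_name]]
  !is.null(treat) && !anyNA(treat) && all(treat %in% c(0, 1))
}

# Toy data: 'treatment' is a valid indicator, 'grp' is coded 1/2 and is not
toy <- data.frame(treatment = c(0, 1, 1, 0),
                  X1  = c(2.1, 3.4, 1.9, 2.8),
                  grp = c(1, 2, 2, 1))
ok_treatment <- check_treatment(treatment ~ X1, toy)
ok_grp       <- check_treatment(grp ~ X1, toy)
print(c(ok_treatment, ok_grp))
```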

References

D. McCaffrey, G. Ridgeway, A. Morral (2004). "Propensity Score Estimation with Boosted Regression for Evaluating Adolescent Substance Abuse Treatment," Psychological Methods 9(4):403-425.

Examples

library(twang)
data(lalonde)
print(nrow(lalonde))

ps.lalonde <- ps(treat ~ age + educ + black + hispan + nodegree + 
                         married + re74 + re75, 
                 data = lalonde,
                 title="Lalonde example",
                 stop.method=stop.methods$ks.stat.max,  
                 # generate plots?
                 plots=TRUE,
                 # gbm options
                 n.trees=2000,
                 interaction.depth=3,
                 shrinkage=0.005,
                 perm.test.iters=50,
                 verbose=TRUE)
                 
# get the balance tables
bal.table(ps.lalonde)

# diagnose the weights using a ps object 
a <- dx.wts(ps.lalonde,data=lalonde,treat.var="treat")
print(a)
bal.table(a)

# diagnose the weights as propensity score weights
# will be the same as before, except for MC variation in the KS p-values
w <- with(ps.lalonde, ps/(1-ps))
w[lalonde$treat==1,] <- 1
dx.wts(w,data=lalonde,treat.var="treat",
       perm.test.iters=100)

# diagnose the weights as propensity scores
p <- ps.lalonde$ps
p[lalonde$treat==1,] <- 1
dx.wts(p,data=lalonde,treat.var="treat",x.as.weights=FALSE)

# look at propensity scores
names(ps.lalonde$ps)
hist(ps.lalonde$ps$ks.stat.max)
boxplot(split(ps.lalonde$ps$ks.stat.max,lalonde$treat),
        ylab="estimated propensity scores",
        names=c("control","treatment"))

# check out the balance
names(ps.lalonde$desc)
# unweighted
ps.lalonde$desc$unw
# optimized for ks.stat.max
ps.lalonde$desc$ks.stat.max

# check out the gbm object, indicates which variables are most influential in 
#    estimating the propensity score
summary(ps.lalonde$gbm.obj, n.trees=ps.lalonde$desc$ks.stat.max$n.trees)

# bal.stat() can use an arbitrary set of weights
bal.stat(data=lalonde,
         w.all=w[,1],
         vars=names(lalonde),
         treat.var="treat",
         get.means=TRUE,
         get.ks=TRUE,
         na.action="level")
