xBalance: STANDARDIZED DIFFERENCES FOR STRATIFIED COMPARISONS

Description

Given covariates, a treatment variable, and a stratifying factor, xBalance calculates standardized differences (biases) along each covariate, with and without the stratification. It also tests for conditional independence of the treatment variable and the covariates within strata.

Usage

xBalance(fmla, strata=NULL, data, report=c("std.diffs","z.scores"),
         stratum.weights=harmonic, na.rm=FALSE,
         covariate.scaling=NULL,
         normalize.weights=TRUE)

Arguments

fmla

A formula with an indicator of treatment assignment on the left-hand side and covariates on the right-hand side.

strata

NULL, a factor with length equal to the number of rows in data, or a data frame of such factors.

data

A data frame in which fmla is to be evaluated.

report

Character vector listing measures to report for each stratification; a subset of

c("adj.means","adj.mean.diffs",
      "chisquare.test","std.diffs","z.scores","p.values")

P-values reported are two-sided for the null hypothesis of no effect.

na.rm

Whether to remove rows with NAs on any variables mentioned on the RHS of fmla (i.e. listwise deletion). Defaults to FALSE, wherein rows aren't deleted but for each variable with NAs a missing-data indica

stratum.weights

Weights to be applied when aggregating across strata specified by strata, defaulting to weights proportional to the harmonic mean of treatment and control group sizes within strata. This can be either a function used to calcu

covariate.scaling

scale factor to apply to covariates in calculating std.diffs. If NULL, xBalance pools standard deviations of each variable in the treatment and control group (defining these groups according to wheth

normalize.weights

If TRUE, then stratum weights are normalized so as to sum to 1. Defaults to TRUE.

Value

A data frame with as many rows as there are covariates and covariate levels in fmla, and with columns including some or all of XX.difference, XX.z, XX.p, XX.Tx.eq.0, XX.Tx.eq.1, where XX ranges over the stratifying variables given in strata. If "chisquare.test" is in report, the data frame also has attributes XX.chisquare and XX.df. Its class is c("xbal", "data.frame"). There are plot and print methods for class xbal; the print method is demonstrated in the examples.

Details

In the unstratified case, the standardized difference of covariate means is the mean in the treatment group minus the mean in the control group, divided by the sd of the same variable, estimated by pooling the treatment and control group sds. In the stratified case, the denominator of the standardized difference remains the same, but the numerator is a weighted average of within-stratum differences in means on the covariate. By default, each stratum is weighted in proportion to the harmonic mean $1/[(1/a + 1/b)/2] = 2ab/(a+b)$ of the number of treated units (a) and control units (b) in the stratum; this weighting is optimal under certain modeling assumptions (discussed in Kalton 1968 and in Hansen and Bowers 2008). This weighting can be modified using the stratum.weights argument; see below.
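The calculation described above can be sketched by hand in base R. This is only an illustration on made-up data, not the package's internal code: the data frame toy, the helper harmonic_mean, and the particular pooled-sd formula are assumptions introduced here.

```r
# Toy data (hypothetical): two strata, binary treatment z, one covariate x.
toy <- data.frame(
  s = factor(c("A", "A", "A", "A", "B", "B", "B", "B", "B", "B")),
  z = c(1, 0, 0, 0, 1, 1, 1, 0, 0, 0),
  x = c(6, 3, 4, 2, 8, 7, 9, 6, 5, 7)
)

# Harmonic mean of the treated count (a) and control count (b): 2ab / (a + b).
harmonic_mean <- function(a, b) 2 * a * b / (a + b)

# Per stratum: harmonic-mean weight and treated-minus-control difference in means.
by_stratum <- do.call(rbind, lapply(split(toy, toy$s), function(d) {
  a <- sum(d$z == 1)
  b <- sum(d$z == 0)
  c(weight = harmonic_mean(a, b),
    diff   = mean(d$x[d$z == 1]) - mean(d$x[d$z == 0]))
}))

# Numerator: weighted average of within-stratum differences (weights sum to 1).
w <- by_stratum[, "weight"] / sum(by_stratum[, "weight"])
adj_diff <- sum(w * by_stratum[, "diff"])

# Denominator: an sd pooled over the whole treatment and control groups.
# Assumption: the usual degrees-of-freedom-weighted pooling; the package may differ.
n1 <- sum(toy$z == 1)
n0 <- sum(toy$z == 0)
pooled_sd <- sqrt(((n1 - 1) * var(toy$x[toy$z == 1]) +
                   (n0 - 1) * var(toy$x[toy$z == 0])) / (n1 + n0 - 2))

std_diff <- adj_diff / pooled_sd
```

Here stratum A (1 treated, 3 controls) gets weight 1.5 and stratum B (3 treated, 3 controls) gets weight 3, so the balanced stratum counts twice as much in the adjusted difference.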

When the treatment variable (the variable specified by the left-hand side of fmla) is not binary, xBalance calculates the covariates' regressions on the treatment variable, in the stratified case pooling these regressions across strata using weights that default to the stratum-wise sum of squared deviations of the treatment variable from its stratum mean. (Applied to binary treatment variables, this recipe gives the same result as the one given above.) In the denominator of the standardized difference, we get a "pooled sd" from separating units into two groups, one in which the treatment variable is 0 or less and another in which it is positive. If report includes "adj.means", covariate means for the former of these groups are reported, along with the sums of these means and the covariates' regressions on either the treatment variable, in the unstratified ("pre") case, or the treatment variable and the strata, in the stratified ("post") case.
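The parenthetical claim about binary treatments can be checked numerically. The sketch below uses made-up data and hypothetical names (toy, pieces, pooled_slope, harmonic_diff); it pools within-stratum regression slopes using sum-of-squared-deviation weights and compares the result with the harmonic-mean-weighted difference in means. It is not the package's implementation.

```r
# Toy data (hypothetical): two strata, binary treatment z, one covariate x.
toy <- data.frame(
  s = factor(c("A", "A", "A", "A", "B", "B", "B", "B", "B", "B")),
  z = c(1, 0, 0, 0, 1, 1, 1, 0, 0, 0),
  x = c(6, 3, 4, 2, 8, 7, 9, 6, 5, 7)
)

pieces <- do.call(rbind, lapply(split(toy, toy$s), function(d) {
  a <- sum(d$z == 1)
  b <- sum(d$z == 0)
  c(
    # Weight: sum of squared deviations of z from its stratum mean.
    ssd      = sum((d$z - mean(d$z))^2),
    # Within-stratum slope from regressing the covariate on the treatment.
    slope    = unname(coef(lm(x ~ z, data = d))["z"]),
    # For comparison: harmonic-mean weight and difference in means.
    harmonic = 2 * a * b / (a + b),
    diff     = mean(d$x[d$z == 1]) - mean(d$x[d$z == 0])
  )
}))

pooled_slope  <- sum(pieces[, "ssd"] * pieces[, "slope"]) / sum(pieces[, "ssd"])
harmonic_diff <- sum(pieces[, "harmonic"] * pieces[, "diff"]) / sum(pieces[, "harmonic"])

# The two recipes agree for a binary treatment.
all.equal(pooled_slope, harmonic_diff)
```

The agreement follows because, for a 0/1 treatment with a treated and b control units, the sum of squared deviations is ab/(a+b), exactly half the harmonic mean, and the within-stratum slope equals the difference in means.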

stratum.weights can be either a function or a numeric vector of weights. If it is a numeric vector, it should be nonnegative and it should have stratum names as its names. (I.e., its names should be equal to the levels of the factor specified by strata.) If it is a function, it should accept one argument, a data frame containing the variables in data and additionally Tx.grp and stratum.code, and return a vector of nonnegative weights with stratum codes as names; for an example, do getFromNamespace("harmonic", "RItools").
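As a rough sketch of the function form, the following weights each stratum by its number of treated units. The name treated_count_weights and the mock data frame are hypothetical, and the code assumes that Tx.grp is logical or 0/1; it is exercised here only on a hand-built data frame with the two columns named above, not through xBalance itself.

```r
# Hypothetical weighting function: weight = number of treated units per stratum.
# Assumes data$Tx.grp is logical (or 0/1) and data$stratum.code identifies strata.
treated_count_weights <- function(data) {
  n_treated <- tapply(as.logical(data$Tx.grp), data$stratum.code, sum)
  # tapply returns a 1-d array; convert to a plain named numeric vector.
  w <- as.numeric(n_treated)
  names(w) <- names(n_treated)
  w
}

# Mock input carrying only the two columns the function relies on.
mock <- data.frame(
  Tx.grp       = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE),
  stratum.code = factor(c("a", "a", "a", "b", "b", "b"))
)

w_mock <- treated_count_weights(mock)
w_mock
```

A function of this shape would then be supplied as stratum.weights=treated_count_weights. Strata with no treated units receive weight zero under this choice, which satisfies the nonnegativity requirement.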

If covariate.scaling is not NULL, no scaling is applied. This behavior is likely to change in future versions. (If you want no scaling, set covariate.scaling=1, as this is likely to retain this meaning in the future.)

References

Hansen, B.B. and Bowers, J. (2008), "Covariate Balance in Simple, Stratified and Clustered Comparative Studies," Statistical Science 23.

Kalton, G. (1968), "Standardization: A technique to control for extraneous variables," Applied Statistics 17, 118-136.

Examples


data(nuclearplants)

xBalance(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n, data=nuclearplants)

xBalance(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n, data=nuclearplants,
         report=c("adj.means","adj.mean.diffs",'std.diffs', 'z.scores', 'chisquare.test'))

xBalance(pr~.-cost-pt, strata=factor(nuclearplants$pt), data=nuclearplants, 
         report=c("adj.means","adj.mean.diffs",'std.diffs', 'z.scores', 'chisquare.test'))

xBalance(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
         strata=data.frame(unstrat=factor(character(32)),
           pt=factor(nuclearplants$pt)),
         data=nuclearplants,
         report=c("adj.means","adj.mean.diffs",'std.diffs', 'z.scores', 'chisquare.test'))

xBalance(pr~ date + t1 + t2 + cap + ne + ct + bw + cum.n,
         strata=list(unstrat=NULL, pt=~pt),
         data=nuclearplants,
         report=c("adj.means","adj.mean.diffs",'std.diffs', 'z.scores', 'chisquare.test'))
