covariate_balance: Covariate balance in matched sample

Description

covariate_balance derives measures of covariate balance between treatment groups in matched samples. For each covariate, the function calculates the mean differences, normalized by default, between all pairs of treatment conditions.

Usage

covariate_balance(treatments, covariates, matching = NULL,
  target = NULL, normalize = TRUE, all_differences = FALSE)

Value

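For each covariate, returns the mean differences between treatment groups in the matched sample (or in the unmatched sample when matching is NULL). The form of the output depends on all_differences.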

When all_differences = TRUE, the function returns, for each covariate, a matrix with the mean difference for each possible pair of treatment conditions. Rows of the matrices indicate minuends and columns indicate subtrahends, so the entry in row i and column j is the mean in condition i minus the mean in condition j. For example, when differences are normalized, the matrix:

	a	b	c
a	0.0	0.3	0.5
b	-0.3	0.0	0.2
c	-0.5	-0.2	0.0

reports that, for the corresponding covariate, the mean in treatment "a" exceeds the mean in treatment "b" by 30% of the covariate's sample standard deviation. The maximum difference (in absolute value) is also reported in a separate vector; for the covariate in the example above, it is 0.5.
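As an illustration of how such a matrix relates to the group means, the following base R sketch builds it with outer(). The means are illustrative values chosen to reproduce the example matrix, not output from covariate_balance, and the sketch is not the package's implementation.

```r
# Sketch: build the matrix of pairwise mean differences for one covariate.
# The (already normalized) means below are illustrative values chosen to
# reproduce the example matrix; they are not output from covariate_balance.
group_means <- c(a = 0.0, b = -0.3, c = -0.5)

# Entry [i, j] is mean_i - mean_j: rows are minuends, columns subtrahends.
diff_matrix <- outer(group_means, group_means, FUN = "-")

# The maximum difference (in absolute value) summarizes the covariate.
max_diff <- max(abs(diff_matrix))

print(round(diff_matrix, 2))
print(max_diff)
```

Because every entry is a difference of two means, the matrix is antisymmetric, and the diagonal is zero.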

When all_differences = FALSE, only the maximum differences are reported.

Arguments

treatments: factor specifying the units' treatment assignments.
covariates: vector, matrix or data frame with covariates to derive balance for.
matching: qm_matching or scclust object with the matched groups. If NULL, balance is derived for the unmatched sample.
target: units to target the balance measures for. If NULL, the measures will be the raw average over all units in the sample (i.e., the ATE). A non-null value specifies a subset of units to derive balance measures for (e.g., for the ATT or ATC). If target is a logical vector with the same length as the sample size, units indicated with TRUE will be targeted. If target is an integer vector, the units with indices in target are targeted. If target is a character vector, it should contain treatment labels, and the corresponding units (as given by treatments) will be targeted. If matching is NULL, target is ignored.
normalize: logical scalar indicating whether differences should be normalized by the sample standard deviation of the corresponding covariates.
all_differences: logical scalar indicating whether full matrices of differences should be reported. If FALSE, only the maximum difference for each covariate is returned.
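The three accepted forms of target all identify a set of units. The sketch below converts each form into a logical indicator to show that they can describe the same subset. The helper target_indicator is hypothetical and not part of quickmatch.

```r
# Hypothetical helper (not part of quickmatch): convert the accepted forms of
# 'target' into a logical vector that flags the targeted units.
target_indicator <- function(target, treatments) {
  n <- length(treatments)
  if (is.null(target)) {
    return(rep(TRUE, n))                          # NULL: all units (ATE)
  }
  if (is.logical(target)) {
    stopifnot(length(target) == n)                # one flag per unit
    return(target)
  }
  if (is.character(target)) {
    return(as.character(treatments) %in% target)  # treatment labels
  }
  if (is.numeric(target)) {
    flags <- rep(FALSE, n)
    flags[as.integer(target)] <- TRUE             # unit indices
    return(flags)
  }
  stop("unsupported type for 'target'")
}

# Three equivalent ways to target the units assigned to "T1" or "T2"
treatments <- factor(c("C", "T1", "C", "T2"))
by_label <- target_indicator(c("T1", "T2"), treatments)
by_index <- target_indicator(c(2L, 4L), treatments)
by_flag  <- target_indicator(c(FALSE, TRUE, FALSE, TRUE), treatments)
print(by_label)
```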

Details

covariate_balance calculates covariate balance by first deriving the (normalized) mean difference between all pairs of treatment conditions for each covariate in each matched group. It then aggregates these differences by a weighted average, where the target parameter decides the weights. When the average treatment effect (ATE) is of interest (i.e., target is NULL), the matched groups are weighted by their sizes. When target indicates that some subset of units is of interest, the number of such units in each matched group decides its weight. For example, if we are interested in the average treatment effect on the treated (ATT), the weight of a group is proportional to the number of treated units in that group. This reweighting reflects that we are prepared to accept greater imbalances in groups with few units of interest.
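A minimal base R sketch of this weighted aggregation for a single covariate and a single pair of treatment conditions follows. It assumes that every matched group contains both conditions, omits normalization for brevity, and uses a hypothetical function name; it illustrates the weighting scheme rather than reproducing the package's code.

```r
# Sketch: aggregate within-group mean differences for one covariate and one
# pair of treatment conditions (t1 - t2) into a weighted average.
# Assumptions: every matched group contains both conditions; 'targeted' flags
# the units of interest; normalization is omitted for brevity.
aggregate_difference <- function(x, treatments, groups, targeted, t1, t2) {
  group_ids <- unique(groups)
  # Mean difference t1 - t2 inside each matched group
  diffs <- sapply(group_ids, function(g) {
    in_g <- groups == g
    mean(x[in_g & treatments == t1]) - mean(x[in_g & treatments == t2])
  })
  # A group's weight is the number of targeted units it contains
  weights <- sapply(group_ids, function(g) sum(targeted[groups == g]))
  sum(weights * diffs) / sum(weights)
}

# Two matched groups: group 1 has one treated and one control unit,
# group 2 has one treated and three control units.
x          <- c(1, 3, 2, 6, 5, 4)
treatments <- factor(c("T", "C", "T", "C", "C", "C"))
groups     <- c(1, 1, 2, 2, 2, 2)

# ATE: every unit is targeted, so groups are weighted by their sizes
ate <- aggregate_difference(x, treatments, groups, rep(TRUE, 6), "T", "C")
# ATT: only treated units are targeted
att <- aggregate_difference(x, treatments, groups, treatments == "T", "T", "C")
print(c(ATE = ate, ATT = att))
```

Here the within-group differences are -2 and -3. The ATE weights (2 and 4) give more influence to the larger second group, while the ATT weights (1 and 1) treat both groups equally because each contains one treated unit.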

By default, the differences are normalized by the sample standard deviation of the corresponding covariate (see the normalize parameter). Specifically, the sample variance of the covariate is derived separately within each treatment condition, and the square root of the mean of these variances is used for the normalization. The matching is ignored when deriving the normalization factor so that balance can be compared across different matchings or with the unmatched sample.
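The normalization factor can be sketched in base R as below. The sketch assumes that "the mean of these variances" is a simple (unweighted) mean across treatment conditions, and the function name is hypothetical.

```r
# Sketch of the normalization factor: the square root of the (unweighted)
# mean of the within-treatment sample variances, computed on the full sample
# so that the matching plays no role.
normalization_factor <- function(x, treatments) {
  group_vars <- tapply(x, treatments, var)  # sample variance per condition
  sqrt(mean(group_vars))
}

x          <- c(1, 3, 2, 4, 6, 8)
treatments <- factor(c("A", "A", "A", "B", "B", "B"))
# Variances are 1 (group A) and 4 (group B), so the factor is sqrt(2.5)
print(normalization_factor(x, treatments))
```

Dividing a raw mean difference by this factor yields the normalized differences reported when normalize = TRUE.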

covariate_balance focuses on mean differences, but higher moments and interactions can be investigated by adding corresponding columns to the covariate matrix (see examples below).

Examples


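# Load the packages used below (distances provides distances(),
# quickmatch provides quickmatch() and covariate_balance())
library(distances)
library(quickmatch)
set.seed(123)  # make the random example data reproducible

# Construct example data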
my_data <- data.frame(y = rnorm(100),
                      x1 = runif(100),
                      x2 = runif(100),
                      treatment = factor(sample(rep(c("T1", "T2", "C"), c(25, 25, 50)))))

# Make distances
my_distances <- distances(my_data, dist_variables = c("x1", "x2"))

# Balance in unmatched sample (maximum for each covariate)
covariate_balance(my_data$treatment, my_data[c("x1", "x2")])

# Make matching
my_matching <- quickmatch(my_distances, my_data$treatment)

# Balance in matched sample (maximum for each covariate)
covariate_balance(my_data$treatment, my_data[c("x1", "x2")], my_matching)

# Balance in matched sample for ATT
covariate_balance(my_data$treatment,
                  my_data[c("x1", "x2")],
                  my_matching,
                  target = c("T1", "T2"))

# Balance on second-order moments and interactions
balance_cov <- data.frame(x1 = my_data$x1,
                          x2 = my_data$x2,
                          x1sq = my_data$x1^2,
                          x2sq = my_data$x2^2,
                          x1x2 = my_data$x1 * my_data$x2)
covariate_balance(my_data$treatment, balance_cov, my_matching)

# Report all differences (not only maximum for each covariate)
covariate_balance(my_data$treatment,
                  my_data[c("x1", "x2")],
                  my_matching,
                  all_differences = TRUE)
