love.plot: Generate Balance Plots for Publication

Description

Generates a "Love" plot graphically displaying covariate balance before and after adjusting.

Usage

love.plot(b, stat = c("mean.diffs", "variance.ratios"), threshold = NULL,
          abs = FALSE, var.order = NULL, no.missing = TRUE, var.names = NULL,
          drop.distance = TRUE,
          cluster.fun = c("mean", "median", "max", "range"), ...)

Arguments

b

a bal.tab object; the output of a call to bal.tab(). The m.threshold, v.threshold, and r.threshold arguments of bal.tab() can be used instead of love.plot()'s threshold parameter.

stat

character; which statistic should be reported if the treatment is binary. The options are "mean.diffs" for mean differences (standardized or not, according to the options selected in the bal.tab object) and "variance.ratios" for variance ratios. "mean.diffs" is the default. Abbreviations allowed.

threshold

numeric; an optional value to be used as a threshold marker in the plot. Overrides the threshold set in the bal.tab object.

abs

logical; whether to present the statistic in absolute value or not if stat = "mean.diffs" or the treatment variable is continuous. Defaults to TRUE when balance is plotted across clusters.

var.order

character; how to order the variables in the plot. If NULL, they will be displayed in alphabetical order. If "adjusted", they will be ordered by the balance statistic of the adjusted sample. If "unadjusted", they will be ordered by the balance statistic of the unadjusted sample. "unadjusted" tends to give the most readable plot, but NULL should be used when comparing variables across data sets so that the variable order stays the same.

no.missing

logical; whether to drop rows for variables for which the statistic has a value of NA, for example, variance ratios for binary variables. If FALSE, there will be rows for these variables but no points representing their value, and a warning message from ggplot2 will appear.

var.names

an optional object providing alternate names for the variables in the plot, which will otherwise be the variable names as they are stored. This may be useful when variables have ugly names. If var.order is NULL, the variables will be placed in alphabetical order of the new variable names. See Details on how to specify var.names.

drop.distance

logical; whether to ignore the distance measure (if there is one) in plotting. Because balance on the covariates is the primary goal of conditioning, including balance on the distance measure can be misleading; therefore, the default is TRUE, so that the distance measure is not displayed.

cluster.fun

if balance is to be displayed across clusters rather than within a single cluster, which summarizing function (mean, median, max, or range) of the balance statistics should be used. If "range" is entered, love.plot() will display a line from the min to the max with a point at the mean for each covariate; it can only be used if quick = FALSE in the bal.tab() call. Abbreviations allowed; "mean" is the default.

...

further arguments passed to or from other methods. They are ignored in this function.

Value

A "ggplot" object, returned invisibly.

Details

love.plot() uses ggplot from the ggplot2 package, and (invisibly) returns a "ggplot" object. This means that users can edit aspects of the plot using ggplot2 syntax.
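As a minimal sketch of this, assuming the MatchIt, cobalt, and ggplot2 packages are installed, the returned object can be stored and then modified with ordinary ggplot2 layers. The covariate subset, the object names p and p_custom, and the title text are illustrative choices, not part of love.plot().

```r
library(MatchIt)
library(cobalt)
library(ggplot2)

data("lalonde", package = "cobalt")

# Nearest neighbor matching on an illustrative subset of covariates
m.out <- matchit(treat ~ age + educ + married + nodegree + re74 + re75,
                 data = lalonde)

# Store the plot object returned by love.plot()
p <- love.plot(bal.tab(m.out))

# Modify it with standard ggplot2 syntax: change the theme and add a title
p_custom <- p + theme_bw() + labs(title = "Covariate balance")

# Display the modified plot explicitly
print(p_custom)
```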

The default in love.plot() is to present variables as they are named in the output of the call to bal.tab(), so it is important to inspect this output before specifying alternate variable names with var.names, as the displayed variable names may differ from those in the original data. Note that if drop.distance = TRUE, which is the default, the distance measure, if any, will not count as a variable below; otherwise, it will count.

There are several ways to specify alternate names for presentation in the displayed plot using the var.names argument. You can use a vector of alternate names the same length as the variable list output from bal.tab(), and love.plot() will use these names instead. To leave a variable name as is, enter "" or NA in the position of that variable. Another way is to specify a list of old and new variable names, pairing the old name with the new name. You can do this in three ways: 1) use a vector of new variable names whose element names are the old variable names; 2) use a data frame with exactly one column containing the new variable names and the row names containing the old variable names; or 3) use a data frame with two columns, the first containing the old variable names and the second containing the new variable names. This third method is the safest because the coercion rules in R are least likely to affect the input. If a variable in the output from bal.tab() is not provided in the list of old variable names, love.plot() will use the original variable name.
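The forms of var.names described above can be sketched in base R as follows. The three old names are assumed, for illustration only, to be the complete variable list in the bal.tab() output; the object names v0 to v3 are hypothetical.

```r
# Old names as they appear in the bal.tab() output (assumed for illustration)
old <- c("age", "educ", "re74")
new <- c("Age", "Years of Education", "Earnings 1974")

# Positional vector: same length as the variable list; "" keeps a name as is
v0 <- c("Age", "", "Earnings 1974")

# 1) Named vector: values are the new names, element names are the old names
v1 <- setNames(new, old)

# 2) One-column data frame: new names in the column, old names as row names
v2 <- data.frame(new = new, row.names = old, stringsAsFactors = FALSE)

# 3) Two-column data frame: old names first, new names second
v3 <- data.frame(old = old, new = new, stringsAsFactors = FALSE)
```

Any one of these objects could then be supplied as var.names in a call to love.plot(); stringsAsFactors = FALSE is set so that the names stay character strings in older versions of R.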

There are two ways to use love.plot() with clusters, and in both, the cluster argument must be specified in the call to bal.tab(). First, one can display a plot for balance in a single cluster; to do this, the call to bal.tab() must have which.cluster specified, and the argument therein must refer to a single cluster either by name or index. Second, one can display a plot summarizing balance across clusters; to do this, which.cluster in bal.tab() should be empty, NULL, or NA, and an argument should be given to cluster.fun in love.plot() referring to whether the mean, median, or maximum ("max") balance statistic or range ("range") of balance statistics for each covariate across clusters should be presented in the plot. In order to use "range", quick in bal.tab() must be set to FALSE, because setting it to TRUE suppresses calculation of non-displayed values, and the minimum statistic across clusters, required for displaying the range, is not normally displayed.

Examples

library(MatchIt); library(cobalt)  # cobalt provides bal.tab() and love.plot()
data("lalonde", package = "cobalt")

## Nearest Neighbor matching
m.out1 <- matchit(treat ~ age + educ + black + hispan + 
                  married + nodegree + re74 + re75, 
                  data = lalonde)

love.plot(bal.tab(m.out1), stat = "mean.diffs", threshold = .1, 
          var.order = "unadjusted")

## Using alternate variable names
v <- data.frame(old = c("age", "educ", "black", "hispan", 
                        "married", "nodegree", "re74", "re75"),
                new = c("Age", "Years of Education", "Black", 
                        "Hispanic", "Married", "No Degree", 
                        "Earnings 1974", "Earnings 1975"))
                
love.plot(bal.tab(m.out1), stat = "mean.diffs", threshold = .1, 
          var.order = "unadjusted", var.names = v)