weightthem: Weights Multiply Imputed Datasets

Description

The weightthem() function estimates weights for the control and treatment observations in each imputed dataset of a mids or amelia class object, which helps parametric models for causal inference perform better.

Usage

weightthem(formula, datasets, approach = "within", method = "ps",
  estimand = "ATE", stabilize = FALSE, focal = NULL, by = NULL,
  s.weights = NULL, ps = NULL, moments = 1, int = FALSE,
  verbose = FALSE, include.obj = FALSE, ...)

Arguments

formula

This argument takes the usual R formula syntax, z ~ x1 + x2, where z is a binary treatment indicator and x1 and x2 are the potential confounders. Both the treatment indicator and the potential confounders must be contained in the imputed datasets, which are supplied through the datasets argument (see below). All of the usual R formula syntax works. For example, x1:x2 represents the first-order interaction between x1 and x2, and I(x1^2) represents the squared term of x1. See help(formula) for details.
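As a small illustration of the formula syntax described above, the following base R sketch expands a formula into its terms and lists the variables it refers to. The names z, x1, and x2 are the placeholder names used in this section, not variables from any particular dataset.

```r
# Placeholder names: z (treatment indicator), x1 and x2 (potential confounders)
f <- z ~ x1 + x2 + x1:x2 + I(x1^2)

# terms() expands the formula into its individual model terms
term_labels <- attr(terms(f), "term.labels")
print(term_labels)

# all.vars() lists the variables that must be present in the imputed datasets
needed_vars <- all.vars(f)
print(needed_vars)
```

Checking all.vars() against the column names of the imputed data is a quick way to confirm that every variable in the formula is available before weighting.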

datasets

This argument specifies the datasets containing the treatment indicator and the potential confounders called in the formula. It must be an object of the mids or amelia class, which is typically produced by a previous call to the mice() or mice.mids() functions from the mice package or to the amelia() function from the Amelia package. The Amelia package is designed to impute missing data in a single cross-sectional dataset or in a time-series dataset; currently, the MatchThem package supports only the former, although it may work with the latter.

approach

This argument specifies a weighting approach. Currently, the "within" approach (calculating distance measures within each imputed dataset and weighting observations based on them) and the "across" approach (calculating distance measures within each imputed dataset, averaging the distance measure for each observation across the imputed datasets, and weighting based on the averaged measures) are available. The default is "within", which has been shown previously to produce unbiased results.
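The difference between the two approaches can be sketched in base R. This is only an illustration under stated assumptions, not the MatchThem implementation: the "imputed" datasets are simulated data frames rather than a mids object, the distance measure is a logistic regression propensity score, and ate_weights() is a hypothetical helper.

```r
set.seed(1)

# Simulate 3 stand-in "imputed" datasets (hypothetical data, not a mids object)
n <- 200
base <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
base$z <- rbinom(n, 1, plogis(0.5 * base$x1 - 0.3 * base$x2))
imputed <- lapply(1:3, function(i) {
  d <- base
  d$x2 <- d$x2 + rnorm(n, sd = 0.2)  # mimic imputation noise in one covariate
  d
})

# Propensity score of each observation, estimated within each dataset
ps_list <- lapply(imputed, function(d) {
  fitted(glm(z ~ x1 + x2, data = d, family = binomial))
})

# Hypothetical helper: ATE inverse probability weights from propensity scores
ate_weights <- function(z, p) z / p + (1 - z) / (1 - p)

# "within": each dataset is weighted using its own propensity scores
w_within <- lapply(seq_along(imputed), function(i) {
  ate_weights(imputed[[i]]$z, ps_list[[i]])
})

# "across": propensity scores are averaged per observation first,
# and the averaged score is then used to weight every dataset
ps_avg <- Reduce(`+`, ps_list) / length(ps_list)
w_across <- lapply(imputed, function(d) ate_weights(d$z, ps_avg))
```

In this sketch the "within" weights differ from one dataset to the next, whereas the "across" weights are identical in every dataset because they all derive from the same averaged score.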

method

This argument specifies the method that will be used to estimate weights. Currently, "ps" (propensity score weighting using generalized linear models), "gbm" (propensity score weighting using generalized boosted modeling), "cbps" (covariate balancing propensity score weighting), "npcbps" (non-parametric covariate balancing propensity score weighting), "ebal" (entropy balancing), "ebcw" (empirical balancing calibration weighting), "optweight" (optimization-based weighting), "super" (propensity score weighting using SuperLearner), and "user-defined" (weighting using a user-defined weighting function) are available. The default is "ps". Note that within each of these weighting methods, MatchThem offers a variety of options.

estimand

This argument specifies the desired estimand. For binary and multinomial treatments, it can be "ATE", "ATT", "ATC", and, for some weighting methods, "ATO" or "ATM". The default is "ATE". Please see the WeightIt package reference manual <https://cran.r-project.org/package=WeightIt> for more details.
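To make the estimands more concrete, the sketch below applies the standard textbook formulas that turn a propensity score into a weight for a binary treatment. The helper ps_to_weights() is hypothetical and is not a MatchThem or WeightIt function; the packages may differ in details such as normalization.

```r
# Hypothetical helper: textbook propensity score weights for a binary treatment
# z is the 0/1 treatment indicator, p is the probability of receiving treatment
ps_to_weights <- function(z, p, estimand = c("ATE", "ATT", "ATC", "ATO")) {
  estimand <- match.arg(estimand)
  switch(estimand,
    ATE = z / p + (1 - z) / (1 - p),      # both groups weighted to the full sample
    ATT = z + (1 - z) * p / (1 - p),      # controls weighted to resemble the treated
    ATC = z * (1 - p) / p + (1 - z),      # treated weighted to resemble the controls
    ATO = z * (1 - p) + (1 - z) * p       # overlap weights
  )
}

z <- c(1, 1, 0, 0)
p <- c(0.8, 0.5, 0.5, 0.2)

w_ate <- ps_to_weights(z, p, "ATE")
w_att <- ps_to_weights(z, p, "ATT")
print(w_ate)
print(w_att)
```

Under "ATT" the treated observations keep a weight of 1 and only the controls are reweighted, which matches the description of the focal group given for the focal argument.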

stabilize

This argument specifies whether to stabilize the weights. For the methods that involve estimating propensity scores, this involves multiplying each observation's weight by the sum of the weights in that observation's group (control or treatment). The default is FALSE. Please see the WeightIt package reference manual <https://cran.r-project.org/package=WeightIt> for more details.

focal

This argument specifies which group to consider as the treatment or the focal group (when multinomial treatments are used and the "ATT" is requested). This group will not be weighted, and the other groups will be weighted to be more like the focal group. Please see the WeightIt package reference manual <https://cran.r-project.org/package=WeightIt> for more details.

by

This argument specifies a vector, or the names of variables in the datasets, within whose categories weighting should be done. For example, if by = "gender", weights will be generated separately within each level of the variable "gender".
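The idea of generating weights separately within each level of a variable can be sketched in base R as follows. This is an illustration only, assuming simulated data, a logistic regression propensity score, and ATE weights; it is not how weightthem() is implemented.

```r
set.seed(2)

# Hypothetical data with a grouping variable, one covariate, and a treatment
n <- 300
d <- data.frame(
  gender = sample(c("F", "M"), n, replace = TRUE),
  x1 = rnorm(n)
)
d$z <- rbinom(n, 1, plogis(0.4 * d$x1))

# Fit the propensity score model separately within each level of gender
# and compute ATE weights from that level's own model
d$w <- NA_real_
for (g in unique(d$gender)) {
  idx <- d$gender == g
  p <- fitted(glm(z ~ x1, data = d[idx, ], family = binomial))
  d$w[idx] <- d$z[idx] / p + (1 - d$z[idx]) / (1 - p)
}

# Summarize the resulting weights by group
print(tapply(d$w, d$gender, summary))
```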

s.weights

This argument specifies a vector of sampling weights or the name of a variable in the datasets that contains sampling weights. These can also be matching weights if weighting is to be used on matched data.

ps

This argument specifies a vector of propensity scores or the name of a variable in the datasets containing the propensity scores. If not NULL, the method argument is ignored and the propensity scores are used to create the weights (in this case, formula must still include the treatment indicator in the datasets, but the listed covariates play no role in the weight estimation).

moments

This argument specifies the greatest moment of the covariate distribution to be balanced (for entropy balancing, empirical balancing calibration weights, and optimization-based weights). For example, if moments = 3, then for all non-categorical covariates the mean, the second moment (variance), and the third moment (skewness) will be balanced. This argument is ignored for other weighting methods; to balance powers of the covariates, appropriate functions must be entered in the formula. Please see the WeightIt package reference manual <https://cran.r-project.org/package=WeightIt> for more details.

int

This argument specifies whether first-order interactions of the covariates should be balanced (essentially balancing the covariances between covariates, for entropy balancing, empirical balancing calibration weights, and optimization-based weights). This argument is ignored for other weighting methods; to balance interactions between the variables, appropriate functions must be entered in the formula. The default is FALSE. Please see the WeightIt package reference manual <https://cran.r-project.org/package=WeightIt> for more details.

verbose

This argument specifies whether to print additional information output by the fitting function. The default is FALSE.

include.obj

This argument specifies whether to include in the output any fit objects created in the process of estimating the weights. For example, with method = "ps", the glm objects containing the propensity score model will be included. The default is FALSE. Please see the WeightIt package reference manual <https://cran.r-project.org/package=WeightIt> for more details.

...

Additional arguments to be passed to the weighting method.

Value

This function returns an object of the wimids (weighted multiply imputed datasets) class, which contains the imputed datasets originally passed to the function through the datasets argument, together with the estimated weights of their observations (stored as the weights variable in each dataset).

Details

The weighting is done using the weightthem(z ~ x1, ...) command, where z is the treatment indicator and x1 represents the potential confounders to be used in the weighting model. The default syntax is weightthem(formula, datasets = NULL, method = "ps", ...). Numerical summaries of the results can be obtained with the summary() function. The print() function also prints the output.

References

Stef van Buuren and Karin Groothuis-Oudshoorn (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3): 1-67. https://www.jstatsoft.org/v45/i03/

Gary King, James Honaker, Anne Joseph, and Kenneth Scheve (2001). Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation. American Political Science Review, 95: 49–69. http://j.mp/2oOrtGs

Examples

# NOT RUN {
#Loading the packages and the dataset
library(mice)
library(MatchThem)
data(osteoarthritis)

#Multiply imputing the missing values
imputed.datasets <- mice(osteoarthritis, m = 5, maxit = 10,
                         method = c("", "", "mean", "polyreg", "logreg", "logreg", "logreg"))

#Estimating weights of observations in the multiply imputed datasets
weighted.datasets <- weightthem(OSP ~ AGE + SEX + BMI + RAC + SMK, imputed.datasets,
                                approach = 'within', method = 'ps')
# }