wa_test: Weight-Association Tests for Survey Weights

Description

Implements several weight-association tests that examine whether survey weights are informative about the response variable after conditioning on covariates. Variants include DuMouchel-Duncan (DD), Pfeffermann-Sverchkov (PS1 and PS2, with optional quadratic terms or user-supplied auxiliary designs), and Wu-Fuller (WF).

Usage

wa_test(
  model,
  type = c("DD", "PS1", "PS1q", "PS2", "PS2q", "WF"),
  coef_subset = NULL,
  aux_design = NULL,
  na.action = stats::na.omit
)
# S3 method for wa_test
print(x, ...)
# S3 method for wa_test
summary(object, ...)
# S3 method for wa_test
tidy(x, ...)
# S3 method for wa_test
glance(x, ...)

Value

An object of class "wa_test" containing:

statistic: F-test statistic
parameter: Degrees of freedom (numerator, denominator)
p.value: P-value for the test
method: Name of the test performed
call: Function call

Arguments

model: An object of class svyglm.
type: Character string specifying the test type: "DD", "PS1", "PS1q", "PS2", "PS2q", "WF".
coef_subset: Optional character vector of coefficient names to include in the test. Defaults to all coefficients.
aux_design: Optional matrix or function to generate auxiliary regressors for PS1/PS2 tests. If a function, it should take X and y and return a matrix of extra columns to include.
na.action: Function to handle missing data before testing.
x: An object of class wa_test
...: Additional arguments passed to methods
object: An object of class wa_test
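
As a rough illustration of the function form of aux_design, the sketch below defines a helper that takes X and y and returns a matrix of extra columns. The helper name aux_squares, the toy data, and the choice of squared covariates are assumptions made for illustration; they are not part of the package.

```r
# Hypothetical aux_design function: squares every non-intercept column of X.
# It accepts X and y, as the argument description requires, and returns a
# matrix of extra columns (y is unused in this simple sketch).
aux_squares <- function(X, y) {
  keep <- colnames(X) != "(Intercept)"
  out <- X[, keep, drop = FALSE]^2
  colnames(out) <- paste0(colnames(out), "_sq")
  out
}

# Toy design matrix, only to show the shape of the return value
toy <- data.frame(ell = c(1, 2, 3, 4), meals = c(10, 20, 30, 40))
X <- model.matrix(~ ell + meals, data = toy)
extra <- aux_squares(X, y = c(5, 6, 7, 8))

# Assumed usage with a fitted svyglm object `fit`:
# wa_test(fit, type = "PS1", aux_design = aux_squares)
```

Returning a matrix with named columns keeps the added regressors identifiable in the augmented fit.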

Details

Let $y$ denote the response, $X$ the design matrix of covariates, and $w$ the survey weights. The null hypothesis in all cases is that the weights are non-informative given $X$, i.e. they do not provide additional information about $y$ beyond the covariates.

The following test variants are implemented:

DuMouchel–Duncan (DD): After fitting the unweighted regression $$\hat\beta = (X^\top X)^{-1} X^\top y,$$ compute residuals $e = y - X\hat\beta$. The DD test regresses $e$ on the weights $w$: $$e = \gamma_0 + \gamma_1 w + u.$$ A significant $\gamma_1$ indicates that the residuals are associated with the weights, which is evidence that the weights are informative.
Pfeffermann–Sverchkov PS1: Augments the outcome regression with functions of the weights as auxiliary regressors: $$y = X\beta + f(w)\theta + \varepsilon.$$ Under the null, $\theta = 0$. Quadratic terms ($w^2$) can be included ("PS1q"), or the user may supply a custom auxiliary design matrix $f(w)$.
Pfeffermann–Sverchkov PS2: First regress the weights on the covariates, $$w = X\alpha + \eta,$$ and obtain fitted values $\hat w$. Then augment the outcome regression with $\hat w$ (and optionally $\hat w^2$ for "PS2q"): $$y = X\beta + g(\hat w)\theta + \varepsilon.$$ Again, $\theta = 0$ under the null.
Wu–Fuller (WF): Compares weighted and unweighted regression fits. Let $\hat\beta_W$ and $\hat\beta_U$ denote the weighted and unweighted estimators. The test statistic is based on the quadratic form $$T = (\hat\beta_W - \hat\beta_U)^\top \left[\widehat{\mathrm{Var}}(\hat\beta_W - \hat\beta_U)\right]^{-1} (\hat\beta_W - \hat\beta_U)$$ and follows an approximate $F$ distribution. A large value indicates that the weights materially change the estimated regression coefficients.
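
The regressions described above can be sketched in base R with lm() on simulated data. This is a minimal illustration of the formulas as written, not the package's implementation: it ignores the survey design (strata, clusters), and all object names are hypothetical. The weights are generated to depend on the covariate only, so the null hypothesis holds by construction.

```r
# Illustrative simulation; every object name here is hypothetical.
set.seed(1)
n <- 200
x <- rnorm(n)
w <- exp(0.3 * x + rnorm(n, sd = 0.3))  # stand-in weights, related to x only
y <- 1 + 2 * x + rnorm(n)               # y does not depend on w given x

# DD (as described above): residuals of the unweighted fit regressed on w
fit_u <- lm(y ~ x)
e     <- resid(fit_u)
dd_F  <- summary(lm(e ~ w))$fstatistic  # named vector: value, numdf, dendf

# PS1 / PS1q: add w (and w^2) to the outcome regression and test theta = 0
ps1  <- anova(fit_u, lm(y ~ x + w))
ps1q <- anova(fit_u, lm(y ~ x + w + I(w^2)))

# PS2q: first-stage fitted weights, then add a function of them.
# In this purely linear sketch w_hat is an exact linear combination of the
# columns of X, so only the squared term can be estimated here.
w_hat <- fitted(lm(w ~ x))
ps2q  <- anova(fit_u, lm(y ~ x + I(w_hat^2)))

# WF ingredient: difference between weighted and unweighted estimates
# (the variance estimate needed for T is not reproduced in this sketch)
beta_diff <- coef(lm(y ~ x, weights = w)) - coef(fit_u)
```

Each anova() table holds the F statistic and p-value for the added terms in its second row, for example ps1$F[2] and ps1$`Pr(>F)`[2].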

In all cases, the reported statistic is an $F$ statistic whose numerator degrees of freedom equal the number of auxiliary regressors added and whose denominator degrees of freedom equal the residual degrees of freedom of the augmented regression.
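
To make the degrees of freedom concrete, the sketch below computes an $F$ statistic by hand from the residual sums of squares of a restricted and an augmented lm() fit, then checks it against anova(). The data are simulated and the object names are hypothetical; the package may compute its statistic differently, for instance with design-based variances.

```r
# Simulated data for illustration only
set.seed(2)
n <- 150
x <- rnorm(n)
w <- runif(n, 1, 4)            # stand-in for survey weights
y <- 0.5 + x + rnorm(n)

fit0 <- lm(y ~ x)                  # restricted model (theta = 0)
fit1 <- lm(y ~ x + w + I(w^2))     # augmented with two auxiliary regressors

q   <- fit0$df.residual - fit1$df.residual  # numerator df: regressors added
df2 <- fit1$df.residual                     # denominator df: augmented fit

# deviance() on an lm object returns the residual sum of squares
rss0 <- deviance(fit0)
rss1 <- deviance(fit1)
F_manual <- ((rss0 - rss1) / q) / (rss1 / df2)
p_manual <- pf(F_manual, q, df2, lower.tail = FALSE)

# The same quantities as reported by anova()
tab <- anova(fit0, fit1)
```

Here q is 2 and df2 is n - 4, matching the rule stated above: one numerator degree of freedom per auxiliary regressor, and the residual degrees of freedom of the augmented fit in the denominator.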

References

DuMouchel, W. H., & Duncan, G. J. (1983). Using sample survey weights in multiple regression analyses of stratified samples. *Journal of the American Statistical Association*, 78(383), 535-543.

Pfeffermann, D., & Sverchkov, M. (1999). Parametric and semi-parametric estimation of regression models fitted to survey data. *Sankhya: The Indian Journal of Statistics, Series B*, 61(1), 166-186.

Pfeffermann, D., & Sverchkov, M. (2003). Fitting generalized linear models under informative sampling. In R. L. Chambers & C. J. Skinner (Eds.), *Analysis of Survey Data* (pp. 175-196). Wiley.

Wu, Y., & Fuller, W. A. (2005). Preliminary testing procedures for regression with survey samples. In *Proceedings of the Joint Statistical Meetings, Survey Research Methods Section* (pp. 3683-3688). American Statistical Association.

Examples

# Load the survey package (required) and the example data
library(survey)
data(api, package = "survey")

# Create a survey design and fit a weighted regression model
des <- svydesign(id = ~1, strata = ~stype, weights = ~pw, data = apistrat)
fit <- svyglm(api00 ~ ell + meals, design = des)

# Run the weight-association test; reports the F statistic, degrees of freedom, and p-value
results <- wa_test(fit, type = "DD")
print(results)