perm_test: Permutation test for weight informativeness in survey regression

Description

Non-parametric test that permutes survey weights (optionally within blocks) to generate the null distribution of a chosen statistic. Two engines are available: a fast closed-form weighted least squares (WLS) engine in C++ for the linear case, and a pure R engine.

Usage

perm_test(
  model,
  stat = c("pred_mean", "coef_mahal"),
  B = 1000,
  coef_subset = NULL,
  block = NULL,
  normalize = TRUE,
  engine = c("C++", "R"),
  custom_fun = NULL,
  na.action = stats::na.omit
)
# S3 method for perm_test
print(x, ...)
# S3 method for perm_test
summary(object, ...)
# S3 method for perm_test
tidy(x, ...)
# S3 method for perm_test
glance(x, ...)

Value

An object of class "perm_test" with fields:

stat_obs: Observed statistic with actual weights
stat_null: Baseline statistic under equal weights (for centering)
perm_stats: Vector of permutation statistics
p.value: Permutation p-value (two-sided, centered at baseline)
effect: Observed statistic minus the median of the permutation statistics
stat: Statistic name
B: Number of permutations
call: Matched call
method: Description string

Arguments

model

An object of class svyglm. The Gaussian family is currently the best supported.

stat

Statistic to use. Options:

"pred_mean": Compares the mean predicted outcome under weighted vs. unweighted regression. Simple, interpretable, and directly tied to differences in fitted population means. Sensitive to shifts in overall prediction levels caused by informative weights.
"coef_mahal": Computes the squared Mahalanobis distance between the weighted and unweighted coefficient vectors, using the unweighted precision matrix ($X^\top X$) as the metric. Captures joint shifts in the regression coefficients, not just in mean predictions, and is typically more powerful when informativeness shows up as changes in slopes or in several coefficients at once.

B

Number of permutations (default 1000).

coef_subset

Optional character vector naming the coefficients to include in the statistic.

block

Optional factor defining blocks (e.g., strata); weights are permuted only within each level.

normalize

Logical; if TRUE (default), normalize weights to have mean 1.

engine

"C++" (the default) for the fast closed-form WLS engine, or "R" for a pure R loop.

custom_fun

Optional function with signature function(model, X, y, wts) that returns a scalar statistic. If supplied, it overrides stat.

na.action

Function specifying how missing values are handled (default stats::na.omit).

x

An object of class perm_test

...

Additional arguments passed to methods

object

An object of class perm_test
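To illustrate the custom_fun interface, the sketch below defines a hypothetical statistic, slope_shift, that measures the weighted-minus-unweighted shift in the coefficient on meals (a covariate from the Examples model). It assumes that X arrives as a numeric design matrix whose column names match the model's coefficient names; that is an assumption, not something this page confirms. The two fits use base R's lm.wfit and lm.fit, and the model argument is accepted only to match the documented signature.

```r
# Hypothetical custom statistic for custom_fun.
# ASSUMPTION: X is a numeric design matrix with a column named "meals".
slope_shift <- function(model, X, y, wts) {
  fit_w <- lm.wfit(X, y, wts)   # weighted least squares fit
  fit_u <- lm.fit(X, y)         # ordinary (unweighted) least squares fit
  # Scalar: weighted minus unweighted coefficient on "meals"
  unname(fit_w$coefficients["meals"] - fit_u$coefficients["meals"])
}

# Standalone check on simulated data (model is unused, so NULL is passed)
set.seed(2)
X_sim <- cbind("(Intercept)" = 1, meals = runif(50))
y_sim <- 3 - 2 * X_sim[, "meals"] + rnorm(50)
slope_shift(NULL, X_sim, y_sim, runif(50, 0.5, 2))
```

Under that assumption, it could then be supplied as perm_test(fit, custom_fun = slope_shift, B = 1000), with fit taken from the Examples.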

Details

This procedure implements a non‑parametric randomization test for the informativeness of survey weights. The null hypothesis is that, conditional on the covariates $X$, the survey weights $w$ are non‑informative with respect to the outcome $y$. Under this null, permuting the weights across observations should not change the distribution of any statistic that measures the effect of weighting.

The algorithm is:

1. Fit the unweighted regression $$\hat\beta_{U} = (X^\top X)^{-1} X^\top y$$ and the weighted regression $$\hat\beta_{W} = (X^\top W X)^{-1} X^\top W y,$$ where $W = \mathrm{diag}(w)$.
2. Compute the observed test statistic $T_{\mathrm{obs}}$:
   - For "pred_mean": the difference in mean predicted outcomes between the weighted and unweighted fits.
   - For "coef_mahal": the squared Mahalanobis distance $$T = (\hat\beta_{W} - \hat\beta_{U})^\top (X^\top X)(\hat\beta_{W} - \hat\beta_{U}),$$ using the unweighted precision matrix $X^\top X$ as the metric.
   - For a user‑supplied custom_fun: any scalar function of $(X, y, w)$; the function also receives the fitted model.
3. Generate the null distribution by permuting the weights: $$w^{*(b)} = P_b w, \quad b = 1, \ldots, B,$$ where each $P_b$ is a random permutation matrix. If block is supplied, permutations are restricted to within block levels.
4. Recompute the test statistic $T^{*(b)}$ for each permuted weight vector. The empirical distribution of the $T^{*(b)}$ approximates the null distribution under non‑informative weights.
5. Compute the two‑sided permutation p‑value $$p = \frac{1 + \sum_{b=1}^B I\{|T^{*(b)} - T_0| \ge |T_{\mathrm{obs}} - T_0|\}}{B+1},$$ where $T_0$ is the baseline statistic under equal weights and $I\{\cdot\}$ is the indicator function.
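The steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: it assumes a full-rank numeric design matrix X with an intercept column, takes "pred_mean" to be the difference between the average fitted values of the weighted and unweighted fits, and uses hypothetical helper names (wls_coef, stat_pred_mean, stat_coef_mahal, permute_weights, perm_test_sketch).

```r
# Minimal base-R sketch of the weight-permutation test (NOT the package internals).
# Assumes X is a full-rank numeric design matrix, y numeric, w positive weights.

# Closed-form WLS: solve (X'WX) beta = X'Wy without forming diag(w)
wls_coef <- function(X, y, w) {
  drop(solve(crossprod(X, w * X), crossprod(X, w * y)))
}

# "pred_mean" (assumed definition): mean fitted value, weighted minus unweighted
stat_pred_mean <- function(X, y, w) {
  b_w <- wls_coef(X, y, w)
  b_u <- wls_coef(X, y, rep(1, length(y)))
  mean(X %*% b_w) - mean(X %*% b_u)
}

# "coef_mahal": (b_w - b_u)' (X'X) (b_w - b_u)
stat_coef_mahal <- function(X, y, w) {
  d <- wls_coef(X, y, w) - wls_coef(X, y, rep(1, length(y)))
  drop(crossprod(d, crossprod(X) %*% d))
}

# Shuffle weights, optionally only within levels of a blocking factor.
# sample.int() on indices avoids sample()'s surprise with length-1 inputs.
permute_weights <- function(w, block = NULL) {
  if (is.null(block)) return(w[sample.int(length(w))])
  out <- w
  for (idx in split(seq_along(w), block)) {
    out[idx] <- w[idx[sample.int(length(idx))]]
  }
  out
}

perm_test_sketch <- function(X, y, w, stat_fun = stat_pred_mean, B = 1000,
                             block = NULL, normalize = TRUE) {
  if (normalize) w <- w / mean(w)                # rescale weights to mean 1
  t_obs <- stat_fun(X, y, w)                     # observed statistic
  t_0 <- stat_fun(X, y, rep(1, length(y)))       # baseline under equal weights
  t_perm <- replicate(B, stat_fun(X, y, permute_weights(w, block)))
  p <- (1 + sum(abs(t_perm - t_0) >= abs(t_obs - t_0))) / (B + 1)
  list(stat_obs = t_obs, stat_null = t_0, perm_stats = t_perm,
       p.value = p, effect = t_obs - median(t_perm))
}

# Usage on simulated data where the weights are unrelated to y
set.seed(1)
n <- 200
x <- rnorm(n)
X <- cbind(1, x)
y <- 1 + 2 * x + rnorm(n)
w <- runif(n, 0.5, 2)
res <- perm_test_sketch(X, y, w, B = 199)
res$p.value
res_blk <- perm_test_sketch(X, y, w, stat_fun = stat_coef_mahal, B = 199,
                            block = factor(rep(1:4, each = n / 4)))
```

Because WLS coefficients do not change when all weights are multiplied by a constant, normalizing to mean 1 leaves both statistics unchanged in this sketch, and the baseline $T_0$ is exactly zero for both built-in statistics.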

Intuitively, if the weights are informative, shuffling them breaks their association with the outcome, so the observed statistic tends to fall in the tails of the permutation distribution and the p-value is small. If the weights are non-informative, there is no such association to break, and the observed statistic looks like a typical draw from the permutation distribution.

Examples

# Load the survey package (required) and the example data
library(survey)
data(api, package = "survey")

# Create a survey design and fit a weighted regression model
des <- svydesign(id = ~1, strata = ~stype, weights = ~pw, data = apistrat)
fit <- svyglm(api00 ~ ell + meals, design = des)

# Run permutation diagnostic test; reports permutation statistics with p-value
results <- perm_test(fit, stat = "pred_mean", B = 1000, engine = "R")
print(results)
