Non-parametric test that permutes survey weights (optionally within blocks) to generate the null distribution of a chosen statistic. Two engines are available: a fast closed-form WLS implementation in C++ (linear case) and a pure R engine.
perm_test(
model,
stat = c("pred_mean", "coef_mahal"),
B = 1000,
coef_subset = NULL,
block = NULL,
normalize = TRUE,
engine = c("C++", "R"),
custom_fun = NULL,
na.action = stats::na.omit
)

# S3 method for perm_test
print(x, ...)
# S3 method for perm_test
summary(object, ...)
# S3 method for perm_test
tidy(x, ...)
# S3 method for perm_test
glance(x, ...)
An object of class "perm_test" with fields:
Observed statistic with actual weights
Baseline statistic under equal weights (for centering)
Vector of permutation statistics
Permutation p-value (two-sided, centered at baseline)
Observed minus median of permutation stats
Statistic name
Number of permutations
Matched call
Description string
An object of class svyglm (the Gaussian family is currently the best-supported case).
Statistic to use. Options:
"pred_mean":
Compares the mean predicted outcome under weighted vs. unweighted regression.
Simple, interpretable, and directly tied to differences in fitted population means.
Sensitive to shifts in overall prediction levels caused by informative weights.
"coef_mahal":
Computes the Mahalanobis distance between the weighted and unweighted coefficient vectors,
using the unweighted precision matrix (proportional to \(X^\top X\)) as the metric. Captures joint shifts in regression coefficients,
not just mean predictions. More powerful when informativeness manifests as changes in slopes
or multiple coefficients simultaneously.
Number of permutations (e.g., 1000).
Optional character vector of coefficient names to include.
Optional factor for blockwise permutations (e.g., strata); weights are permuted only within its levels.
Logical; if TRUE (default), normalize weights to have mean 1.
"C++" for fast WLS or "R" for pure R loop.
Optional function(model, X, y, wts) returning a scalar statistic; if supplied, it overrides stat.
Function to handle missing data.
An object of class perm_test
Additional arguments passed to methods
An object of class perm_test
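The block and normalize arguments can be illustrated with a small base R sketch. The helper name permute_within is hypothetical, and the order of operations (normalize first, then permute) is an assumption rather than a description of the package internals.

```r
# Hypothetical helper: shuffle weights, optionally only within block levels.
permute_within <- function(w, block = NULL) {
  if (is.null(block)) {
    return(w[sample.int(length(w))])   # unrestricted permutation
  }
  # ave() applies FUN to each block and writes the result back in place
  ave(w, block, FUN = function(v) v[sample.int(length(v))])
}

set.seed(1)
w     <- c(1, 2, 3, 10, 20, 30)
block <- factor(c("a", "a", "a", "b", "b", "b"))

w_norm <- w / mean(w)                  # normalize = TRUE: weights have mean 1
w_star <- permute_within(w_norm, block)

# Each block keeps its own set of weights; only their order changes
tapply(w_star, block, sort)
```

Using sample.int() on the index rather than sample() on the values avoids the base R pitfall where sample(v) with a single numeric v samples from 1:v.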
This procedure implements a non‑parametric randomization test for the informativeness of survey weights. The null hypothesis is that, conditional on the covariates \(X\), the survey weights \(w\) are non‑informative with respect to the outcome \(y\). Under this null, permuting the weights across observations should not change the distribution of any statistic that measures the effect of weighting.
The algorithm is:
Fit the unweighted regression $$\hat\beta_{U} = (X^\top X)^{-1} X^\top y$$ and the weighted regression $$\hat\beta_{W} = (X^\top W X)^{-1} X^\top W y,$$ where \(W = \mathrm{diag}(w)\).
Compute the observed test statistic \(T_{\mathrm{obs}}\):
For "pred_mean": the difference in mean predicted outcomes
between weighted and unweighted fits.
For "coef_mahal": the squared Mahalanobis-type distance
$$T = (\hat\beta_{W} - \hat\beta_{U})^\top
(X^\top X)(\hat\beta_{W} - \hat\beta_{U}),$$
which uses \(X^\top X\) (proportional to the unweighted precision matrix) as the metric.
For a user‑supplied custom_fun, any scalar function of
\((X,y,w)\).
Generate the null distribution by permuting the weights:
$$w^{*(b)} = P_b w, \quad b=1,\ldots,B,$$
where each \(P_b\) is a permutation matrix. If a block factor
is supplied, permutations are restricted within block levels.
Recompute the test statistic \(T^{*(b)}\) for each permuted weight vector. The empirical distribution of \(T^{*(b)}\) represents the null distribution under non‑informative weights.
The two‑sided permutation p‑value is $$p = \frac{1 + \sum_{b=1}^B I\{|T^{*(b)} - T_0| \ge |T_{\mathrm{obs}} - T_0|\}} {B+1},$$ where \(T_0\) is the baseline statistic under equal weights.
Intuitively, if the weights are informative, the observed statistic will lie in the tails of the permutation distribution, leading to a small p‑value. If the weights are non‑informative, shuffling them does not change the distribution of the statistic, so the observed value is typical of the permutation distribution.
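The steps above can be sketched in base R on simulated data. This is a minimal illustration under stated assumptions, not the package's implementation: the names wls_coef, stat_pred_mean, stat_coef_mahal and perm_sketch are hypothetical, the "pred_mean" statistic is assumed to be a plain sample average of fitted values, and blocks are ignored.

```r
# Closed-form WLS coefficients: (X'WX)^{-1} X'Wy with W = diag(w)
wls_coef <- function(X, y, w) {
  XtW <- t(X * w)                      # row i of X scaled by w[i], then transposed
  drop(solve(XtW %*% X, XtW %*% y))
}

# "pred_mean"-style statistic (assumed form): mean fitted value,
# weighted fit minus unweighted fit
stat_pred_mean <- function(X, y, w) {
  ones <- rep(1, length(y))
  mean(X %*% wls_coef(X, y, w)) - mean(X %*% wls_coef(X, y, ones))
}

# "coef_mahal"-style statistic: quadratic form in the coefficient shift, metric X'X
stat_coef_mahal <- function(X, y, w) {
  d <- wls_coef(X, y, w) - wls_coef(X, y, rep(1, length(y)))
  drop(crossprod(d, crossprod(X) %*% d))
}

# Permutation test: observed statistic, equal-weight baseline, B shuffles
perm_sketch <- function(X, y, w, stat_fun, B = 199) {
  n      <- length(w)
  t_obs  <- stat_fun(X, y, w)
  t0     <- stat_fun(X, y, rep(1, n))
  t_perm <- replicate(B, stat_fun(X, y, w[sample.int(n)]))
  p      <- (1 + sum(abs(t_perm - t0) >= abs(t_obs - t0))) / (B + 1)
  list(t_obs = t_obs, t0 = t0, t_perm = t_perm, p_value = p)
}

set.seed(42)
n <- 300
x <- rnorm(n)
e <- rnorm(n)
y <- 1 + 2 * x + e
X <- cbind(1, x)

w_informative    <- exp(0.8 * e)       # weights depend on the error term
w_noninformative <- runif(n, 0.5, 2)   # weights unrelated to the outcome

res_inf <- perm_sketch(X, y, w_informative, stat_pred_mean)
res_non <- perm_sketch(X, y, w_noninformative, stat_coef_mahal)
res_inf$p_value
res_non$p_value
```

With equal weights the weighted and unweighted fits coincide, so the baseline \(T_0\) is zero for both statistics in this sketch. Weights tied to the error term shift the weighted intercept, which is why the first call should produce a p‑value near its minimum of \(1/(B+1)\).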
# Load the survey package (required) and the example data
library(survey)
data(api, package = "survey")
# Create a survey design and fit a weighted regression model
des <- svydesign(id = ~1, strata = ~stype, weights = ~pw, data = apistrat)
fit <- svyglm(api00 ~ ell + meals, design = des)
# Run the permutation diagnostic test; print() reports the permutation statistics and p-value
results <- perm_test(fit, stat = "pred_mean", B = 1000, engine = "R")
print(results)
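A user-defined statistic can be supplied through custom_fun, which is documented above as function(model, X, y, wts) returning a scalar. The sketch below is only an illustration: the name wtd_mean_shift is hypothetical, and it assumes y and wts arrive as numeric vectors of equal length.

```r
# Hypothetical custom statistic: weighted mean of the outcome minus its plain mean.
# The model and X arguments are accepted to match the documented signature
# but are not used here.
wtd_mean_shift <- function(model, X, y, wts) {
  weighted.mean(y, wts) - mean(y)
}

# Stand-alone check on toy data (no survey design needed)
y_toy <- c(1, 2, 3, 4)
wtd_mean_shift(NULL, NULL, y_toy, rep(1, 4))      # equal weights
wtd_mean_shift(NULL, NULL, y_toy, c(1, 1, 1, 5))  # extra weight on the largest value

# With `fit` from the example above and the package loaded, the call would
# presumably look like this (uncomment to run):
# results_custom <- perm_test(fit, custom_fun = wtd_mean_shift, B = 1000, engine = "R")
```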