diagnose_rerandomization: Diagnostic map from observed (or targeted) balance to precision and stringency

Description

Implements the calculations in Theorem 1 and Appendix D of the paper: (1) the realized RMSE implied by an observed Mahalanobis distance M (or by SMDs); (2) the ex-ante RMSE when accepting assignments with M <= a (equivalently, with acceptance probability q under complete randomization); (3) the largest acceptance probability q that attains a user-specified precision goal, given either as an RMSE target or as a power target (alpha, 1-beta, |tau|).

Usage

diagnose_rerandomization(
  smd = NULL,
  M = NULL,
  d = NULL,
  n_T,
  n_C,
  sigma = NULL,
  R2 = NULL,
  rmse_goal = NULL,
  tau = NULL,
  alpha = 0.05,
  power = 0.8,
  two_sided = TRUE,
  q_min = 1e-09,
  q_tol = 1e-10
)

Value

A list of class `"fastrerandomize_diagnostic"` with elements:

inputs: Echo of parsed inputs and derived quantities (M, d, S=1/n_T+1/n_C).
realized: rmse_factor (dimensionless, per sigma), rmse, and conservative rmse_upper_factor, rmse_upper.
power_check: If tau, alpha, power, sigma, R2 are given, includes z_needed, z_realized, and already_sufficient.
recommendation: If a target is supplied (via rmse_goal or tau), returns q_star, a_star, v_star, rmse_exante, expected_M_accepted, and expected_draws_per_accept = 1/q_star.

Arguments

smd: Optional numeric vector of standardized mean differences; if supplied, M is computed as sum(smd^2), and d = length(smd).
M: Optional scalar Mahalanobis distance M; if provided without `smd`, you must also supply `d` (the number of covariates used in M).
d: Optional integer number of covariates (needed if supplying only `M`).
n_T: Integer, number of treated units.
n_C: Integer, number of control units.
sigma: Optional outcome noise SD (sigma). If `NULL`, absolute RMSEs cannot be formed; dimensionless "per-sigma" factors are still returned.
R2: Optional model R^2 for Y ~ X under the linear potential-outcomes model. Must lie in [0,1). If `NULL`, RMSEs that require R^2 are returned as NA, but the "per-sigma" formulas that do not need R^2 are still shown when possible.
rmse_goal: Optional numeric target for RMSE (same units as Y). If supplied (with sigma and R2), the largest q achieving this ex-ante goal is returned.
tau: Optional effect size |tau| (same units as Y) to back out an RMSE goal via a normal approximation to power.
alpha: Significance level (size) of the test (default 0.05); treated as two-sided unless `two_sided = FALSE`.
power: Desired power 1 - beta (default 0.80). Used only if `tau` is given.
two_sided: Logical; if FALSE, uses a one-sided z-threshold for power inversion.
q_min: Lower bound for numerical search over q (default 1e-9).
q_tol: Absolute tolerance for q root-finding (default 1e-10).

Details

Realized (conditional) RMSE: with standardized/whitened X and typical orientation, $$ \mathrm{RMSE}_{\text{realized}} \approx \sqrt{\;\sigma^2\!\left(\frac{1}{n_T}+\frac{1}{n_C}\right) + \frac{\sigma_{\text{Prog}}^2}{d}\, M\;} \;=\; \sigma\,\sqrt{\left(\frac{1}{n_T}+\frac{1}{n_C}\right) + \frac{R^2}{1-R^2}\,\frac{M}{d}}\,,$$ where the second equality uses $\sigma_{\text{Prog}}^2 = \sigma^2 R^2/(1-R^2)$. The conservative upper bound replaces $\sigma_{\text{Prog}}^2/d$ by $\sigma_{\text{Prog}}^2$.
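As a minimal sketch of the realized-RMSE formula above (not the package's internal code), the per-sigma factor and its conservative counterpart can be computed directly from a vector of SMDs. The helper name `realized_rmse_factor` is hypothetical, and the sketch assumes M = sum(smd^2), as described under Arguments.

```r
# Hypothetical helper illustrating the realized-RMSE formula; not the package API.
realized_rmse_factor <- function(smd, n_T, n_C, R2) {
  stopifnot(R2 >= 0, R2 < 1)
  M <- sum(smd^2)            # Mahalanobis distance from standardized mean differences
  d <- length(smd)           # number of covariates
  S <- 1 / n_T + 1 / n_C
  ratio <- R2 / (1 - R2)     # sigma_Prog^2 / sigma^2, implied by the formula above
  c(
    factor       = sqrt(S + ratio * M / d),  # typical-orientation approximation
    upper_factor = sqrt(S + ratio * M)       # conservative bound: drops the 1/d
  )
}

smd <- c(0.10, -0.05, 0.08, 0.02)
f <- realized_rmse_factor(smd, n_T = 100, n_C = 100, R2 = 0.4)
sigma <- 1.2
rmse <- sigma * f            # absolute RMSEs, in the units of Y
print(rmse)
```

Multiplying the dimensionless factors by sigma gives the absolute RMSEs, which is why the factors remain usable when sigma is unknown.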

Ex-ante (design-stage) RMSE under thresholding: with acceptance rule M <= a (acceptance probability q), $$ \mathbb{E}[\mathrm{MSE}\mid M\le a] = \left(\frac{1}{n_T}+\frac{1}{n_C}\right)\!\left(\sigma^2 + v_a(d)\,\sigma_{\text{Prog}}^2\right),$$ where $v_a(d) = \Pr(\chi^2_{d+2}\le c)/\Pr(\chi^2_d\le c)$ and $c = a / (\tfrac{1}{n_T}+\tfrac{1}{n_C})$. Since $q = \Pr(\chi^2_d \le c)$, we can parameterize by q: $v(q;d) = \Pr(\chi^2_{d+2}\le \chi^2_{d;q})/q$, with $\chi^2_{d;q}$ the q-th quantile.
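The q-parameterization lends itself to a short numerical sketch using base R's chi-squared functions. The helper names `v_of_q`, `exante_rmse`, and `largest_q_for_goal` are hypothetical and this is not necessarily how the package searches for q; the sketch also assumes $\sigma_{\text{Prog}}^2 = \sigma^2 R^2/(1-R^2)$, as implied by the realized-RMSE formula in this section.

```r
# Hypothetical helpers illustrating the ex-ante formulas; not the package API.
v_of_q <- function(q, d) {
  # v(q; d) = Pr(chi2_{d+2} <= chi2_{d;q}) / q
  pchisq(qchisq(q, df = d), df = d + 2) / q
}

exante_rmse <- function(q, d, n_T, n_C, sigma, R2) {
  S <- 1 / n_T + 1 / n_C
  sigma_prog2 <- sigma^2 * R2 / (1 - R2)   # assumed link between sigma_Prog and R^2
  sqrt(S * (sigma^2 + v_of_q(q, d) * sigma_prog2))
}

largest_q_for_goal <- function(rmse_goal, d, n_T, n_C, sigma, R2,
                               q_min = 1e-9, q_tol = 1e-10) {
  gap <- function(q) exante_rmse(q, d, n_T, n_C, sigma, R2) - rmse_goal
  if (gap(1) <= 0) return(1)             # complete randomization already meets the goal
  if (gap(q_min) > 0) return(NA_real_)   # goal not reachable for any q >= q_min
  # Ex-ante RMSE increases with q, so the root is the largest admissible q.
  uniroot(gap, lower = q_min, upper = 1, tol = q_tol)$root
}

q_star <- largest_q_for_goal(0.20, d = 4, n_T = 100, n_C = 100,
                             sigma = 1.2, R2 = 0.4)
a_star <- (1 / 100 + 1 / 100) * qchisq(q_star, df = 4)  # threshold, from c = a / S
print(c(q_star = q_star, a_star = a_star, draws_per_accept = 1 / q_star))
```

Under these formulas the ex-ante RMSE ranges from $\sigma\sqrt{1/n_T+1/n_C}$ (as q approaches 0) to $\sigma\sqrt{(1/n_T+1/n_C)/(1-R^2)}$ (at q = 1), so a goal below that range cannot be met by any threshold and a goal above it is met without rerandomizing. The sketch returns NA and 1 in those two cases; how the package itself reports them is not specified here.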

Power inversion (Appendix D): for two-sided size $\alpha$ and power $1-\beta$, a normal approximation suggests the RMSE goal $|\tau| / (z_{1-\alpha/2}+z_{1-\beta})$.
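The inversion is a one-line calculation with `qnorm`. The sketch below uses a hypothetical helper, `rmse_goal_from_power`, and assumes that the one-sided case simply swaps $z_{1-\alpha/2}$ for $z_{1-\alpha}$; the package's exact handling of `two_sided = FALSE` may differ.

```r
# Hypothetical helper illustrating the power inversion; not the package API.
rmse_goal_from_power <- function(tau, alpha = 0.05, power = 0.80,
                                 two_sided = TRUE) {
  # Assumption: the one-sided case replaces z_{1-alpha/2} with z_{1-alpha}.
  z_alpha <- if (two_sided) qnorm(1 - alpha / 2) else qnorm(1 - alpha)
  z_beta  <- qnorm(power)                 # z_{1-beta}
  abs(tau) / (z_alpha + z_beta)
}

goal_two_sided <- rmse_goal_from_power(tau = 0.2)
goal_one_sided <- rmse_goal_from_power(tau = 0.2, two_sided = FALSE)
print(c(two_sided = goal_two_sided, one_sided = goal_one_sided))
```

The one-sided goal is looser (a larger admissible RMSE) because its critical value is smaller.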

Examples


# Example 1: observed SMDs, realized precision only (dimensionless factors)
smd <- c(0.10, -0.05, 0.08, 0.02)  # standardized mean differences
out1 <- diagnose_rerandomization(smd = smd, n_T = 100, n_C = 100)
print(out1)

# Example 2: same, but supply sigma and R^2 for absolute RMSE
out2 <- diagnose_rerandomization(smd = smd, n_T = 100, n_C = 100, sigma = 1.2, R2 = 0.4)

# Example 3: choose q to hit a power target (two-sided alpha=.05, 80% power, |tau|=0.2)
out3 <- diagnose_rerandomization(smd = smd, n_T = 100, n_C = 100, sigma = 1.2, R2 = 0.4,
                tau = 0.2, alpha = 0.05, power = 0.80)
                
# Inspect the recommended design (q_star, a_star, rmse_exante, ...)
out3$recommendation

# Example 4: choose q to hit an absolute RMSE goal directly
out4 <- diagnose_rerandomization(M = sum(smd^2), d = length(smd), n_T = 100, n_C = 100,
                sigma = 1.2, R2 = 0.4, rmse_goal = 0.25)
