Performs a Monte Carlo test against the null hypothesis that minimum leakage is zero, a necessary but insufficient condition for exclusion.
Usage

exclusion_test(
  dat,
  normalize = TRUE,
  method = "mle",
  approx = TRUE,
  n_sim = 1999L,
  parallel = TRUE,
  return_stats = FALSE,
  ...
)
Value

Either a scalar representing the Monte Carlo p-value of the exclusion test (default) or, if return_stats = TRUE, a named list with three entries: psi, the observed statistic; psi0, a vector of length n_sim with simulated null statistics; and p_value, the resulting p-value.
Arguments

dat: Input data. Either (a) an \(n \times d\) data frame or matrix of observations with columns for treatment, outcome, and candidate instruments; or (b) a \(d \times d\) covariance matrix over such variables. Note that in either case, the order of variables is presumed to be treatment (\(X\)), outcome (\(Y\)), leaky instruments (\(Z\)). exclusion_test requires at least two candidate instruments \(Z\).
normalize: Scale candidate instruments to unit variance?
method: Estimator for the covariance matrix. Options include (a) "mle", the default; (b) "shrink", an analytic empirical Bayes solution; or (c) "glasso", the graphical lasso. See details.
approx: Use the nearest positive definite approximation if the estimated covariance matrix is singular? See details.
n_sim: Number of Monte Carlo replicates.
parallel: Run Monte Carlo simulations in parallel? A backend must be registered beforehand, e.g. via doParallel; see the registration sketch after the argument descriptions.
return_stats: Return the observed statistic and simulated null distribution?
...: Extra arguments to be passed to the graphical lasso estimator if method = "glasso". Note that the regularization parameter rho is required as input, with no default.
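For parallel = TRUE, one way to register a backend is via doParallel; a minimal sketch (the worker count here is arbitrary):

library(doParallel)
registerDoParallel(cores = 2)  # register a local backend with two workers
# Subsequent calls to exclusion_test(..., parallel = TRUE) will distribute
# the Monte Carlo replicates across these workers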
Details

The classic linear instrumental variable (IV) model relies on the exclusion criterion, which states that instruments \(Z\) have no direct effect on the outcome \(Y\) and influence it only through the treatment \(X\). This implies a series of tetrad constraints that can be directly tested, given a model for sampling data from the covariance matrix of the observable variables (Watson et al., 2024).
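To make these constraints concrete (a standard derivation under the linear model, stated here for illustration): exclusion implies \(\mathrm{Cov}(Z, Y) = \theta \, \mathrm{Cov}(Z, X)\), where \(\theta\) is the causal effect of \(X\) on \(Y\), so for every pair of instruments \(i \neq j\)
\[
\mathrm{Cov}(Z_i, X)\,\mathrm{Cov}(Z_j, Y) - \mathrm{Cov}(Z_j, X)\,\mathrm{Cov}(Z_i, Y) = 0.
\]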
We assume that data are multivariate normal and impose the null hypothesis by modifying the estimated covariance matrix to induce a linear dependence between the vectors for Cov(\(Z, X\)) and Cov(\(Z, Y\)). Our test statistic is the determinant of the cross product of these vectors, which equals zero if and only if the null hypothesis is true. We generate a null distribution by simulating from the null covariance matrix and compute a p-value by estimating the proportion of statistics that exceed the observed value. Future releases will provide support for a wider range of data generating processes.
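As an illustrative sketch of this logic (not the package internals), such a statistic can be computed from a covariance matrix with variables ordered as (\(X\), \(Y\), \(Z\)):

# Sketch: test statistic for a covariance matrix Sigma with variable
# order (X, Y, Z1, ..., Zd); zero iff Cov(Z, X) and Cov(Z, Y) are
# linearly dependent
psi_stat <- function(Sigma) {
  d <- ncol(Sigma)
  A <- cbind(Sigma[3:d, 1], Sigma[3:d, 2])  # Cov(Z, X) and Cov(Z, Y) as columns
  det(crossprod(A))
}

# Given the observed statistic psi and simulated null statistics psi0,
# the Monte Carlo p-value is the proportion of null draws that exceed it:
# p_value <- mean(psi0 >= psi)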
Numerous methods exist for estimating covariance matrices. exclusion_test provides support for maximum likelihood estimation (the default), as well as empirical Bayes shrinkage via corpcor::cov.shrink (Schäfer & Strimmer, 2005) and the graphical lasso via glasso::glasso (Friedman et al., 2007). These latter methods are preferable in high-dimensional settings, where sample covariance matrices may be unstable or singular. Alternatively, users can pass a pre-computed covariance matrix directly as dat.
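For reference, the three estimators can also be applied directly to a data matrix; a minimal sketch, assuming the corpcor and glasso packages are installed (the rho value here is arbitrary):

# Sketch: the three covariance estimators on a toy data matrix
dat <- matrix(rnorm(100 * 6), ncol = 6)
S_mle    <- cov(dat)                               # sample covariance
S_shrink <- corpcor::cov.shrink(dat)               # empirical Bayes shrinkage
S_glasso <- glasso::glasso(cov(dat), rho = 0.1)$w  # graphical lasso; rho is required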
Estimated covariance matrices may be singular for some datasets or Monte Carlo samples. Behavior in this case is determined by the approx argument. If TRUE, the test proceeds with the nearest positive definite approximation, computed via Higham's (2002) algorithm (with a warning). If FALSE, the sampler will attempt to use the singular covariance matrix (also with a warning), but results may be invalid.
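Higham's algorithm is available off the shelf, e.g. in Matrix::nearPD, which can be used to inspect the kind of approximation applied (a sketch; exclusion_test handles this internally):

# Sketch: nearest positive definite approximation of a singular matrix
S <- matrix(c(1, 1, 1, 1), ncol = 2)      # rank-deficient, hence singular
S_pd <- as.matrix(Matrix::nearPD(S)$mat)  # Higham's alternating projections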
References

Watson, D., Penn, J., Gunderson, L., Bravo-Hermsdorff, G., Mastouri, A., and Silva, R. (2024). Bounding causal effects with leaky instruments. arXiv preprint, 2404.04446.

Spirtes, P. (2013). Calculation of entailed rank constraints in partially non-linear and cyclic models. In Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence, 606–615.

Friedman, J., Hastie, T., and Tibshirani, R. (2007). Sparse inverse covariance estimation with the lasso. Biostatistics, 9:432–441.

Schäfer, J., and Strimmer, K. (2005). A shrinkage approach to large-scale covariance estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol., 4:32.

Higham, N. (2002). Computing the nearest correlation matrix: A problem from finance. IMA J. Numer. Anal., 22:329–343.
Examples

set.seed(123)
# Hyperparameters
n <- 200
d_z <- 4
beta <- rep(1, d_z)
theta <- 2
rho <- 0.5
# Simulate correlated residuals
S_eps <- matrix(c(1, rho, rho, 1), ncol = 2)
eps <- matrix(rnorm(n * 2), ncol = 2)
eps <- eps %*% chol(S_eps)
# Simulate observables from the linear IV model
z <- matrix(rnorm(n * d_z), ncol = d_z)
x <- z %*% beta + eps[, 1]
y <- x * theta + eps[, 2]
obs <- cbind(x, y, z)
# Compute p-value of the test
exclusion_test(obs, parallel = FALSE)
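# Inspect the observed statistic and simulated null distribution as well
res <- exclusion_test(obs, parallel = FALSE, return_stats = TRUE)
str(res)  # named list with entries psi, psi0, and p_value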