Estimates bounds on average treatment effects in linear IV models under limited violations of the exclusion criterion.
leakyIV(
  dat,
  tau,
  p = 2,
  normalize = TRUE,
  method = "mle",
  approx = TRUE,
  n_boot = NULL,
  bayes = FALSE,
  parallel = TRUE,
  ...
)
A data frame with columns for ATE_lo and ATE_hi, representing lower and upper bounds of the partial identification interval for the causal effect of \(X\) on \(Y\). When bootstrapping, the output data frame contains n_boot rows, one for each bootstrap replicate.
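When n_boot is set, the replicate-level bounds can be summarized with quantiles. The sketch below uses a simulated stand-in data frame with the documented column names (ATE_lo, ATE_hi) rather than real leakyIV output. The 95% level, and the choice to pair the 2.5% quantile of the lower bound with the 97.5% quantile of the upper bound, are illustrative assumptions.

```r
# Stand-in for the output of leakyIV(..., n_boot = 200): one row per replicate.
# These values are simulated for illustration only; they are NOT package output.
set.seed(1)
boot_out <- data.frame(
  ATE_lo = rnorm(200, mean = 1.5, sd = 0.1),
  ATE_hi = rnorm(200, mean = 2.5, sd = 0.1)
)

# Conservative 95% band around the partial identification interval.
# na.rm = TRUE because bounds can be NA when approx = FALSE.
band <- c(
  lo = unname(quantile(boot_out$ATE_lo, 0.025, na.rm = TRUE)),
  hi = unname(quantile(boot_out$ATE_hi, 0.975, na.rm = TRUE))
)
band
```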
Input data. Either (a) an \(n \times d\) data frame or matrix of observations with columns for treatment, outcome, and candidate instruments; or (b) a \(d \times d\) covariance matrix over such variables. The latter is incompatible with bootstrapping. Note that in either case, the order of variables is presumed to be treatment (\(X\)), outcome (\(Y\)), leaky instruments (\(Z\)).
Either (a) a scalar representing the upper bound on the p-norm of linear weights on \(Z\) in the structural equation for \(Y\); or (b) a vector representing upper bounds on the absolute value of each such coefficient. See details.
Power of the norm for the tau threshold.
Scale candidate instruments to unit variance?
Estimator for the covariance matrix, if one is not supplied by dat. Options include (a) "mle", the default; (b) "shrink", an analytic empirical Bayes solution; or (c) "glasso", the graphical lasso. See details.
Use nearest positive definite approximation if the estimated covariance matrix is singular? See details.
Optional number of bootstrap replicates.
Use Bayesian bootstrap?
Compute bootstrap estimates in parallel? A parallel backend must be registered beforehand, e.g. via doParallel.
Extra arguments to be passed to the graphical lasso estimator if method = "glasso". Note that the regularization parameter rho is required as input, with no default.
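The arguments above can be combined as in the following sketch. The simulated data, the tau values, rho = 0.1, and cores = 2 are all illustrative assumptions, and each call is guarded so that it only runs when the relevant package is installed.

```r
set.seed(42)
n <- 100
d_z <- 3
z <- matrix(rnorm(n * d_z), ncol = d_z)
x <- z %*% rep(1, d_z) + rnorm(n)
y <- z %*% rep(0.1, d_z) + 2 * x + rnorm(n)
obs <- cbind(x, y, z)  # required column order: treatment, outcome, instruments

if (requireNamespace("leakyIV", quietly = TRUE)) {
  library(leakyIV)

  # Scalar tau: upper bound on the p-norm (here p = 2) of the leakage weights
  res_scalar <- leakyIV(obs, tau = 1, p = 2)

  # Vector tau: one absolute-value bound per candidate instrument
  res_vector <- leakyIV(obs, tau = rep(0.5, d_z))

  # Graphical lasso: rho has no default and is forwarded through `...`
  if (requireNamespace("glasso", quietly = TRUE)) {
    res_glasso <- leakyIV(obs, tau = 1, method = "glasso", rho = 0.1)
  }

  # Parallel bootstrap: register a backend before calling leakyIV
  if (requireNamespace("doParallel", quietly = TRUE)) {
    doParallel::registerDoParallel(cores = 2)
    res_boot <- leakyIV(obs, tau = 1, n_boot = 50, parallel = TRUE)
    doParallel::stopImplicitCluster()
  }
}
```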
Instrumental variables are defined by three structural assumptions: they must be (A1) relevant, i.e. associated with the treatment; (A2) unconfounded, i.e. independent of common causes between treatment and outcome; and (A3) exclusive, i.e. affecting outcomes only through the treatment. The leakyIV algorithm (Watson et al., 2024) relaxes (A3), allowing some information leakage from IVs \(Z\) to outcomes \(Y\) in linear systems. While the average treatment effect (ATE) is no longer identifiable in this setting, sharp bounds can be computed exactly.
We assume the following structural equation for the treatment: \(X := Z \beta + \epsilon_X\), where the final summand is a noise term that correlates with the additive noise in the structural equation for the outcome: \(Y := Z \gamma + X \theta + \epsilon_Y\). The ATE is given by the parameter \(\theta\). Whereas classical IV models require each \(\gamma\) coefficient to be zero, we permit some direct signal from \(Z\) to \(Y\). Specifically, leakyIV provides support for two types of information leakage: (a) thresholding the p-norm of linear weights \(\gamma\) (scalar tau); and (b) thresholding the absolute value of each \(\gamma\) coefficient one by one (vector tau).
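To see why a threshold on \(\gamma\) constrains \(\theta\), note that if \(Z\) is uncorrelated with both noise terms, the outcome equation implies \(\mathrm{Cov}(Z, Y) = \Sigma_{ZZ} \gamma + \theta \, \mathrm{Cov}(Z, X)\), so each candidate \(\theta\) pins down \(\gamma(\theta) = \Sigma_{ZZ}^{-1} (\Sigma_{ZY} - \theta \Sigma_{ZX})\). The sketch below solves \(\|\gamma(\theta)\|_2 \le \tau\) as a quadratic in \(\theta\). It is a simplified moment-based illustration under these assumptions, not the package's implementation, which may impose further constraints; all variable names are hypothetical.

```r
set.seed(1)
n <- 5000
d_z <- 3
beta <- rep(1, d_z)
gamma <- rep(0.1, d_z)
theta <- 2

z <- matrix(rnorm(n * d_z), ncol = d_z)
eps_x <- rnorm(n)
eps_y <- 0.5 * eps_x + rnorm(n)  # correlated noise terms induce confounding
x <- as.numeric(z %*% beta + eps_x)
y <- as.numeric(z %*% gamma + x * theta + eps_y)

# Moment equation: gamma(th) = S_zz^{-1} (S_zy - th * S_zx) = a - th * b
a <- solve(cov(z), cov(z, y))
b <- solve(cov(z), cov(z, x))
gamma_of <- function(th) as.numeric(a - th * b)

# For p = 2, ||a - th * b||^2 <= tau^2 is the quadratic A th^2 + B th + C <= 0
tau <- 1
A <- sum(b^2)
B <- -2 * sum(a * b)
C <- sum(a^2) - tau^2
disc <- B^2 - 4 * A * C  # negative would mean no theta satisfies the threshold
bounds <- (-B + c(-1, 1) * sqrt(disc)) / (2 * A)
bounds
```

The interval shrinks to a point as tau approaches the smallest attainable norm of \(\gamma(\theta)\), and widens as more leakage is tolerated.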
Numerous methods exist for estimating covariance matrices. leakyIV provides support for maximum likelihood estimation (the default), as well as empirical Bayes shrinkage via corpcor::cov.shrink (Schäfer & Strimmer, 2005) and the graphical lasso via glasso::glasso (Friedman et al., 2007). These latter methods are preferable in high-dimensional settings where sample covariance matrices may be unstable or singular. Alternatively, users can pass a pre-computed covariance matrix directly as dat.
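The three estimators can be compared directly outside of leakyIV, as in the sketch below. The (n - 1) / n rescaling for the maximum likelihood estimate and rho = 0.1 are assumptions made for illustration; how leakyIV calls these functions internally is not shown here.

```r
set.seed(7)
n <- 50
d <- 6
dat <- matrix(rnorm(n * d), ncol = d)

# Maximum likelihood estimate: divides by n rather than n - 1
S_mle <- cov(dat) * (n - 1) / n

# Empirical Bayes shrinkage (only if corpcor is installed)
if (requireNamespace("corpcor", quietly = TRUE)) {
  S_shrink <- corpcor::cov.shrink(dat, verbose = FALSE)
}

# Graphical lasso (only if glasso is installed); $w is the estimated covariance
if (requireNamespace("glasso", quietly = TRUE)) {
  S_glasso <- glasso::glasso(cov(dat), rho = 0.1)$w
}

# Any of these can then be supplied as dat, e.g. leakyIV(S_mle, tau = 1),
# provided columns are ordered treatment, outcome, instruments.
```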
Estimated covariance matrices may be singular for some datasets or bootstrap samples. Behavior in this case is determined by the approx argument. If TRUE, leakyIV issues a warning and proceeds with the nearest positive definite approximation, computed via Higham's (2002) algorithm. If FALSE, leakyIV issues a warning and returns NA for both bounds.
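The following sketch shows what a nearest positive definite repair looks like on a rank-deficient sample covariance matrix. It uses Matrix::nearPD, which implements Higham's algorithm; that leakyIV relies on this particular function is an assumption.

```r
set.seed(11)
n <- 5
d <- 8
dat <- matrix(rnorm(n * d), ncol = d)

# With fewer observations than variables, the sample covariance is singular
S <- cov(dat)
qr(S)$rank  # at most n - 1

if (requireNamespace("Matrix", quietly = TRUE)) {
  # Nearest positive definite approximation to S
  S_pd <- as.matrix(Matrix::nearPD(S)$mat)
  # Smallest eigenvalue is now strictly positive
  min(eigen(S_pd, symmetric = TRUE, only.values = TRUE)$values)
}
```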
Uncertainty can be evaluated in leaky IV models using the bootstrap, provided that covariances are estimated internally and not passed directly. Bootstrapping provides a nonparametric sampling distribution for min/max values of the ATE. Set bayes = TRUE to replace the classical bootstrap with a Bayesian bootstrap for approximate posterior inference (Rubin, 1981).
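The two resampling schemes differ only in how observations are weighted. As a rough sketch in base R (the package's internal implementation may differ), the classical bootstrap resamples rows with replacement, while the Bayesian bootstrap keeps every row and draws continuous Dirichlet(1, ..., 1) weights:

```r
set.seed(3)
n <- 100
dat <- matrix(rnorm(n * 3), ncol = 3)

# Classical bootstrap: resample rows with replacement, then estimate covariance
idx <- sample.int(n, replace = TRUE)
S_classic <- cov(dat[idx, ])

# Bayesian bootstrap: normalized exponential draws are Dirichlet(1, ..., 1)
w <- rexp(n)
w <- w / sum(w)
S_bayes <- cov.wt(dat, wt = w)$cov  # weighted covariance estimate
```

Repeating either step n_boot times and computing bounds from each covariance matrix yields one row of output per replicate.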
Watson, D., Penn, J., Gunderson, L., Bravo-Hermsdorff, G., Mastouri, A., and Silva, R. (2024). Bounding causal effects with leaky instruments. arXiv preprint, 2404.04446.
Friedman, J., Hastie, T., and Tibshirani, R. (2007). Sparse inverse covariance estimation with the lasso. Biostatistics, 9:432-441.
Schäfer, J., and Strimmer, K. (2005). A shrinkage approach to large-scale covariance estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol., 4:32.
Higham, N. (2002). Computing the nearest correlation matrix: A problem from finance. IMA J. Numer. Anal., 22:329–343.
Rubin, D.B. (1981). The Bayesian bootstrap. Ann. Statist., 9(1):130-134.
set.seed(123)
# Hyperparameters
n <- 200
d_z <- 4
beta <- rep(1, d_z)
gamma <- rep(0.1, d_z)
theta <- 2
rho <- 0.5
# Simulate correlated residuals
S_eps <- matrix(c(1, rho, rho, 1), ncol = 2)
eps <- matrix(rnorm(n * 2), ncol = 2)
eps <- eps %*% chol(S_eps)
# Simulate observables from a leaky IV model
z <- matrix(rnorm(n * d_z), ncol = d_z)
x <- z %*% beta + eps[, 1]
y <- z %*% gamma + x * theta + eps[, 2]
obs <- cbind(x, y, z)
# Run the algorithm
leakyIV(obs, tau = 1)
# With bootstrapping
leakyIV(obs, tau = 1, n_boot = 10)
# With covariance matrix input
S <- cov(obs)
leakyIV(S, tau = 1)