covsearch: Search for Causal Effect Covariate Adjustment

Description

Find the witnesses and adjustment sets (if any) for the average causal effect (ACE) of a given treatment variable $X$ on a given outcome $Y$. This is done by an exhaustive search over a (reduced) set of possible candidates. Currently, only binary data are supported.

Usage

covsearch(problem, max_set = 12, min_only = TRUE, prior_ind = 0.5, prior_table = 10, cred_calc = FALSE, M = 1000, stop_at_first = FALSE, pop_solve = FALSE, verbose = FALSE)

Arguments

problem

a cfx problem instance for the ACE of a given treatment $X$ on a given outcome $Y$.

max_set

maximum size of conditioning set. The cost of the procedure grows exponentially as a function of this, so be careful when increasing the default value.

min_only

for each witness, once a set of a particular size is found, don't look for larger ones.

prior_ind

prior probability of an independence.

prior_table

effective sample size hyperparameter of a Dirichlet prior for testing independence with contingency tables.

cred_calc

if TRUE, compute conditional credible intervals for the ACE of the highest scoring model.

M

if (conditional) credible intervals need to be computed, use Monte Carlo with this number of samples.

stop_at_first

if TRUE, stop as soon as some witness is found.

pop_solve

if TRUE, assume we know the population graph in problem instead of data.

verbose

if TRUE, print out more detailed information while running the procedure.

Value

witness: array containing the indices of the witness variables.
Z: a list, where Z[[i]] is the $i$-th array containing the indices of the variables in the admissible set corresponding to witness witness[i].
witness_score: array containing the scores of each witness/admissible set pair.
hw: witness corresponding to the highest scoring pair.
hZ: array containing admissible set corresponding to the highest scoring pair.
ACEs: array of average causal effects corresponding to each witness/admissible pair.
ACEs_post: array of samples from the posterior distribution of the ACE implied by hw and hZ.

Details

The method assumes that the variables given in problem (other than problem$X_idx and problem$Y_idx) are covariates which causally precede treatment and outcome. It then applies the faithfulness condition of Spirtes, Glymour and Scheines (2000, Causation, Prediction and Search, MIT Press) to derive an admissible set: a set of covariates which removes all confounding between treatment and outcome when adjusted for. The necessary and sufficient conditions for finding an admissible set using the faithfulness assumption were discussed by Entner, Hoyer and Spirtes (2013, JMLR W&CP, vol. 31, 256--264). To prove that a set is admissible, an auxiliary variable from the covariate set is needed; we call this variable a "witness." See Entner et al. for details. It is possible that no witness exists, in which case the function returns an empty solution. Multiple witness/admissible set pairs might exist.

The criterion for finding a witness/admissible set pair requires testing conditional independence constraints. Each test is done by Bayesian model selection, with a Dirichlet prior over the contingency table of the variables in problem (using the effective sample size hyperparameter prior_table) and a prior probability of the independence hypothesis given by the hyperparameter prior_ind.
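The Bayesian test described above can be illustrated with a minimal sketch for the simplest case: marginal independence of two binary variables. This is only an illustration under stated assumptions, not the package's internal code. The helper names log_dirichlet_marginal and posterior_independence are hypothetical, and the choice of symmetric Dirichlet priors whose pseudo-counts sum to prior_table under both hypotheses is an assumption.

```r
# Log marginal likelihood of a vector of counts under a symmetric Dirichlet
# prior whose pseudo-counts sum to `ess`.
# (Hypothetical helper for illustration; not a CausalFX function.)
log_dirichlet_marginal <- function(counts, ess) {
  alpha <- rep(ess / length(counts), length(counts))
  lgamma(sum(alpha)) - lgamma(sum(alpha) + sum(counts)) +
    sum(lgamma(alpha + counts) - lgamma(alpha))
}

# Posterior probability that the rows and columns of a 2 x 2 table of counts
# are independent. Assumed models: under independence, each margin has its
# own Dirichlet prior; under dependence, the four cells share one prior.
posterior_independence <- function(tab, prior_ind = 0.5, prior_table = 10) {
  log_m0 <- log_dirichlet_marginal(rowSums(tab), prior_table) +
    log_dirichlet_marginal(colSums(tab), prior_table)
  log_m1 <- log_dirichlet_marginal(as.vector(tab), prior_table)
  # Posterior log-odds = prior log-odds + log Bayes factor
  log_odds <- log(prior_ind) - log(1 - prior_ind) + log_m0 - log_m1
  plogis(log_odds)
}

tab_indep <- matrix(c(250, 250, 250, 250), nrow = 2)  # no association
tab_dep   <- matrix(c(450, 50, 50, 450), nrow = 2)    # strong association
p_indep <- posterior_independence(tab_indep)
p_dep   <- posterior_independence(tab_dep)
```

A conditional independence test could be built from the same pieces by summing the two log marginal likelihoods over the strata of the conditioning set; whether the package does exactly this is not stated here.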

For each witness/admissible set pair that passes this criterion, the function reports the posterior expected value of the implied ACE, obtained by plugging in the posterior expected value of the contingency table as an estimate of the joint distribution. For one particular witness/admissible set pair, chosen according to the best fit to the conditional independencies required by the criterion of Entner et al. (see also Silva and Evans, 2014, NIPS 298-306), we calculate the posterior distribution of the ACE. This posterior does not take into account the uncertainty in the choice of witness/admissible set; it is the conditional posterior given this choice.
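As a rough illustration of how an ACE follows from a joint distribution once an admissible set is fixed, the sketch below applies the standard adjustment formula, ACE = sum over z of P(z) [P(Y = 1 | X = 1, z) - P(Y = 1 | X = 0, z)], to a small hand-built joint table. The function ace_adjusted and the toy probabilities are assumptions made for this example; they are not part of the package.

```r
# Adjusted ACE from a joint probability array indexed as [x, y, z], where
# index 1 means value 0 and index 2 means value 1 for X and Y, and z ranges
# over the joint states of the adjustment set.
# (Hypothetical helper for illustration; not a CausalFX function.)
ace_adjusted <- function(joint) {
  p_z  <- apply(joint, 3, sum)                     # P(Z = z)
  p_xz <- apply(joint, c(1, 3), sum)               # P(X = x, Z = z)
  p_y1 <- matrix(joint[, 2, ], nrow = 2) / p_xz    # P(Y = 1 | X = x, Z = z)
  sum(p_z * (p_y1[2, ] - p_y1[1, ]))
}

# Toy model with a single binary confounder Z:
# P(Z = 1) = 0.5, P(X = 1 | Z) = 0.2 or 0.8,
# P(Y = 1 | X, Z) = 0.1 + 0.3 * X + 0.4 * Z.
joint <- array(0, dim = c(2, 2, 2))
for (z in 0:1) {
  for (x in 0:1) {
    p_x1_given_z <- if (z == 1) 0.8 else 0.2
    p_xz <- 0.5 * (if (x == 1) p_x1_given_z else 1 - p_x1_given_z)
    p_y1 <- 0.1 + 0.3 * x + 0.4 * z
    joint[x + 1, 1, z + 1] <- p_xz * (1 - p_y1)
    joint[x + 1, 2, z + 1] <- p_xz * p_y1
  }
}

ace <- ace_adjusted(joint)  # adjusts for Z

# Unadjusted contrast P(Y = 1 | X = 1) - P(Y = 1 | X = 0), for comparison
p_x   <- apply(joint, 1, sum)
p_xy1 <- rowSums(joint[, 2, ])
naive <- p_xy1[2] / p_x[2] - p_xy1[1] / p_x[1]
```

In this toy model the adjusted value recovers the coefficient on X, while the unadjusted contrast is inflated by confounding through Z. With data, the joint array could be replaced by a posterior mean of the contingency table, which is the plug-in idea described above.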

The search for a witness/admissible set is by brute force: for each witness, evaluate all subsets of the remaining covariates as candidate admissible sets. If there are too many covariates (more than max_set), only a filtered set of size max_set is considered for each witness. The filtered set is chosen by scoring each covariate by its empirical mutual information with the witness given problem$X_idx and picking the top max_set elements, to which the brute-force search is then applied.
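The filter-then-enumerate step can be sketched as follows. This is a simplified illustration, not the package's implementation: candidate_sets is a hypothetical helper, the scores are made-up numbers standing in for the mutual information values, and including the empty set among the candidates is an assumption.

```r
# Keep the `max_set` highest-scoring covariates, then list every subset of
# them in order of increasing size.
# (Hypothetical helper for illustration; not a CausalFX function.)
candidate_sets <- function(scores, max_set) {
  n_keep <- min(max_set, length(scores))
  keep <- sort(order(scores, decreasing = TRUE)[seq_len(n_keep)])
  sets <- list(integer(0))  # start with the empty set (an assumption)
  for (k in seq_along(keep)) {
    # Enumerate positions rather than values, so that a single kept index
    # is not misread by combn() as "choose from 1:n".
    sets <- c(sets, combn(length(keep), k,
                          FUN = function(idx) keep[idx], simplify = FALSE))
  }
  sets
}

scores <- c(0.1, 0.9, 0.5, 0.3)  # made-up scores for four covariates
sets <- candidate_sets(scores, max_set = 3)
```

With max_set = 3 the lowest-scoring covariate is dropped and 2^3 = 8 candidate sets remain, which shows why the cost grows exponentially in max_set.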

References

http://jmlr.org/proceedings/papers/v31/entner13a.html

http://papers.nips.cc/paper/5602-causal-inference-through-a-witness-protection-program

Examples

## Generate a synthetic problem
problem <- simulateWitnessModel(p = 4, q = 4, par_max = 3, M = 1000)

## Idealized case: suppose we know the true distribution,
## get "exact" ACE estimands for different adjustment sets
sol_pop <- covsearch(problem, pop_solve = TRUE)
effect_pop <- synthetizeCausalEffect(problem)
cat(sprintf(
  "ACE (true) = %1.2f\nACE (adjusting for all) = %1.2f\nACE (adjusting for nothing) = %1.2f\n",
   effect_pop$effect_real, effect_pop$effect_naive, effect_pop$effect_naive2))

## Perform inference and report results
covariate_hat <- covsearch(problem, cred_calc = TRUE, M = 1000)
summary(covariate_hat)