Note: a newer version (1.4.5) of this package is available.

CausalQueries

CausalQueries is a package that lets you declare binary causal models, update beliefs about causal types given data and calculate arbitrary estimands. Model definition makes use of dagitty functionality. Updating is implemented in stan.

See here for a guide to using CausalQueries along with many examples of causal models

Installation

To install CausalQueries from GitHub:

install.packages("remotes")
remotes::install_github("macartan/CausalQueries")

Causal models

Causal models are defined by:

  • A directed acyclic graph (DAG), which provides the set of (binary) variables, a causal ordering between them, and a set of assumptions regarding conditional independence. If there is no arrow from A to B then a change in A never induces a change in B.
  • Functional forms. In this binary world we do not need functional forms in the usual sense. The DAG implies a set of "causal types." Units are classed together as being of the same causal type if they respond in the same way to other variables. For instance, a type might be the set of units for which X=1 and for which Y=1 if and only if X=1. The set of causal types grows rapidly with the number of nodes and with the number of nodes pointing into any given node. In this setting, imposing functional forms is the same as placing restrictions on causal types: such restrictions reduce complexity but require substantive assumptions. An example of a restriction might be "Y is monotonic in X."
  • Priors. In the standard case, the DAG plus any restrictions imply a set of parameters that combine to form causal types. These are the parameters we want to learn about. To learn about them we first provide priors over the parameters. With priors specified the causal model is complete (it is a "probabilistic causal model") and we are ready for inference.
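To make the growth in the number of types concrete, here is a minimal base R sketch. It does not use the package; the helper names `n_nodal_types` and `n_causal_types` are hypothetical and introduced only for illustration. It relies on the fact that a binary node with k binary parents has one possible response for each of the 2^k parent-value combinations, and it reads "Y is monotonic in X" as "Y is non-decreasing in X", which is an assumption.

```r
# Number of nodal types for a binary node with k binary parents:
# one response (0 or 1) for each of the 2^k combinations of parent values.
n_nodal_types <- function(k) 2^(2^k)

# Hypothetical helper: number of causal types for a DAG, given the number
# of parents of each node (the product of the nodal type counts).
n_causal_types <- function(n_parents) prod(n_nodal_types(n_parents))

n_nodal_types(0:3)                      # 2, 4, 16, 256
n_causal_types(c(X = 0, M = 1, Y = 1))  # X -> M -> Y: 2 * 4 * 4 = 32

# The four nodal types for Y in X -> Y, written as (Y when X=0, Y when X=1).
y_types <- expand.grid(Y_if_X0 = 0:1, Y_if_X1 = 0:1)

# Assumed reading of "Y is monotonic in X": Y never falls when X rises.
# This rules out the single type with Y=1 at X=0 and Y=0 at X=1.
monotonic <- y_types[y_types$Y_if_X1 >= y_types$Y_if_X0, ]
nrow(monotonic)                         # 3 of the 4 types remain
```

The restriction removes one of the four nodal types for Y, which is the sense in which restrictions reduce complexity at the cost of a substantive assumption.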

A wrinkle:

  • It is possible that nodes are related in ways not captured by the DAG. In such cases dotted curves are sometimes placed between nodes on a graph. It is possible to specify such possible unobservable confounding in the causal model. This has implications for the parameter space.

Inference

Our goal is to form beliefs over parameters but also over more substantive estimands:

  • With a causal model in hand and data available about some or all of the nodes, it is possible to make use of a generic stan model that generates posteriors over the parameter vector.

  • Given updated (or prior) beliefs about parameters, it is possible to calculate causal estimands from a causal model. For example: "What is the probability that X was the cause of Y given X=1, Y=1 and Z=1?"
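As a rough illustration of how an estimand follows from beliefs about types, the base R sketch below works through the simplest case: a model X -> Y with no confounding and no third node Z. The type shares are made-up numbers, not output from the package, and the object names are hypothetical.

```r
# Hypothetical shares of the four nodal types for Y (illustrative numbers only).
types <- data.frame(
  Y_if_X0 = c(1, 0, 0, 1),
  Y_if_X1 = c(0, 1, 0, 1),
  share   = c(0.1, 0.3, 0.2, 0.4),
  row.names = c("negative", "positive", "never", "always")
)

# Cases observed with X = 1 and Y = 1 must have a type with Y = 1 when X = 1.
# (With X assigned independently of type, the probability of X = 1 cancels.)
observed <- types$Y_if_X1 == 1

# X = 1 caused Y = 1 in those cases where Y would have been 0 had X been 0.
caused <- observed & types$Y_if_X0 == 0

prob_cause <- sum(types$share[caused]) / sum(types$share[observed])
prob_cause  # 0.3 / (0.3 + 0.4)
```

In a fitted model the shares are uncertain, so the same ratio would be computed for each draw of the parameters, giving a distribution over the estimand rather than a single number.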

Example

Here is an example of a model in which X causes M and M causes Y. There is, in addition, unobservable confounding between X and Y. This is an example of a model in which you might use information on M to figure out whether X caused Y.

The DAG is defined using dagitty syntax like this:

model <- make_model("X -> M -> Y")

To add the confounding we introduce an additional parameter that allows the assignment probability for X to differ depending on the causal type for Y.

model <- set_confound(model, list(X = "Y[X=1] == 1"))

We then set priors thus:

model <- set_priors(model, distribution = "jeffreys")

You can plot the DAG, making use of functions in the dagitty package.

plot(model)

You can draw data from the model, like this:

data <- make_data(model, n = 10)

Updating is done like this:

updated_model <- update_model(model, data)

Finally you can calculate an estimand of interest like this:

CoE <- query_distribution(
                   model = updated_model, 
                   using = "posteriors",
                   query = "Y[X=0] == 0",
                   subset = "X==1 & Y==1"
                   )

This uses the posterior distribution and the model to assess the "causes of effects" estimand: the probability that X=1 was the cause of Y=1 in those cases in which X=1 and Y=1. The approach is to imagine a set of "do" operations on the model that control the level of X, to inquire about the level of Y given these operations, and then to assess how likely it is that Y would be 0 if X were fixed at 0, within the set of cases that naturally take on particular values of X and Y. By the same token this posterior can be calculated conditional on observations of M, allowing an assessment of how data on mediators alters inference about the causes of effects.
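The logic of the "do" operations, and of conditioning on M, can be sketched in base R by enumerating types directly. This is an illustration only: it assumes a simplified X -> M -> Y model with no confounding (unlike the example above), uses made-up type shares in place of a posterior, and all object and function names are hypothetical rather than part of the package.

```r
# Nodal types for M (response to X) and for Y (response to M),
# with hypothetical shares (illustrative numbers only).
m_types <- data.frame(M_if_X0 = c(1, 0, 0, 1), M_if_X1 = c(0, 1, 0, 1),
                      p = c(0.1, 0.5, 0.2, 0.2))
y_types <- data.frame(Y_if_M0 = c(1, 0, 0, 1), Y_if_M1 = c(0, 1, 0, 1),
                      p = c(0.1, 0.5, 0.2, 0.2))

# Every pairing of an M type with a Y type, assumed independent.
idx  <- expand.grid(m = seq_len(nrow(m_types)), y = seq_len(nrow(y_types)))
prob <- m_types$p[idx$m] * y_types$p[idx$y]

# Value Y would take if X were set to x by a "do" operation.
y_do_x <- function(x) {
  m <- if (x == 1) m_types$M_if_X1[idx$m] else m_types$M_if_X0[idx$m]
  ifelse(m == 1, y_types$Y_if_M1[idx$y], y_types$Y_if_M0[idx$y])
}

m_obs  <- m_types$M_if_X1[idx$m]   # M as observed in cases with X = 1
in_set <- y_do_x(1) == 1           # types consistent with X = 1, Y = 1
caused <- in_set & y_do_x(0) == 0  # Y would have been 0 under do(X = 0)

# Causes-of-effects probability within the cases selected by `keep`.
coe <- function(keep) sum(prob[caused & keep]) / sum(prob[in_set & keep])

coe(TRUE)        # 0.26 / 0.58, without data on M
coe(m_obs == 1)  # 0.25 / 0.49, when M = 1 is observed
coe(m_obs == 0)  # 0.01 / 0.09, when M = 0 is observed
```

With these made-up shares, observing M = 1 raises the probability that X caused Y and observing M = 0 lowers it, which is the sense in which data on a mediator can alter inferences about the causes of effects.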

Credits etc

The approach used in CausalQueries is a generalization of the biqq models described in "Mixing Methods: A Bayesian Approach" (Humphreys and Jacobs, 2015, https://doi.org/10.1017/S0003055415000453). The conceptual extension makes use of work on probabilistic causal models described in Pearl's Causality (Pearl, 2009, https://doi.org/10.1017/CBO9780511803161). The approach to generating a generic stan function that can take data from arbitrary models was developed in key contributions by Jasper Cooper (http://jasper-cooper.com/) and Georgiy Syunyaev (http://gsyunyaev.com/). Lily Medina (https://lilymedina.github.io/) did the magical work of pulling it all together and developing approaches to characterizing confounding and defining estimands. Julio Solis has done wonders to simplify the specification of priors.

Install: install.packages('CausalQueries')

Monthly Downloads: 1,351

Version: 0.0.3

License: MIT + file LICENSE

Maintainer: Lily Medina

Last Published: June 3rd, 2020

Functions in CausalQueries (0.0.3)

  • CausalQueries-package: 'CausalQueries'
  • expand_data: Expand compact data object to data frame
  • drop_empty_families: Drop empty families
  • expand_wildcard: Expand wildcard
  • add_dots: Helper to fill in missing do operators in causal expression
  • add_wildcard: Adds a wildcard for every missing parent
  • get_parameter_matrix: Get parameter matrix
  • CausalQueries_internal_inherit_params: Create parameter documentation to inherit
  • default_stan_control: default_stan_control
  • get_parameter_names: Get parameter names
  • clean_param_vector: Clean parameter vector
  • check_string_input: Check string_input
  • gsub_many: Recursive substitution
  • get_event_prob: Draw event probabilities
  • collapse_data: Make compact data with data strategies
  • make_model: Make a model
  • clean_params: Check parameters sum to 1 in paramset; normalize if needed; add names if needed
  • collapse_nodal_types: collapse nodal types
  • get_parameters: Get parameters
  • make_data: Make data
  • get_nodal_types: Get list of types for nodes in a DAG
  • n_check: n_check
  • make_ambiguities_matrix: Make ambiguities matrix
  • get_param_dist: Get a distribution of model parameters
  • make_events: Make data in compact form
  • clean_condition: Clean condition
  • democracy_data: Democracy Data
  • make_priors: Make Priors
  • get_data_families: get_data_families
  • complements: Make statement for complements
  • set_prior_distribution: Add prior distribution draws
  • get_parents: Get list of parents of all nodes in a model
  • make_confounds_df: Make a confounds dataframe
  • get_type_prob: Get type probabilities
  • get_type_prob_multiple: Draw matrix of type probabilities, before or after estimation
  • includes_var: Whether a query contains an exact string
  • make_parameters: Make a 'true' parameter vector
  • interacts: Make statement for any interaction
  • make_data_single: Generate full dataset
  • increasing: Make monotonicity statement (positive)
  • set_ambiguities_matrix: Set ambiguity matrix
  • continue_names: Continue names
  • restrict_by_query: Reduce nodal types
  • plot_dag: Plot your dag using 'dagitty'
  • query_distribution: Calculate query distribution
  • all_data_types: All data types
  • minimal_data: Creates a data frame for case with no data
  • set_confound: Set confound
  • minimal_event_data: Creates a compact data frame for case with no data
  • prep_stan_data: Prepare data for 'stan'
  • query_model: Generate estimands dataframe
  • set_confounds: Set confounds
  • set_restrictions: Restrict a model
  • set_confounds_df: Set a confounds_df
  • set_sampling_args: set_sampling_args From 'rstanarm' (November 1st, 2019)
  • causal_type_names: Names for causal types
  • perm: Produces the possible permutations of a set of nodes
  • make_prior_distribution: Make a prior distribution from priors
  • reveal_outcomes: Reveal outcomes
  • non_decreasing: Make monotonicity statement (non negative)
  • set_priors: Set prior distribution
  • make_values_task_list: Make values task list
  • set_parameter_matrix: Set parameter matrix
  • nodes_in_statement: Identify nodes in a statement
  • data_type_names: Data type names
  • get_query_types: Look up query types
  • non_increasing: Make monotonicity statement (non positive)
  • get_type_names: Get type names
  • set_parameters: Set parameters
  • substitutes: Make statement for substitutes
  • unpack_wildcard: Unpack a wild card
  • te: Make treatment effect statement (positive)
  • get_causal_types: Get causal types
  • decreasing: Make monotonicity statement (negative)
  • get_ambiguities_matrix: Get ambiguities matrix
  • get_prior_distribution: Get a prior distribution from priors
  • get_priors: Get priors
  • interpret_type: Interpret or find position in nodal type
  • update_causal_types: Update causal types based on nodal types
  • list_non_parents: Returns a list with the nodes that are not directly pointing into a node
  • simulate_data: simulate_data is an alias for make_data
  • observe_data: Observe data, given a strategy
  • make_parameter_matrix: Make parameter matrix
  • update_model: Fit causal model using 'stan'
  • make_par_values_multiple: Make par values multiple
  • st_within: Get string between two regular expression patterns
  • make_par_values: make_par_values
  • query_to_expression: Helper to turn query into a data expression
  • translate_dagitty: Puts your DAG into 'dagitty' syntax (useful for using their plotting functions)
  • restrict_by_labels: Reduce nodal types using labels
  • make_nodal_types: Make nodal types
  • type_matrix: Generate type matrix
  • var_in_query: List of nodes contained in query
  • expand_nodal_expression: Helper to expand nodal expression