
CausalFX (version 1.0.1)

cfx: Creates a CausalFX Problem Instance

Description

Sets up an object describing a causal inference problem: finding the average causal effect of a treatment on an outcome. Currently, only binary data is supported. The problem specification may also include a synthetic model, for use in simulation studies.

Usage

cfx(x, y, latent_idx = NULL, dat = NULL, g = NULL, model = NULL, num_v_max = 20)
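
As a rough illustration of the signature above, the following sketch prepares a made-up binary data matrix and passes it to cfx together with the treatment and outcome indices. The data, the column names, and the choice of x and y are invented for this example. The call is guarded because it only works where the CausalFX package is installed, and it assumes that a data-only call (no g, no model) is accepted.

```r
set.seed(1)

# Made-up binary data: 200 samples of 4 variables (illustration only).
dat <- matrix(rbinom(200 * 4, size = 1, prob = 0.5), ncol = 4)
colnames(dat) <- c("treatment", "covariate1", "covariate2", "outcome")

x <- 1  # column index of the treatment variable
y <- 4  # column index of the outcome variable

# The call itself is attempted only if CausalFX is available.
if (requireNamespace("CausalFX", quietly = TRUE)) {
  problem <- CausalFX::cfx(x = x, y = y, dat = dat)
}
```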

Arguments

x
the index of the treatment variable.
y
the index of the outcome variable.
latent_idx
an array with the indices of the variables that should be considered latent.
dat
a matrix of binary data. It can be omitted if a model is provided.
g
a binary matrix encoding a causal graph, where g[i, j] == 1 if a directed edge from vertex $j$ to $i$ should exist, 0 otherwise. This is only required if a ground truth model exists.
model
if g is specified, this must be specified too. It should be a list of conditional probability tables, one per vertex of g, each encoding the conditional probability of that vertex given its parents. Entry model[[i]] is an array of non-negative numbers giving the probability that random variable/vertex $i$ equals 1. In particular, model[[i]][j] is the conditional probability of this event given that the parents of $i$ are in state $j$. States are indexed as follows. If $S$ is the binary string formed by the binary values of the parents of $i$ in g, sorted by their index, then $j = 1 + bin2dec(S)$, where $bin2dec$ converts a binary string into the corresponding decimal number.
num_v_max
the maximum dimensionality (number of variables) for which the joint distribution implied by a model is pre-computed. Having it pre-computed can speed up some computations for methods that use the provided ground truth model. Because the space required to store a joint distribution grows exponentially with the dimensionality, this value should be kept small.
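
The encodings of g and model described above can be sketched in base R as follows. This is a minimal illustration, not code from CausalFX: the helper parent_state_index, the toy graph, and the probabilities are all made up here. It also assumes that the lowest-indexed parent is the leftmost (most significant) character of $S$, which is one reading of "sorted by their index".

```r
# Hypothetical helper (not part of CausalFX): state index j = 1 + bin2dec(S).
# Assumes the lowest-indexed parent is the most significant bit of S.
parent_state_index <- function(parent_values) {
  k <- length(parent_values)
  if (k == 0) return(1L)              # no parents: a single state
  weights <- 2^((k - 1):0)            # e.g. c(2, 1) for two parents
  as.integer(1 + sum(parent_values * weights))
}

# Toy graph on 3 vertices with edges 1 -> 3 and 2 -> 3.
# g[i, j] == 1 encodes a directed edge from vertex j to vertex i.
g <- matrix(0, nrow = 3, ncol = 3)
g[3, 1] <- 1
g[3, 2] <- 1

# Made-up tables: model[[i]][j] = P(V_i = 1 | parents of i in state j).
model <- list(
  0.5,                     # vertex 1 has no parents: one entry
  0.3,                     # vertex 2 has no parents: one entry
  c(0.1, 0.4, 0.6, 0.9)    # vertex 3: parent states 00, 01, 10, 11
)

parents_of_3 <- which(g[3, ] == 1)       # parents of vertex 3, sorted by index
j <- parent_state_index(c(1, 0))         # V1 = 1, V2 = 0, so S = "10"
p <- model[[3]][j]                       # P(V3 = 1 | V1 = 1, V2 = 0)
```

Under these assumptions, a vertex with $k$ parents needs a table of $2^k$ entries, and a vertex without parents needs a single entry.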

Value

A cfx object, which contains the following fields:
X_idx
the index of the treatment variable in the data/graph.
Y_idx
the index of the outcome variable in the data/graph.
latent_idx
the array of latent variable indices given as input.
data
the data given as input.
graph
the graph given as input.
varnames
an array of strings with the names of the variables, as given by data. If data has no column names or it is not provided, this is given a default value, where variable $i$ is assigned the name "X".
model
the model given as input.
ancestrals
a list of arrays (if g is provided), where ancestrals[[i]] is the array of the indices of the ancestors of $i$ in g, excluding $i$ itself.
probs
a multidimensional array (if g is provided) with one dimension per vertex of g, where each entry is the probability of the corresponding joint assignment of variable values. This is NULL if the number of vertices in g is greater than num_v_max.
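
To make the ancestrals and probs fields more concrete, here is a base R sketch that derives comparable quantities from a toy g and model. It is not the package's implementation: the helper ancestors_of, the brute-force enumeration, and the example numbers are assumptions made for illustration, as is the bit order used for the parent state index (lowest-indexed parent most significant).

```r
# Toy chain 1 -> 2 -> 3, with g[i, j] == 1 meaning an edge from j to i.
g <- matrix(0, nrow = 3, ncol = 3)
g[2, 1] <- 1
g[3, 2] <- 1
n <- nrow(g)

# Made-up tables: model[[i]][j] = P(V_i = 1 | parents of i in state j).
model <- list(0.5, c(0.2, 0.8), c(0.1, 0.7))

# Hypothetical helper: ancestors of vertex i, found by following parent links.
ancestors_of <- function(g, i) {
  found <- integer(0)
  frontier <- which(g[i, ] == 1)
  while (length(frontier) > 0) {
    found <- union(found, frontier)
    parents <- which(colSums(g[frontier, , drop = FALSE]) > 0)
    frontier <- setdiff(parents, found)
  }
  sort(found)
}
ancestrals <- lapply(seq_len(n), function(i) ancestors_of(g, i))

# Joint distribution by enumerating all 2^n binary assignments.
# expand.grid varies the first variable fastest, matching R's array layout.
states <- as.matrix(expand.grid(rep(list(0:1), n)))
joint <- apply(states, 1, function(v) {
  prob <- 1
  for (i in seq_len(n)) {
    pa <- which(g[i, ] == 1)
    k <- length(pa)
    # State index j = 1 + bin2dec(S), lowest-indexed parent most significant.
    j <- if (k == 0) 1 else 1 + sum(v[pa] * 2^((k - 1):0))
    p1 <- model[[i]][j]
    prob <- prob * (if (v[i] == 1) p1 else 1 - p1)
  }
  prob
})

# One dimension of size 2 per vertex: probs[v1 + 1, v2 + 1, v3 + 1].
probs <- array(joint, dim = rep(2, n))
```

The table has $2^n$ entries, so with 20 binary variables it already holds 1,048,576 values (about 8 MiB as doubles), and every additional variable doubles that.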