simulate_data_step: Simulate data with linear confounding and causal effect following a step-function
Description
Simulates data from a confounded non-linear model in which the non-linear function is a random regression tree.
The data generating process is given by:
$$Y = f(X) + H \delta + \nu$$
$$X = H \Gamma + E$$
where \(f(X)\) is a random regression tree with \(m\) random splits of the data,
resulting in a random step function with \(K = m + 1\) levels, i.e. the leaf levels \(c_k\) on the regions \(R_k\):
$$f(x_i) = \sum_{k = 1}^K 1_{\{x_i \in R_k\}} c_k$$
\(E\) and \(\nu\) are random error terms and
\(H \in \mathbb{R}^{n \times q}\) is a matrix of random confounding covariates.
\(\Gamma \in \mathbb{R}^{q \times p}\) is a random coefficient matrix and \(\delta \in \mathbb{R}^{q}\) is a random coefficient vector.
For the simulation, all the above parameters are drawn from a standard normal distribution, except for
\(\delta\) which is drawn from a normal distribution with standard deviation 10.
The leaf levels \(c_k\) are drawn from a uniform distribution between -50 and 50.
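The data generating process described above can be sketched in base R. This is only an illustration under simplifying assumptions, not the package's implementation: the way the partition is grown here (each split picks a random leaf, a random covariate among all \(p\) columns, and a random threshold) is an assumption, as are all variable names. The confounding term is computed as the matrix product of `H` and `delta`, which matches the stated dimensions.

```r
set.seed(1)
n <- 100; p <- 5; q <- 2; m <- 4

# Confounders, coefficients and errors: standard normal, except delta (sd = 10)
H     <- matrix(rnorm(n * q), nrow = n)   # n x q confounders
Gamma <- matrix(rnorm(q * p), nrow = q)   # q x p coefficients
delta <- rnorm(q, sd = 10)                # length-q coefficients
E     <- matrix(rnorm(n * p), nrow = n)   # n x p errors
nu    <- rnorm(n)                         # length-n errors

X <- H %*% Gamma + E

# Grow a random partition (assumed scheme): each split divides one leaf in two
leaf <- rep(1L, n)
for (k in seq_len(m)) {
  # pick a leaf that still holds at least two observations
  sizes <- table(leaf)
  candidates <- as.integer(names(sizes)[sizes >= 2])
  l <- candidates[sample.int(length(candidates), 1)]
  idx <- which(leaf == l)
  # pick a covariate and a threshold between two neighbouring sorted values
  j <- sample.int(p, 1)
  x_sorted <- sort(X[idx, j])
  s <- sample.int(length(x_sorted) - 1, 1)
  threshold <- (x_sorted[s] + x_sorted[s + 1]) / 2
  # observations above the threshold form the new leaf k + 1
  leaf[idx[X[idx, j] > threshold]] <- k + 1L
}

# Leaf levels c_k drawn uniformly from [-50, 50]; f(X) is constant on each leaf
c_k <- runif(m + 1, min = -50, max = 50)
f_X <- c_k[leaf]

Y <- f_X + as.vector(H %*% delta) + nu
```

After `m` splits the vector `leaf` holds `m + 1` distinct labels, so `f_X` takes at most `m + 1` distinct values.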
Usage
simulate_data_step(q, p, n, m, make_tree = FALSE)
Value
a list containing the simulated data:
X
a matrix of covariates
Y
a vector of responses
f_X
a vector of the true function f(X)
j
the indices of the causal covariates in X
tree
If make_tree is TRUE, the random regression tree, an object of class
Node from the data.tree package (Glur, 2023)
Arguments
q
number of confounding covariates in H
p
number of covariates in X
n
number of observations
m
number of covariates with a causal effect on Y
make_tree
Whether the random regression tree should be returned.
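Examples
A minimal usage sketch, based only on the signature and return values documented above. It assumes that the function is provided by a package named `SDModels`; the package name is an assumption, so the call is guarded and skipped when no such package is installed. The argument values are arbitrary.

```r
set.seed(42)

# Assumption: simulate_data_step() is exported by a package called SDModels
if (requireNamespace("SDModels", quietly = TRUE)) {
  # 2 confounders, 15 covariates, 100 observations, m = 2
  dat <- SDModels::simulate_data_step(q = 2, p = 15, n = 100, m = 2)

  X   <- dat$X     # matrix of covariates
  Y   <- dat$Y     # responses
  f_X <- dat$f_X   # true function values f(X)

  print(dim(X))
  print(dat$j)     # indices of the causal covariates in X
}
```

Setting `make_tree = TRUE` would additionally return the random regression tree in the `tree` element.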