SDModels (version 2.0.2)

simulate_data_step: Simulate data with linear confounding and causal effect following a step-function

Description

Simulates data from a confounded non-linear model in which the non-linear function is a random regression tree. The data generating process is given by: $$Y = f(X) + \delta^T H + \nu$$ $$X = \Gamma^T H + E$$ where \(f(X)\) is a random regression tree built from \(m\) random splits of the data. This results in a random step function with \(K = m+1\) levels (leaves): $$f(x_i) = \sum_{k = 1}^K 1_{\{x_i \in R_k\}} c_k$$ where \(R_k\) is the region of the \(k\)-th leaf and \(c_k\) is its level. \(E\) and \(\nu\) are random error terms and \(H \in \mathbb{R}^{n \times q}\) is a matrix of random confounding covariates. \(\Gamma \in \mathbb{R}^{q \times p}\) is a random coefficient matrix and \(\delta \in \mathbb{R}^{q}\) is a random coefficient vector. For the simulation, all of the above quantities are drawn from a standard normal distribution, except for \(\delta\), which is drawn from a normal distribution with standard deviation 10. For each split, a covariate is sampled uniformly and the split point is drawn from a beta distribution (both shape parameters equal to 2) scaled to the support of the chosen covariate. The leaf levels \(c_k\) are drawn from a uniform distribution between \(cl\) and \(cu\).
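The data generating process described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the equations are read in matrix form (`H %*% Gamma`, `H %*% delta`), \(\delta\) is assumed to have mean zero, the region to split is assumed to be chosen uniformly among regions with at least two observations, and the "support" is assumed to be the observed range of the chosen covariate within that region. All variable names are hypothetical.

```r
set.seed(1)
n <- 100; p <- 5; q <- 2; m <- 3
cl <- -50; cu <- 50

# Confounders, coefficients and error terms (standard normal,
# except delta with standard deviation 10; mean 0 is an assumption)
H     <- matrix(rnorm(n * q), nrow = n)
Gamma <- matrix(rnorm(q * p), nrow = q)
delta <- rnorm(q, sd = 10)
E     <- matrix(rnorm(n * p), nrow = n)
nu    <- rnorm(n)

# Confounded covariates: X = H Gamma + E
X <- H %*% Gamma + E

# Random step function: start with one region and split m times
region <- rep(1L, n)
for (s in seq_len(m)) {
  # Assumption: choose the region to split uniformly among
  # regions that contain at least two observations
  sizes      <- table(region)
  candidates <- as.integer(names(sizes)[sizes >= 2])
  r          <- candidates[sample.int(length(candidates), 1)]
  idx        <- which(region == r)

  # Sample a covariate uniformly and a Beta(2, 2) split point
  # scaled to the observed range of that covariate in the region
  j     <- sample.int(p, 1)
  rng   <- range(X[idx, j])
  split <- rng[1] + rbeta(1, 2, 2) * diff(rng)

  # Observations above the split point form a new region
  region[idx[X[idx, j] > split]] <- s + 1L
}

# Leaf levels c_k ~ Uniform(cl, cu), one per region
c_k <- runif(m + 1, min = cl, max = cu)
f_X <- c_k[region]

# Response: Y = f(X) + H delta + nu
Y <- f_X + as.vector(H %*% delta) + nu
```

Because each split separates the smallest from the largest value of the chosen covariate within the region, every split produces two non-empty regions, so the sketch ends with exactly \(m + 1\) leaf levels.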

Usage

simulate_data_step(q, p, n, m, make_tree = FALSE, cl = -50, cu = 50)

Value

a list containing the simulated data:

X

a matrix of covariates

Y

a vector of responses

f_X

a vector of the true function f(X)

j

the indices of the causal covariates in X

tree

if make_tree is TRUE, the random regression tree of class SDTree
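The components listed above can be sanity-checked against the description. The helper below, `check_sim`, is hypothetical and not part of SDModels; it only asserts the shapes and ranges implied by this page (an `n` by `p` matrix `X`, length-`n` vectors `Y` and `f_X`, at most `m + 1` distinct step levels, and causal indices `j` that point into the columns of `X`). The call to `simulate_data_step` runs only if the package is installed.

```r
# Hypothetical helper: checks a simulated data list against the
# documented return value. Not part of SDModels.
check_sim <- function(sim, n, p, m) {
  stopifnot(
    is.matrix(sim$X), nrow(sim$X) == n, ncol(sim$X) == p,
    length(sim$Y) == n,
    length(sim$f_X) == n,
    # m splits give at most m + 1 distinct leaf levels
    length(unique(sim$f_X)) <= m + 1,
    # causal covariates must be valid column indices of X
    all(sim$j %in% seq_len(p))
  )
  invisible(TRUE)
}

if (requireNamespace("SDModels", quietly = TRUE)) {
  set.seed(42)
  sim_data <- SDModels::simulate_data_step(q = 2, p = 15, n = 100, m = 2)
  check_sim(sim_data, n = 100, p = 15, m = 2)
}
```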

Arguments

q

number of confounding covariates in H

p

number of covariates in X

n

number of observations

m

number of random splits, each made on a randomly chosen covariate

make_tree

Whether the random regression tree should be returned.

cl

lower limit of the uniform distribution of the step levels

cu

upper limit of the uniform distribution of the step levels

Author

Markus Ulmer

See Also

simulate_data_nonlinear

Examples

library(SDModels)

set.seed(42)
# simulation of confounded data
sim_data <- simulate_data_step(q = 2, p = 15, n = 100, m = 2, make_tree = TRUE)
X <- sim_data$X
Y <- sim_data$Y

# the returned tree should reproduce the true function f(X)
all(predict(sim_data$tree, data.frame(X)) == sim_data$f_X)
# plot the regularization path of the tree
plot(regPath(sim_data$tree))
