The data generating process is defined by the sample selection model equations given below.
make_ssm_data(
  n_obs = 8000,
  dim_x = 100,
  theta = 1,
  mar = TRUE,
  return_type = "DoubleMLData"
)
Depending on the return_type, returns an object or set of objects as specified.
n_obs (integer(1))
The number of observations to simulate.
dim_x (integer(1))
The number of covariates.
theta (numeric(1))
The value of the causal parameter.
mar (logical(1))
Indicates whether missingness at random holds.
return_type (character(1))
If "DoubleMLData", returns a DoubleMLData object.
If "data.frame", returns a data.frame().
If "data.table", returns a data.table().
Default is "DoubleMLData".
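A minimal usage sketch for the arguments above. It assumes the function is exported by the DoubleML R package, which this section does not state explicitly, and that the package is installed. The smaller n_obs and dim_x values are chosen only to keep the example fast.

```r
# Assumption: make_ssm_data() is provided by the DoubleML R package.
library(DoubleML)

set.seed(3141)

# Request a plain data.frame instead of the default DoubleMLData object.
df <- make_ssm_data(
  n_obs = 500,
  dim_x = 10,
  theta = 1,
  mar = TRUE,
  return_type = "data.frame"
)

# Inspect the size and the first few column names of the simulated data.
dim(df)
head(names(df))
```

Switching return_type to "data.table" or leaving it at the default should only change the container that is returned, not the simulated values for a given seed.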
$$ y_i = \theta d_i + x_i' \beta + u_i,$$
$$s_i = 1\lbrace d_i + \gamma z_i + x_i' \beta + v_i > 0 \rbrace,$$
$$d_i = 1\lbrace x_i' \beta + w_i > 0 \rbrace,$$
with \(y_i\) being observed if \(s_i = 1\) and covariates \(x_i \sim \mathcal{N}(0, \Sigma^2_x)\), where \(\Sigma^2_x\) is a matrix with entries \(\Sigma_{kj} = 0.5^{|j-k|}\).
\(\beta\) is a dim_x-vector with entries \(\beta_j=\frac{0.4}{j^2}\),
\(z_i \sim \mathcal{N}(0, 1)\),
\((u_i,v_i) \sim \mathcal{N}(0, \Sigma^2_{u,v})\), and
\(w_i \sim \mathcal{N}(0, 1)\).
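The equations above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the function name simulate_ssm_sketch is hypothetical, the values of gamma and of the error correlation rho are placeholders because the text does not specify \(\gamma\) or \(\Sigma^2_{u,v}\), the matrix with entries \(0.5^{|j-k|}\) is used directly as the covariance of \(x_i\), and unobserved outcomes are coded as NA.

```r
# Sketch of the data generating process using base R only.
# ASSUMPTIONS (not given in the text): gamma = 1, and (u, v) have unit
# variances with correlation rho = 0.5. Both are illustrative placeholders.
simulate_ssm_sketch <- function(n_obs = 1000, dim_x = 10, theta = 1,
                                gamma = 1, rho = 0.5) {
  idx <- seq_len(dim_x)

  # Covariance of the covariates with entries 0.5^|j - k|.
  sigma_x <- 0.5^abs(outer(idx, idx, "-"))

  # Draw x_i ~ N(0, sigma_x): chol() returns R with t(R) %*% R = sigma_x,
  # so rows of Z %*% R have covariance sigma_x when Z is standard normal.
  x <- matrix(rnorm(n_obs * dim_x), n_obs, dim_x) %*% chol(sigma_x)

  # Coefficients beta_j = 0.4 / j^2 and the linear index x_i' beta.
  beta <- 0.4 / idx^2
  xb <- as.vector(x %*% beta)

  # Correlated errors (u_i, v_i) and independent w_i, z_i.
  sigma_uv <- matrix(c(1, rho, rho, 1), 2, 2)
  uv <- matrix(rnorm(n_obs * 2), n_obs, 2) %*% chol(sigma_uv)
  u <- uv[, 1]
  v <- uv[, 2]
  w <- rnorm(n_obs)
  z <- rnorm(n_obs)

  # Treatment, selection and outcome equations as written above.
  d <- as.integer(xb + w > 0)
  s <- as.integer(d + gamma * z + xb + v > 0)
  y <- theta * d + xb + u

  # ASSUMPTION: the outcome is set to NA when it is not observed (s = 0).
  y[s == 0] <- NA

  data.frame(y = y, d = d, s = s, z = z, x)
}

set.seed(42)
sim <- simulate_ssm_sketch(n_obs = 500, dim_x = 5)
mean(sim$s)  # share of observations with an observed outcome
```

Using the Cholesky factor avoids a dependency on an extra package for multivariate normal draws; any equivalent sampler would serve the same purpose.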
The data generating process is inspired by a process used in the simulation study (see Appendix E) of Bia, Huber and Lafférs (2023).
Michela Bia, Martin Huber & Lukáš Lafférs (2023) Double Machine Learning for Sample Selection Models, Journal of Business & Economic Statistics, DOI: 10.1080/07350015.2023.2271071