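Generates data from a sample selection model (SSM). The data generating process is defined by the equations given after the argument descriptions.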
make_ssm_data(
  n_obs = 8000,
  dim_x = 100,
  theta = 1,
  mar = TRUE,
  return_type = "DoubleMLData"
)
Depending on return_type, returns an object or set of objects as specified.
n_obs (integer(1))
The number of observations to simulate.
dim_x (integer(1))
The number of covariates.
theta (numeric(1))
The value of the causal parameter.
mar (logical(1))
Indicates whether missingness at random holds.
return_type (character(1))
If "DoubleMLData", returns a DoubleMLData object.
If "data.frame", returns a data.frame().
If "data.table", returns a data.table().
Default is "DoubleMLData".
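As a brief illustration of the arguments above, the following sketch calls make_ssm_data() with a smaller sample and two different return types. It assumes the DoubleML package is installed and is skipped otherwise; the seed and the sample sizes are arbitrary choices made for this example only.

```r
# Usage sketch: assumes the DoubleML package is installed; skipped otherwise.
if (requireNamespace("DoubleML", quietly = TRUE)) {
  set.seed(3141)  # arbitrary seed, for reproducibility of this example only

  # Plain data.frame, convenient for inspecting the simulated columns
  df <- DoubleML::make_ssm_data(
    n_obs = 500,
    dim_x = 20,
    theta = 1,
    mar = TRUE,
    return_type = "data.frame"
  )
  print(dim(df))

  # Default return type: a DoubleMLData object
  obj <- DoubleML::make_ssm_data(n_obs = 500, dim_x = 20)
  print(class(obj))
}
```

Requesting a data.frame or data.table is handy for inspecting the raw simulated data, while the default DoubleMLData object is meant to be passed on to the package's estimators.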
$$ y_i = \theta d_i + x_i' \beta + u_i,$$
$$s_i = 1\lbrace d_i + \gamma z_i + x_i' \beta + v_i > 0 \rbrace,$$
$$d_i = 1\lbrace x_i' \beta + w_i > 0 \rbrace,$$
with \(y_i\) being observed if \(s_i = 1\) and covariates \(x_i \sim \mathcal{N}(0, \Sigma^2_x)\), where \(\Sigma^2_x\) is a matrix with entries \(\Sigma_{kj} = 0.5^{|j-k|}\). \(\beta\) is a dim_x-vector with entries \(\beta_j=\frac{0.4}{j^2}\), \(z_i \sim \mathcal{N}(0, 1)\), \((u_i,v_i) \sim \mathcal{N}(0, \Sigma^2_{u,v})\), and \(w_i \sim \mathcal{N}(0, 1)\).
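To make the equations concrete, here is a minimal base-R sketch that simulates from them directly. It is not the package's implementation. The function name simulate_ssm_sketch is hypothetical, and the text does not state the value of \(\gamma\) or the entries of \(\Sigma^2_{u,v}\), so gamma and rho (the correlation between \(u_i\) and \(v_i\), with unit variances) are placeholder assumptions. Setting unobserved outcomes to NA is also a choice made for this sketch.

```r
# Base-R sketch of the sample selection DGP described above.
# ASSUMPTIONS (not given in the text): gamma, and (u, v) having unit
# variances with correlation rho. Both are illustrative placeholders.
simulate_ssm_sketch <- function(n_obs = 1000, dim_x = 10, theta = 1,
                                gamma = 1, rho = 0) {
  idx <- seq_len(dim_x)
  # Covariance of the covariates: Sigma_kj = 0.5^|j - k|
  sigma_x <- 0.5^abs(outer(idx, idx, "-"))
  # Coefficients: beta_j = 0.4 / j^2
  beta <- 0.4 / idx^2

  # x_i ~ N(0, sigma_x): chol() returns R with t(R) %*% R = sigma_x,
  # so rows of Z %*% R have covariance sigma_x
  x <- matrix(rnorm(n_obs * dim_x), n_obs, dim_x) %*% chol(sigma_x)
  colnames(x) <- paste0("X", idx)
  xb <- drop(x %*% beta)

  z <- rnorm(n_obs)
  w <- rnorm(n_obs)
  # (u, v) bivariate normal with correlation rho (assumed structure)
  u <- rnorm(n_obs)
  v <- rho * u + sqrt(1 - rho^2) * rnorm(n_obs)

  d <- as.numeric(xb + w > 0)                  # treatment indicator
  s <- as.numeric(d + gamma * z + xb + v > 0)  # selection indicator
  y <- theta * d + xb + u                      # outcome
  y[s == 0] <- NA                              # y is only observed if s = 1

  list(data = data.frame(y = y, d = d, s = s, z = z, x),
       beta = beta, sigma_x = sigma_x)
}

set.seed(42)
sim <- simulate_ssm_sketch(n_obs = 500, dim_x = 5)
head(sim$data)
```

With rho = 0 the outcome and selection errors are independent in this sketch; a nonzero rho makes selection depend on the unobserved part of the outcome.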
The data generating process is inspired by a process used in the simulation study (see Appendix E) of Bia, Huber and Lafférs (2023).
Michela Bia, Martin Huber & Lukáš Lafférs (2023) Double Machine Learning for Sample Selection Models, Journal of Business & Economic Statistics, DOI: 10.1080/07350015.2023.2271071