Generate a synthetic data set with qualitative outcomes under a regression discontinuity design. The data include a binary treatment indicator and a single covariate (the running variable). The conditional probability mass functions of potential outcomes are continuous in the running variable.
generate_qualitative_data_rd(n, outcome_type)
A list storing a data frame with the observed data and the true probabilities of shift at the cutoff.
n: Sample size.
outcome_type: String controlling the outcome type. Must be either "multinomial" or "ordered". Affects how potential outcomes are generated.
Riccardo Di Francesco
Potential outcomes are generated differently according to outcome_type. If outcome_type == "multinomial", generate_qualitative_data_rd computes linear predictors for each class using the covariates:
$$\eta_{mi} (d) = \beta_{m1}^d X_{i1} + \beta_{m2}^d X_{i2} + \beta_{m3}^d X_{i3}, \quad d = 0, 1,$$
and then transforms \(\eta_{mi} (d)\) into valid probability distributions using the softmax function:
$$P(Y_i(d) = m | X_i) = \frac{\exp(\eta_{mi} (d))}{\sum_{m'} \exp(\eta_{m'i}(d))}.$$
It then generates potential outcomes \(Y_i(1)\) and \(Y_i(0)\) by sampling from {1, 2, 3} using \(P(Y_i(d) = m | X_i), \, d = 0, 1\).
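The multinomial step above can be sketched in a few lines of base R. This is only an illustration of the softmax-and-sample logic, not the function's internal code: the coefficient matrix `beta0` is hypothetical, since the coefficients actually used are not given here.

```r
set.seed(1986)
n <- 5

# Three independent U(0,1) covariates, as described in the text.
X <- matrix(runif(n * 3), nrow = n, ncol = 3)

# Hypothetical coefficients for d = 0: one row per class m, one column per
# covariate. Illustrative values only (an assumption, not the package's).
beta0 <- matrix(c( 0.5, -0.2,  0.1,
                   0.0,  0.3, -0.4,
                  -0.3,  0.1,  0.6), nrow = 3, byrow = TRUE)

# Linear predictors: eta[i, m] = sum_j beta0[m, j] * X[i, j].
eta <- X %*% t(beta0)

# Softmax across classes, so that each row sums to one.
probs <- exp(eta) / rowSums(exp(eta))

# Draw Y_i(0) from {1, 2, 3} with the row-specific probabilities.
Y0 <- apply(probs, 1, function(p) sample(1:3, size = 1, prob = p))
```

Repeating the same steps with a second coefficient matrix for d = 1 would give \(Y_i(1)\).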
If instead outcome_type == "ordered", generate_qualitative_data_rd first generates latent potential outcomes:
$$Y_i^* (d) = \tau d + X_{i1} + X_{i2} + X_{i3} + N (0, 1), \quad d = 0, 1,$$
with \(\tau = 2\). It then constructs \(Y_i (d)\) by discretizing \(Y_i^* (d)\) using threshold parameters \(\zeta_1 = 2\) and \(\zeta_2 = 4\), with the conventions \(\zeta_0 = -\infty\) and \(\zeta_3 = +\infty\). Then,
$$P(Y_i(d) = m | X_i) = P(\zeta_{m-1} < Y_i^*(d) \leq \zeta_m | X_i) = \Phi (\zeta_m - \sum_j X_{ij} - \tau d) - \Phi (\zeta_{m-1} - \sum_j X_{ij} - \tau d), \quad d = 0, 1,$$
which allows us to analytically compute the probabilities of shift at the cutoff.
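As a rough sketch of that analytical computation, the class probabilities can be evaluated with `pnorm`. Two things here are assumptions rather than facts from this page: the outer thresholds are closed with \(-\infty\) and \(+\infty\), and the shift for class m is taken to be the difference between the treated and untreated class probabilities at a given covariate value. The covariate sum `s` is a hypothetical evaluation point.

```r
tau <- 2
# zeta_1 = 2 and zeta_2 = 4 come from the text; -Inf and Inf close the
# outer classes (an assumption about the convention).
zeta <- c(-Inf, 2, 4, Inf)

# Class probabilities for classes 1, 2, 3 given the covariate sum s
# and treatment status d.
class_probs <- function(s, d) {
  diff(pnorm(zeta - s - tau * d))
}

s <- 1.5  # hypothetical value of X_i1 + X_i2 + X_i3
p0 <- class_probs(s, 0)
p1 <- class_probs(s, 1)

# Assumed definition of the shift: change in each class probability
# when moving from d = 0 to d = 1.
shift <- p1 - p0
shift
```

Because both probability vectors sum to one, the shifts sum to zero across classes.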
Treatment is always assigned as \(D_i = 1(X_i \geq 0.5)\).
The function always generates three independent covariates from \(U(0,1)\). Observed outcomes \(Y_i\) are always constructed using the usual observational rule, \(Y_i = D_i Y_i(1) + (1 - D_i) Y_i(0)\).
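The assignment and observation steps might look as follows for the ordered case. This is a sketch under stated assumptions, not the function's source: it treats the first covariate as the running variable and shares one noise draw across both potential outcomes, neither of which is confirmed by this page.

```r
set.seed(1986)
n <- 10

# Three independent U(0,1) covariates.
X <- matrix(runif(n * 3), nrow = n, ncol = 3)

# Assumption: the first covariate plays the role of the running variable.
running <- X[, 1]
D <- as.integer(running >= 0.5)

# Latent ordered outcomes with tau = 2. Sharing a single noise draw
# across d = 0 and d = 1 is an assumption made for this sketch.
eps <- rnorm(n)
s <- rowSums(X)

# Discretize with thresholds 2 and 4: class m when zeta_{m-1} < Y* <= zeta_m.
Y0 <- findInterval(s + eps, c(2, 4), left.open = TRUE) + 1
Y1 <- findInterval(2 + s + eps, c(2, 4), left.open = TRUE) + 1

# Observed outcome: Y(1) for treated units, Y(0) otherwise.
Y <- D * Y1 + (1 - D) * Y0
```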
generate_qualitative_data_soo, generate_qualitative_data_iv, generate_qualitative_data_did
## Generate synthetic data.
set.seed(1986)
data <- generate_qualitative_data_rd(100,
                                     outcome_type = "ordered")
data$pshifts_cutoff