Generate a synthetic data set with qualitative outcomes under a difference-in-differences design. The data include two time periods, a binary treatment indicator (applied only in the second period), and a matrix of covariates. The probability time shifts among the treated and control groups evolve similarly across the two time periods (parallel trends on the probability mass functions).
generate_qualitative_data_did(n, assignment, outcome_type)
A list storing a data frame with the observed data, the true propensity score, and the true probabilities of shift on the treated.
n: Sample size.
assignment: String controlling treatment assignment. Must be either "randomized" (random assignment) or "observational" (assignment based on covariates).
outcome_type: String controlling the outcome type. Must be either "multinomial" or "ordered".
Riccardo Di Francesco
Potential outcomes are generated differently according to outcome_type. If outcome_type == "multinomial", generate_qualitative_data_did computes linear predictors for each class using the covariates:
$$\eta_{mi} (d, s) = \beta_{m1}^d X_{i1} + \beta_{m2}^d X_{i2} + \beta_{m3}^d X_{i3}, \quad d = 0, 1, \quad s = t-1, t,$$
and then transforms \(\eta_{mi} (d, s)\) into valid probability distributions using the softmax function:
$$P(Y_{is}(d) = m | X_i) = \frac{\exp(\eta_{mi} (d, s))}{\sum_{m'} \exp(\eta_{m'i}(d, s))}, \quad d = 0, 1, \quad s = t-1, t.$$
It then generates potential outcomes \(Y_{it-1}(1)\), \(Y_{it}(1)\), \(Y_{it-1}(0)\), and \(Y_{it}(0)\) by sampling from {1, 2, 3} using \(P(Y_{is}(d) = m \mid X_i), \, d = 0, 1, \, s = t-1, t\).
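The multinomial step can be illustrated with a minimal R sketch. This is not the package's internal code: the coefficient matrix `beta` is made up for illustration (the actual values of \(\beta_{mj}^d\) are not given here), and `softmax_rows` is a hypothetical helper.

```r
set.seed(1)
n <- 5
X <- matrix(runif(n * 3), ncol = 3)  # three independent U(0,1) covariates

# Hypothetical coefficients (assumption): rows are classes m = 1, 2, 3,
# columns are the three covariates, for one treatment status d.
beta <- matrix(c( 0.5, -0.2,  0.1,
                  0.0,  0.3, -0.4,
                 -0.1,  0.2,  0.6), nrow = 3, byrow = TRUE)

eta <- X %*% t(beta)  # n x 3 matrix of linear predictors eta_{mi}

# Row-wise softmax; subtracting the row maximum avoids overflow in exp().
softmax_rows <- function(eta) {
  z <- exp(eta - apply(eta, 1, max))
  z / rowSums(z)
}
probs <- softmax_rows(eta)  # each row is a probability distribution

# Draw one potential outcome per unit from {1, 2, 3}.
y <- apply(probs, 1, function(p) sample(1:3, size = 1, prob = p))
```

Repeating the same steps with a second coefficient matrix for the other treatment status would give the remaining potential outcomes.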
If instead outcome_type == "ordered", generate_qualitative_data_did first generates latent potential outcomes:
$$Y_i^* (d, s) = \tau d + X_{i1} + X_{i2} + X_{i3} + N (0, 1), \quad d = 0, 1, \quad s = t-1, t,$$
with \(\tau = 2\). It then constructs \(Y_i (d, s)\) by discretizing \(Y_i^* (d, s)\) using threshold parameters \(\zeta_1 = 2\) and \(\zeta_2 = 4\) (with the conventions \(\zeta_0 = -\infty\) and \(\zeta_3 = +\infty\)). Then,
$$P(Y_i(d, s) = m | X_i) = P(\zeta_{m-1} < Y_i^*(d, s) \leq \zeta_m | X_i) = \Phi (\zeta_m - \sum_j X_{ij} - \tau d) - \Phi (\zeta_{m-1} - \sum_j X_{ij} - \tau d), \quad d = 0, 1, \quad s = t-1, t,$$
which allows us to analytically compute the probabilities of shift on the treated.
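As a rough illustration of the ordered case, the sketch below evaluates the class probabilities from the normal CDF formula and discretizes a latent draw with the same thresholds. It follows the formulas as written above rather than the package source; `class_probs` is a hypothetical helper, and the infinite end thresholds are the convention implied by having three classes.

```r
set.seed(1)
n <- 5
X <- matrix(runif(n * 3), ncol = 3)  # three independent U(0,1) covariates
tau <- 2
zeta <- c(-Inf, 2, 4, Inf)  # zeta_0, zeta_1, zeta_2, zeta_3

# P(Y_i(d, s) = m | X_i) for m = 1, 2, 3 (hypothetical helper).
class_probs <- function(X, d) {
  mu <- rowSums(X) + tau * d
  sapply(1:3, function(m) pnorm(zeta[m + 1] - mu) - pnorm(zeta[m] - mu))
}
p1 <- class_probs(X, d = 1)  # n x 3 matrix under treatment
p0 <- class_probs(X, d = 0)  # n x 3 matrix under control

# Latent potential outcome under treatment, then discretization.
# cut() uses right-closed intervals (a, b] by default, matching
# zeta_{m-1} < Y* <= zeta_m.
y_star <- tau * 1 + rowSums(X) + rnorm(n)
y1 <- cut(y_star, breaks = zeta, labels = FALSE)
```

Because the probabilities are available in closed form, differences such as `p1 - p0` can be computed exactly for each unit instead of being approximated by simulation.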
Treatment is always assigned as \(D_i \sim \text{Bernoulli}(\pi(X_i))\). If assignment == "randomized", then the propensity score is specified as \(\pi(X_i) = P(D_i = 1 | X_i) = 0.5\).
If instead assignment == "observational", then \(\pi(X_i) = (X_{i1} + X_{i3}) / 2\).
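The two assignment mechanisms can be sketched as follows. The helper `propensity` is a hypothetical name used for illustration only, not a function exported by the package.

```r
set.seed(1)
n <- 1000
X <- matrix(runif(n * 3), ncol = 3)  # three independent U(0,1) covariates

# Propensity score under each assignment mechanism (hypothetical helper).
propensity <- function(X, assignment) {
  switch(assignment,
         randomized = rep(0.5, nrow(X)),
         observational = (X[, 1] + X[, 3]) / 2,
         stop("assignment must be 'randomized' or 'observational'"))
}

pi_obs <- propensity(X, "observational")
D <- rbinom(n, size = 1, prob = pi_obs)  # D_i ~ Bernoulli(pi(X_i))
```

Since the covariates are drawn from \(U(0,1)\), the observational propensity score always lies in \([0, 1]\) and is a valid probability.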
The function always generates three independent covariates from \(U(0,1)\). Observed outcomes \(Y_{is}\) are always constructed using the usual observational rule.
generate_qualitative_data_soo, generate_qualitative_data_iv, generate_qualitative_data_rd
## Generate synthetic data.
set.seed(1986)
data <- generate_qualitative_data_did(100,
                                      assignment = "observational",
                                      outcome_type = "ordered")
data$pshifts_treated