Generates data from a partially linear regression model for panel data with fixed effects similar to DGP3 (highly nonlinear) in Clarke and Polselli (2025).
The data generating process is defined as
\(Y_{it} = \theta D_{it} + g_0(X_{it}) + \alpha_i + U_{it},\)
\(D_{it} = m_0(X_{it}) + \gamma_i + V_{it},\)
where \(U_{it} \sim \mathcal{N}(0,1)\), \(V_{it} \sim \mathcal{N}(0,1)\), \(\alpha_i = \rho A_i + \sqrt{1-\rho^2} B_i\) with \(A_i\sim \mathcal{N}(3,3)\), \(B_i\sim \mathcal{N}(0,1)\), and \(\gamma_i\sim \mathcal{N}(0,5)\).
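The base-R sketch below shows how \(\rho\) couples \(\alpha_i\) to \(A_i\). It is not the package's implementation. The helper name `simulate_individual_effects` is hypothetical, and the second argument of \(\mathcal{N}(\cdot,\cdot)\) is assumed to be a variance (so the standard deviation is its square root).

```r
# Sketch of the individual effects; not the package's implementation.
# Assumption: N(mean, v) uses v as a variance, so sd = sqrt(v).
simulate_individual_effects <- function(n_obs, rho) {
  A <- rnorm(n_obs, mean = 3, sd = sqrt(3))      # A_i ~ N(3, 3)
  B <- rnorm(n_obs, mean = 0, sd = 1)            # B_i ~ N(0, 1)
  alpha <- rho * A + sqrt(1 - rho^2) * B         # alpha_i = rho A_i + sqrt(1 - rho^2) B_i
  gamma <- rnorm(n_obs, mean = 0, sd = sqrt(5))  # gamma_i ~ N(0, 5)
  list(A = A, alpha = alpha, gamma = gamma)
}

set.seed(42)
eff <- simulate_individual_effects(n_obs = 500, rho = 0.8)
cor(eff$A, eff$alpha)  # strongly positive for rho = 0.8
```

With \(\rho = 1\) the draw reduces to \(\alpha_i = A_i\). With \(\rho = 0\) it reduces to \(\alpha_i = B_i\), which is independent of \(A_i\).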
The covariates are distributed as \(X_{it,p} \sim A_i + \mathcal{N}(0, 5)\), where \(p = 1, \ldots, P\) indexes the covariates and \(P\) is the number of covariates.
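The covariate draw can be sketched the same way. Every covariate of individual \(i\) shares \(A_i\), and \(\alpha_i\) loads on \(A_i\) through \(\rho\), so the covariates are correlated with the individual effect. The helper `simulate_covariates`, the column names `x1`, `x2`, and so on, the row order (individual, then period), and the variance reading of \(\mathcal{N}(0, 5)\) are all assumptions of this sketch.

```r
# Sketch of X_{it,p} = A_i + e_{it,p}; not the package's implementation.
# Assumptions: e_{it,p} ~ N(0, 5) with 5 read as a variance, drawn independently
# across i, t and p; rows are ordered by individual, then by period.
simulate_covariates <- function(A, t_per, dim_x) {
  n_obs <- length(A)
  A_long <- rep(A, each = t_per)  # repeat each A_i for its t_per periods
  noise <- matrix(rnorm(n_obs * t_per * dim_x, mean = 0, sd = sqrt(5)),
                  nrow = n_obs * t_per, ncol = dim_x)
  X <- A_long + noise  # A_long is recycled down every column
  colnames(X) <- paste0("x", seq_len(dim_x))  # illustrative names only
  X
}

set.seed(1)
A <- rnorm(500, mean = 3, sd = sqrt(3))
X <- simulate_covariates(A, t_per = 10, dim_x = 20)
dim(X)  # one row per (i, t) pair, one column per covariate
```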
The nuisance functions are given by
\(m_0(X_{it}) = a_1 [X_{it,1} \times 1(X_{it,1}>0)] + a_2 [X_{it,1} \times X_{it,3}],\)
\(g_0(X_{it}) = b_1 [X_{it,1} \times X_{it,3}] + b_2 [X_{it,3} \times 1(X_{it,3}>0)],\)
with \(a_1=b_2=0.25\) and \(a_2=b_1=0.5\).
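The sketch below covers the nuisance functions and how they enter \(D_{it}\) and \(Y_{it}\). Only the first and third covariates appear. The function names `m0`, `g0`, and `assemble_outcomes`, and the output columns `id`, `time`, `y`, and `d`, are hypothetical. The object returned by the package may be laid out differently.

```r
# Nuisance functions with a1 = b2 = 0.25 and a2 = b1 = 0.5 (hypothetical names).
# x1 and x3 are the first and third covariates, as vectors of equal length.
m0 <- function(x1, x3, a1 = 0.25, a2 = 0.5) {
  a1 * (x1 * (x1 > 0)) + a2 * (x1 * x3)
}
g0 <- function(x1, x3, b1 = 0.5, b2 = 0.25) {
  b1 * (x1 * x3) + b2 * (x3 * (x3 > 0))
}

# Assemble D_it and Y_it from covariates X (rows ordered by individual, then
# period) and the individual effects alpha and gamma.
assemble_outcomes <- function(X, alpha, gamma, t_per, theta = 0.5) {
  n <- nrow(X)
  stopifnot(n == length(alpha) * t_per, length(alpha) == length(gamma))
  alpha_long <- rep(alpha, each = t_per)
  gamma_long <- rep(gamma, each = t_per)
  D <- m0(X[, 1], X[, 3]) + gamma_long + rnorm(n)              # V_it ~ N(0, 1)
  Y <- theta * D + g0(X[, 1], X[, 3]) + alpha_long + rnorm(n)  # U_it ~ N(0, 1)
  data.frame(id = rep(seq_along(alpha), each = t_per),
             time = rep(seq_len(t_per), times = length(alpha)),
             y = Y, d = D)
}

m0(c(2, -2), c(1, -1))  # 1.5 1.0
g0(c(2, -2), c(1, -1))  # 1.25 1.00
```

The indicator terms make both functions kink at zero, while the shared \(X_{it,1} \times X_{it,3}\) interaction gives \(D_{it}\) and \(Y_{it}\) a common nonlinear confounder.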
make_plpr_data(n_obs = 500, t_per = 10, dim_x = 20, theta = 0.5, rho = 0.8)
Returns a data object.
n_obs
(integer(1))
The number of cross-sectional observations (individuals \(i\)) to simulate.
t_per
(integer(1))
The number of time periods (\(t\)) to simulate.
dim_x
(integer(1))
The number of covariates.
theta
(numeric(1))
The value of the causal parameter \(\theta\).
rho
(numeric(1))
Parameter governing the correlation between the covariates and the unobserved
individual heterogeneity \(\alpha_i\). It takes values between 0 (pure random effects,
\(\alpha_i\) independent of the covariates) and 1 (pure fixed effects, \(\alpha_i = A_i\)).
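The documented argument types can be checked before simulating. The helper below is hypothetical and not part of the package. Besides the documented types and the range of `rho`, it requires `dim_x >= 3`, an assumption based on the nuisance functions using \(X_{it,3}\).

```r
# Hypothetical argument check mirroring the documented types; not part of the package.
check_plpr_args <- function(n_obs, t_per, dim_x, theta, rho) {
  # A positive whole number of length one.
  is_count <- function(x) {
    is.numeric(x) && length(x) == 1 && !is.na(x) && x >= 1 && x == round(x)
  }
  stopifnot(is_count(n_obs), is_count(t_per), is_count(dim_x))
  stopifnot(dim_x >= 3)  # assumption: m_0 and g_0 use the third covariate
  stopifnot(is.numeric(theta), length(theta) == 1, !is.na(theta))
  # rho must lie in [0, 1] so that sqrt(1 - rho^2) is real.
  stopifnot(is.numeric(rho), length(rho) == 1, !is.na(rho), rho >= 0, rho <= 1)
  invisible(TRUE)
}

check_plpr_args(n_obs = 500, t_per = 10, dim_x = 20, theta = 0.5, rho = 0.8)
```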
Clarke, P. S. and Polselli, A. (2025). Double Machine Learning for Static Panel Models with Fixed Effects. Econometrics Journal. DOI: 10.1093/ectj/utaf011.
df <- make_plpr_data(n_obs = 500, t_per = 10, dim_x = 20, theta = 0.5, rho = 0.8)