xtdml (version 0.1.11)

make_plpr_data: Simulated Data Frame

Description

Generates data from a partially linear regression model for panel data with fixed effects, similar to DGP3 (highly nonlinear) in Clarke and Polselli (2025).

The data generating process is defined as

\(Y_{it} = \theta D_{it} + g_0(X_{it}) + \alpha_i + U_{it},\)

\(D_{it} = m_0(X_{it}) + \gamma_i + V_{it},\)

where \(U_{it} \sim \mathcal{N}(0,1)\), \(V_{it} \sim \mathcal{N}(0,1)\), \(\alpha_i = \rho A_i + \sqrt{1-\rho^2} B_i\) with \(A_i\sim \mathcal{N}(3,3)\), \(B_i\sim \mathcal{N}(0,1)\), and \(\gamma_i\sim \mathcal{N}(0,5)\).

The covariates are distributed as \(X_{it,p} \sim A_i + \mathcal{N}(0, 5)\), where \(p = 1, \dots, \texttt{dim\_x}\) indexes the covariates.

The nuisance functions are given by

\(m_0(X_{it}) = a_1 [X_{it,1} \times 1(X_{it,1}>0)] + a_2 [X_{it,1} \times X_{it,3}],\)

\(g_0(X_{it}) = b_1 [X_{it,1} \times X_{it,3}] + b_2 [X_{it,3} \times 1(X_{it,3}>0)],\)

with \(a_1=b_2=0.25\) and \(a_2=b_1=0.5\).
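
The data generating process above can be sketched in base R as follows. This is only an illustration, not the xtdml implementation: the function name simulate_plpr_sketch and the column names (id, time, y, d, x1, ...) are hypothetical, and the second argument of each \(\mathcal{N}(\cdot,\cdot)\) is assumed to be a variance, which the page does not state. The sketch requires at least three covariates because the nuisance functions use \(X_{it,1}\) and \(X_{it,3}\).

```r
# Hypothetical helper illustrating the DGP; NOT the xtdml implementation.
simulate_plpr_sketch <- function(n_obs = 500, t_per = 10, dim_x = 20,
                                 theta = 0.5, rho = 0.8) {
  stopifnot(dim_x >= 3, rho >= 0, rho <= 1)
  n    <- n_obs * t_per
  id   <- rep(seq_len(n_obs), each = t_per)   # unit index i
  time <- rep(seq_len(t_per), times = n_obs)  # period index t

  # Unit-level draws. ASSUMPTION: N(a, b) is read as mean a, variance b.
  A     <- rnorm(n_obs, mean = 3, sd = sqrt(3))
  B     <- rnorm(n_obs)
  gamma <- rnorm(n_obs, sd = sqrt(5))
  alpha <- rho * A + sqrt(1 - rho^2) * B

  # Covariates: X_{it,p} = A_i + N(0, 5); A[id] is recycled down each column.
  X <- A[id] + matrix(rnorm(n * dim_x, sd = sqrt(5)), nrow = n, ncol = dim_x)
  colnames(X) <- paste0("x", seq_len(dim_x))

  # Nuisance functions with a1 = b2 = 0.25 and a2 = b1 = 0.5.
  a1 <- 0.25; a2 <- 0.5; b1 <- 0.5; b2 <- 0.25
  x1 <- X[, 1]; x3 <- X[, 3]
  m0 <- a1 * x1 * (x1 > 0) + a2 * x1 * x3
  g0 <- b1 * x1 * x3 + b2 * x3 * (x3 > 0)

  # Treatment and outcome equations with standard normal errors V and U.
  d <- m0 + gamma[id] + rnorm(n)
  y <- theta * d + g0 + alpha[id] + rnorm(n)

  data.frame(id = id, time = time, y = y, d = d, X)
}

set.seed(1)
sim <- simulate_plpr_sketch(n_obs = 50, t_per = 4, dim_x = 5)
dim(sim)  # 200 rows (50 units x 4 periods), 9 columns (id, time, y, d, x1..x5)
```

The sketch returns one row per unit-period pair in long format; the layout of the object returned by make_plpr_data may differ.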

Usage

make_plpr_data(n_obs = 500, t_per = 10, dim_x = 20, theta = 0.5, rho = 0.8)

Value

A data frame containing the simulated panel data.

Arguments

n_obs

(integer(1))
The number of cross-sectional observations (i) to simulate.

t_per

(integer(1))
The number of time periods (t) to simulate.

dim_x

(integer(1))
The number of covariates.

theta

(numeric(1))
The value of the causal parameter.

rho

(numeric(1))
Parameter governing the relationship between the covariates and the unobserved individual heterogeneity. The value must lie between 0 (pure random effects) and 1 (pure fixed effects).
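
To illustrate the role of rho, the following base R sketch draws \(A_i\) and \(B_i\) and checks how strongly \(\alpha_i = \rho A_i + \sqrt{1-\rho^2} B_i\) correlates with \(A_i\), the component shared with the covariates. It does not call xtdml; the helper alpha_for is hypothetical, and treating 3 as the variance of \(A_i\) is an assumption.

```r
set.seed(42)
n <- 10000
A <- rnorm(n, mean = 3, sd = sqrt(3))  # ASSUMPTION: variance 3, not sd 3
B <- rnorm(n)

# Hypothetical helper: individual effect for a given rho.
alpha_for <- function(rho) rho * A + sqrt(1 - rho^2) * B

rhos <- c(0, 0.5, 0.8, 1)
cors <- sapply(rhos, function(rho) cor(alpha_for(rho), A))
names(cors) <- rhos
round(cors, 2)
```

At rho = 0 the individual effect depends only on \(B_i\), so its sample correlation with \(A_i\) is close to zero; at rho = 1 it equals \(A_i\) exactly, so the correlation is one.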

References

Clarke, P. S. and Polselli, A. (2025). Double Machine Learning for Static Panel Models with Fixed Effects. Econometrics Journal. DOI: 10.1093/ectj/utaf011.

Examples

library(xtdml)
df <- make_plpr_data(n_obs = 500, t_per = 10, dim_x = 20, theta = 0.5, rho = 0.8)