Estimate a grouped time-varying panel data model given an observed group structure. Coefficient functions are homogeneous within groups but heterogeneous across groups. The time-varying coefficients are modeled as polynomial B-splines. The function supports both static and dynamic panel data models.
grouped_tv_plm(
  formula,
  data,
  groups,
  index = NULL,
  n_periods = NULL,
  d = 3,
  M = floor(length(y)^(1/7) - log(p)),
  const_coef = NULL,
  rho = 0.04 * log(N * n_periods)/sqrt(N * n_periods),
  verbose = TRUE,
  parallel = TRUE,
  ...
)

# S3 method for tv_gplm
summary(object, ...)
# S3 method for tv_gplm
formula(x, ...)
# S3 method for tv_gplm
df.residual(object, ...)
# S3 method for tv_gplm
print(x, ...)
# S3 method for tv_gplm
coef(object, ...)
# S3 method for tv_gplm
residuals(object, ...)
# S3 method for tv_gplm
fitted(object, ...)
An object of class tv_gplm holding
model
a data.frame containing the dependent and explanatory variables as well as cross-sectional and time indices,
coefficients
let \(p^{(1)}\) denote the number of time-varying and \(p^{(2)}\) the number of time-constant coefficients. A list holding (i) a \(T \times p^{(1)} \times K\) array of the group-specific functional coefficients and (ii) a \(K \times p^{(2)}\) matrix of time-constant estimates,
groups
a list containing (i) the total number of groups \(K\) and (ii) a vector of group memberships \((\hat{g}_1, \dots, \hat{g}_N)\), where \(\hat{g}_i = k\) if \(i\) is part of group \(k\),
residuals
a vector of residuals of the demeaned model,
fitted
a vector of fitted values of the demeaned model,
args
a list of additional arguments,
IC
a list containing (i) the value of the IC and (ii) the MSE,
call
the function call.
An object of class tv_gplm has print, summary, fitted, residuals, formula, df.residual and coef S3 methods.
formula
a formula object describing the model to be estimated.
data
a data.frame or matrix holding a panel data set. If no index variables are provided, the panel must be balanced and ordered in the long format \(\bold{Y}=(Y_1^\prime, \dots, Y_N^\prime)^\prime\), \(Y_i = (Y_{i1}, \dots, Y_{iT})^\prime\) with \(Y_{it} = (y_{it}, x_{it}^\prime)^\prime\). Conversely, if data is not ordered or not balanced, data must include two index variables that declare the cross-sectional unit \(i\) and the time period \(t\) of each observation.
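The long format described above can be illustrated with a small base R sketch. The column names `id` and `period` and the toy dimensions are assumptions made for illustration only; they are not prescribed by the package.

```r
# Hypothetical toy panel: N = 3 units observed over T = 4 periods.
# The column names `id` and `period` are illustrative assumptions.
set.seed(1)
N <- 3
n_periods <- 4
panel <- expand.grid(period = seq_len(n_periods), id = seq_len(N))
panel$y <- rnorm(nrow(panel))
panel$x <- rnorm(nrow(panel))

# Shuffle the rows to mimic an unordered data set
panel <- panel[sample(nrow(panel)), ]

# Long format: sort by cross-sectional unit first, then by time period
panel_sorted <- panel[order(panel$id, panel$period), ]
rownames(panel_sorted) <- NULL

# A balanced panel holds exactly n_periods observations for every unit
is_balanced <- all(table(panel_sorted$id) == n_periods)
```

With index columns like these present in data, passing `index = c("id", "period")` would name the unit identifier first and the time identifier second, in line with the description of index.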
groups
a numerical or character vector of length \(N\) that indicates the group membership of each cross-sectional unit \(i\).
index
a character vector holding two strings. The first string denotes the name of the index variable identifying the cross-sectional unit \(i\), and the second string represents the name of the variable declaring the time period \(t\). The data is automatically sorted according to the variables in index, which may produce errors when the time index is a character variable. In case of a balanced panel data set that is ordered in the long format, index can be left empty if the number of time periods n_periods is supplied.
n_periods
the number of observed time periods \(T\). If an index character vector is passed, this argument can be left empty. Default is NULL.
d
the polynomial degree of the B-splines. Default is 3.
M
the number of interior knots of the B-splines. If left unspecified, the default heuristic \(M = \text{floor}((NT)^{\frac{1}{7}} - \log(p))\) is used. Note that \(M\) does not include the boundary knots and the entire sequence of knots is of length \(M + d + 1\).
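As a rough illustration of the default heuristic, the sketch below evaluates \(M\) for the dimensions used in the example of this page (\(N = 10\), \(T = 50\), \(p = 2\)) and builds a B-spline basis of matching size with `splines::bs()`. The knot placement chosen by `bs()` is an assumption of this sketch and may differ from what the package does internally.

```r
library(splines)  # part of the standard R distribution

N <- 10
n_periods <- 50
p <- 2
d <- 3

# Default heuristic for the number of interior knots
M <- floor((N * n_periods)^(1 / 7) - log(p))

# Number of basis functions implied by M interior knots and degree d
n_basis <- M + d + 1

# Illustrative basis on the rescaled time grid t/T
# (knots at quantiles of the grid, an assumption of this sketch)
v <- seq_len(n_periods) / n_periods
B <- bs(v, df = n_basis, degree = d, intercept = TRUE, Boundary.knots = c(0, 1))

dim(B)                    # n_periods rows, n_basis columns
length(attr(B, "knots"))  # number of interior knots placed by bs()
```

For these dimensions the heuristic yields a single interior knot, so a cubic basis has five columns.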
const_coef
a character vector containing the variable names of explanatory variables that enter with time-constant coefficients.
rho
the tuning parameter balancing the fitness and penalty terms in the IC. If left unspecified, the default \(\rho = 0.04 \frac{\log(NT)}{\sqrt{NT}}\) shown in the usage above is used, which has the same form as the heuristic of Mehrabani (2023, sec. 6). We recommend the default.
verbose
logical. If TRUE, helpful warning messages are shown. Default is TRUE.
parallel
logical. If TRUE, certain operations are parallelized across multiple cores. Default is TRUE.
...
ellipsis
object
of class tv_gplm.
x
of class tv_gplm.
Paul Haimerl
Consider the grouped time-varying panel data model $$y_{it} = \gamma_i + \beta^\prime_{i} (t/T) x_{it} + \epsilon_{it}, \quad i = 1, \dots, N, \; t = 1, \dots, T,$$ where \(y_{it}\) is the scalar dependent variable, \(\gamma_i\) is an individual fixed effect, \(x_{it}\) is a \(p \times 1\) vector of explanatory variables, and \(\epsilon_{it}\) is a zero mean error. The coefficient vector \(\beta_{i} (t/T)\) is subject to the observed group pattern $$\beta_i \left(\frac{t}{T} \right) = \sum_{k = 1}^K \alpha_k \left( \frac{t}{T} \right) \bold{1} \{i \in G_k \},$$ with \(\cup_{k = 1}^K G_k = \{1, \dots, N\}\), \(G_k \cap G_j = \emptyset\) and \(\| \alpha_k - \alpha_j \| \neq 0\) for any \(k \neq j\), \(k = 1, \dots, K\).
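To make the model concrete, the following base R sketch simulates data from it with one regressor (\(p = 1\)) and \(K = 2\) groups. The coefficient functions, sample sizes and error variance are hypothetical choices for illustration and are unrelated to the package's own simulation routine.

```r
set.seed(42)
N <- 6
n_periods <- 30
K <- 2

# Observed group memberships g_1, ..., g_N (three units per group)
groups <- rep(seq_len(K), each = N / K)

# Rescaled time grid t/T
v <- seq_len(n_periods) / n_periods

# Hypothetical smooth coefficient functions alpha_k(t/T), one column per group
alpha <- cbind(sin(2 * pi * v), 1 + v^2)

# Individual fixed effects, regressor and errors (T x N matrices)
gamma <- rnorm(N)
x <- matrix(rnorm(N * n_periods), n_periods, N)
eps <- matrix(rnorm(N * n_periods, sd = 0.1), n_periods, N)

# beta_i(t/T) equals alpha_k(t/T) for every unit i in group k
beta <- alpha[, groups]

# y_it = gamma_i + beta_i(t/T) * x_it + eps_it
y <- sweep(beta * x + eps, 2, gamma, "+")
```

Units in the same group share one column of `alpha`, so their coefficient paths in `beta` coincide exactly, while units in different groups follow different paths.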
\(\alpha_k (t/T)\) and, in turn, \(\beta_i (t/T)\) are estimated as polynomial B-splines using the penalized sieve technique. To this end, let \(B(v)\) denote an \((M + d + 1) \times 1\) vector of polynomial spline basis functions, where \(d\) represents the polynomial degree and \(M\) gives the number of interior knots of the B-spline. \(\alpha_{k}(t/T)\) is approximated by a linear combination of the basis functions, \(\alpha_{k}(t/T) \approx \xi_k^\prime B(t/T)\), where \(\xi_k\) is an \((M + d + 1) \times p\) coefficient matrix.
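The spline approximation can be sketched in a few lines: a smooth function on the rescaled time grid is projected onto a B-spline basis by least squares. The target function, the values of \(M\) and \(T\), and the use of `splines::bs()` with its default knot placement are assumptions of this illustration.

```r
library(splines)

n_periods <- 100
d <- 3
M <- 3
n_basis <- M + d + 1

# Basis matrix with rows B(t/T)', stripped of the bs attributes
v <- seq_len(n_periods) / n_periods
B <- matrix(
  as.numeric(bs(v, df = n_basis, degree = d, intercept = TRUE,
                Boundary.knots = c(0, 1))),
  nrow = n_periods
)

# Hypothetical smooth coefficient function alpha(t/T)
alpha_true <- sin(2 * pi * v)

# Least squares control points xi, so that alpha(t/T) is close to xi' B(t/T)
xi <- solve(crossprod(B), crossprod(B, alpha_true))
alpha_approx <- drop(B %*% xi)

# Largest deviation between the function and its spline approximation
max_err <- max(abs(alpha_true - alpha_approx))
```

Raising \(M\) enlarges the basis and generally tightens the approximation, at the cost of more parameters to estimate.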
The explanatory variables are projected onto the spline basis system, which results in the \((M + d + 1)p \times 1\) vector \(z_{it} = x_{it} \otimes B(t/T)\). Subsequently, the DGP can be reformulated as $$y_{it} = \gamma_i + z_{it}^\prime \text{vec}(\pi_{i}) + u_{it},$$ where \(\pi_i = \xi_k\) if \(i \in G_k\), \(u_{it} = \epsilon_{it} + \eta_{it}\), and \(\eta_{it}\) reflects a sieve approximation error. We refer to Su et al. (2019, sec. 2) for more details on the sieve technique.
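The reformulation rests on the identity \(z_{it}^\prime \text{vec}(\pi_i) = (\pi_i^\prime B(t/T))^\prime x_{it}\). The sketch below checks this numerically for one hypothetical unit; the dimensions, the random control points and the basis built with `splines::bs()` are illustrative assumptions.

```r
library(splines)

set.seed(1)
n_periods <- 20
d <- 3
M <- 2
p <- 2
n_basis <- M + d + 1

# Basis matrix with rows B(t/T)'
v <- seq_len(n_periods) / n_periods
B <- matrix(
  as.numeric(bs(v, df = n_basis, degree = d, intercept = TRUE,
                Boundary.knots = c(0, 1))),
  nrow = n_periods
)

# Regressors of one unit (T x p) and hypothetical control points (n_basis x p)
x_i <- matrix(rnorm(n_periods * p), n_periods, p)
pi_i <- matrix(rnorm(n_basis * p), n_basis, p)

# Row s of Z_i holds z_is' = (x_is %x% B(s/T))'
Z_i <- t(vapply(
  seq_len(n_periods),
  function(s) as.vector(kronecker(x_i[s, ], B[s, ])),
  numeric(n_basis * p)
))

# Left-hand side: z_it' vec(pi_i), with c() stacking the columns of pi_i
lhs <- drop(Z_i %*% c(pi_i))

# Right-hand side: beta_i(t/T)' x_it with beta_i(t/T) = pi_i' B(t/T)
beta_i <- B %*% pi_i
rhs <- rowSums(beta_i * x_i)

max_gap <- max(abs(lhs - rhs))
```

Both sides agree up to floating point error, which is why the time-varying model can be treated as a linear regression in the stacked control points.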
Finally, \(\hat{\alpha}_{k}(t/T)\) is obtained as \(\hat{\alpha}_{k}(t/T) = \hat{\xi}_k^\prime B(t/T)\), where the control points \(\xi_k\) are estimated in vectorized form using OLS $$\text{vec}(\hat{\xi}_k) = \left( \sum_{i \in G_k} \sum_{t = 1}^T \tilde{z}_{it} \tilde{z}_{it}^\prime \right)^{-1} \sum_{i \in G_k} \sum_{t = 1}^T \tilde{z}_{it} \tilde{y}_{it},$$ and \(\tilde{a}_{it} = a_{it} - T^{-1} \sum_{s = 1}^T a_{is}\), \(a \in \{y, z\}\), which concentrates out the fixed effect \(\gamma_i\) (within-transformation).
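The group-wise OLS step can be sketched in base R for the case of a single regressor. This is an illustration under simplifying assumptions (simulated data, \(p = 1\), a basis from `splines::bs()`, hypothetical coefficient functions), not the package's implementation.

```r
library(splines)

set.seed(123)
N <- 10
n_periods <- 100
d <- 3
M <- 4
groups <- rep(1:2, each = N / 2)  # observed group memberships

# Basis matrix with rows B(t/T)'
v <- seq_len(n_periods) / n_periods
B <- matrix(
  as.numeric(bs(v, df = M + d + 1, degree = d, intercept = TRUE,
                Boundary.knots = c(0, 1))),
  nrow = n_periods
)

# Hypothetical group-specific coefficient functions (T x K)
alpha <- cbind(sin(2 * pi * v), 2 * v)

# Simulate one unit and apply the within-transformation to y and z
sim_unit <- function(i) {
  x <- rnorm(n_periods)
  y <- rnorm(1) + alpha[, groups[i]] * x + rnorm(n_periods, sd = 0.1)
  Z <- x * B  # with p = 1, z_it = x_it * B(t/T)
  list(y = y - mean(y), Z = sweep(Z, 2, colMeans(Z)))
}
units <- lapply(seq_len(N), sim_unit)

# Pooled OLS over all members of group k, then map back to alpha_k(t/T)
estimate_group <- function(k) {
  members <- units[groups == k]
  ZtZ <- Reduce(`+`, lapply(members, function(u) crossprod(u$Z)))
  Zty <- Reduce(`+`, lapply(members, function(u) crossprod(u$Z, u$y)))
  xi_hat <- solve(ZtZ, Zty)
  drop(B %*% xi_hat)
}
alpha_hat <- sapply(1:2, estimate_group)  # T x K matrix of estimates

max_err <- max(abs(alpha_hat - alpha))
```

Demeaning removes the simulated fixed effect exactly, so the remaining error in `alpha_hat` stems from the noise term and the spline approximation.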
In case of an unbalanced panel data set, the earliest and latest available observations per group define the start and end points of the interval on which the group-specific time-varying coefficients are defined.
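The per-group interval for an unbalanced panel can be illustrated as follows. The toy data, the column names and the group assignment are hypothetical, and the snippet only shows how the earliest and latest periods per group could be read off the data.

```r
# Hypothetical unbalanced panel: three units observed over different periods
panel <- data.frame(
  id     = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  period = c(2, 3, 4, 1, 2, 3, 4, 5, 6)
)

# Observed group memberships of units 1, 2 and 3
groups <- c(1, 1, 2)
panel$group <- groups[panel$id]

# Earliest and latest available observation per group
start <- tapply(panel$period, panel$group, min)
end <- tapply(panel$period, panel$group, max)
support <- cbind(start = start, end = end)
```

Here group 1 pools units 1 and 2, so its interval runs from the earliest period of unit 2 to the latest period of unit 1, while group 2 inherits the range of unit 3.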
Su, L., Wang, X., & Jin, S. (2019). Sieve estimation of time-varying panel data models with latent structures. Journal of Business & Economic Statistics, 37(2), 334-349. doi:10.1080/07350015.2017.1340299.
# Simulate a time-varying panel with a trend and a group pattern
set.seed(1)
sim <- sim_tv_DGP(N = 10, n_periods = 50, intercept = TRUE, p = 2)
df <- data.frame(y = c(sim$y), X = sim$X)
groups <- sim$groups
# Estimate the time-varying grouped panel data model
estim <- grouped_tv_plm(y ~ ., data = df, n_periods = 50, groups = groups)
summary(estim)