grouped_tv_plm: Grouped Time-varying Panel Data Model

Description

Estimate a time-varying panel data model subject to an observed group structure. Coefficient functions are homogeneous within groups but heterogeneous across groups. Time-varying coefficient functions are approximated as polynomial B-splines. The function supports both static and dynamic panel data models.

Usage

grouped_tv_plm(
  formula,
  data,
  groups,
  index = NULL,
  n_periods = NULL,
  d = 3,
  M = floor(length(y)^(1/7) - log(p)),
  const_coef = NULL,
  rho = 0.04 * log(N * n_periods)/sqrt(N * n_periods),
  verbose = TRUE,
  parallel = TRUE,
  ...
)
# S3 method for tv_gplm
summary(object, ...)
# S3 method for tv_gplm
formula(x, ...)
# S3 method for tv_gplm
df.residual(object, ...)
# S3 method for tv_gplm
print(x, ...)
# S3 method for tv_gplm
coef(object, ...)
# S3 method for tv_gplm
residuals(object, ...)
# S3 method for tv_gplm
fitted(object, ...)

Value

An object of class tv_gplm holding

model: a data.frame containing the dependent and explanatory variables as well as cross-sectional and time indices,
coefficients: let $p^{(1)}$ denote the number of time-varying and $p^{(2)}$ the number of time constant coefficients. A list holding (i) a $T \times p^{(1)} \times K$ array of the group-specific functional coefficients and (ii) a $K \times p^{(2)}$ matrix of time-constant estimates.
groups: a list containing (i) the total number of groups $K$ and (ii) a vector of group memberships $G = (g_1, \dots, g_N)$, where $g_i = k$ if $i$ is part of group $k$,
residuals: a vector of residuals of the demeaned model,
fitted: a vector of fitted values of the demeaned model,
args: a list of additional arguments,
IC: a list containing (i) the value of the IC and (ii) the MSE,
call: the function call.

An object of class tv_gplm has print, summary, fitted, residuals, formula, df.residual, and coef S3 methods.

Arguments

formula: a formula object describing the model to be estimated.
data: a data.frame or matrix holding a panel data set. If no index variables are provided, the panel must be balanced and ordered in the long format $\bold{Y}=(Y_1^\prime, \dots, Y_N^\prime)^\prime$, $Y_i = (Y_{i1}, \dots, Y_{iT})^\prime$ with $Y_{it} = (y_{it}, \bold{x}_{it}^\prime)^\prime$. Conversely, if data is not ordered or not balanced, data must include two index variables that declare the cross-sectional unit $i$ and the time period $t$ of each observation.
groups: a numerical or character vector of length $N$ that indicates the group membership of each cross-sectional unit $i$.
index: a character vector holding two strings. The first string denotes the name of the index variable identifying the cross-sectional unit $i$, and the second string represents the name of the variable declaring the time period $t$. The data is automatically sorted according to the variables in index, which may produce errors when the time index is a character variable. In case of a balanced panel data set that is ordered in the long format, index can be left empty if the number of time periods n_periods is supplied.
n_periods: the number of observed time periods $T$. If an index character vector is passed, this argument can be left empty. Default is NULL.
d: the polynomial degree of the B-splines. Default is 3.
M: the number of interior knots of the B-splines. If left unspecified, the default heuristic $M = \text{floor}((NT)^{\frac{1}{7}} - \log(p))$ is used, following Haimerl et al. (2025). Note that $M$ does not include the boundary knots and the entire sequence of knots is of length $M + d + 1$.
const_coef: a character vector containing the variable names of explanatory variables that enter with time-constant coefficients.
rho: the tuning parameter balancing the fitness and penalty terms in the IC. If left unspecified, the heuristic $\rho = 0.07 \frac{\log(NT)}{\sqrt{NT}}$ of Haimerl et al. (2025) is used. We recommend the default.
verbose: logical. If TRUE, helpful warning messages are shown. Default is TRUE.
parallel: logical. If TRUE, certain operations are parallelized across multiple cores. Default is TRUE.
...: ellipsis
object: of class tv_gplm.
x: of class tv_gplm.

Author

Paul Haimerl

Details

Consider the time-varying panel data model $$y_{it} = \gamma_i^0 + \bold{\beta}^{0\prime}_{i} (t/T) \bold{x}_{it} + \epsilon_{it}, \quad i = 1, \dots, N, \; t = 1, \dots, T,$$ where $y_{it}$ is the scalar dependent variable, $\gamma_i^0$ is an individual fixed effect, $\bold{x}_{it}$ is a $p \times 1$ vector of explanatory variables, and $\epsilon_{it}$ is a zero mean error. The $p$-dimensional coefficient vector $\bold{\beta}_{i}^0 (t/T)$ contains smooth functions of time and follows the observed group pattern $$\bold{\beta}_i^0 \left(\frac{t}{T} \right) = \sum_{k = 1}^K \bold{\alpha}_k^0 \left( \frac{t}{T} \right) \bold{1} \{i \in G_k \},$$ with $\cup_{k = 1}^K G_k = \{1, \dots, N\}$, $G_k \cap G_j = \emptyset$ for any $k \neq j$, $k,j = 1, \dots, K$. The group structure $G_1, \dots, G_K$ is determined by the argument groups.

The time-varying coefficient functions in $\bold{\alpha}_k (t/T)$ and, in turn, $\bold{\beta}_i (t/T)$ are estimated as polynomial B-splines. To this end, let $\bold{b}(v)$ denote a $M + d +1$ vector of polynomial basis functions with the polynomial degree $d$ and $M$ interior knots. $\bold{\alpha}_k^0 (t/T)$ is approximated by forming linear combinations of the basis functions $\bold{\alpha}_k^0 (t/T) \approx \bold{\Xi}_k^{0 \prime} \bold{b} (t/T)$, where $\bold{\Xi}_i^{0}$ is a group-specific $(M + d + 1) \times p$ matrix of spline control points.

To estimate $\bold{\Xi}_k^{0}$, we project the explanatory variables onto the spline basis system, resulting in the $(M + d + 1)p \times 1$ regressor vector $\bold{z}_{it} = \bold{x}_{it} \otimes \bold{b}(v)$. Subsequently, the DGP can be reformulated as $$y_{it} = \gamma_i^0 + \bold{\pi}_{i}^{0 \prime} \bold{z}_{it} + u_{it},$$ where $\bold{\pi}_i^0 = \text{vec}(\bold{\Pi}_i^0)$ and $\bold{\Xi}_k^0 = \bold{\Pi}_i^0$ if $i \in G_k$. $u_{it} = \epsilon_{it} + \eta_{it}$ collects the idiosyncratic $\epsilon_{it}$ and the sieve approximation error $\eta_{it}$. Then, we obtain $\hat{\bold{\xi}}_k = \text{vec}( \hat{\bold{\Xi}}_k)$ by $$\hat{\xi}_k = \left( \sum_{i \in G_k} \sum_{t = 1}^T \tilde{\bold{z}}_{it} \tilde{\bold{z}}_{it}^\prime \right)^{-1} \sum_{i \in G_k} \sum_{t = 1}^T \tilde{\bold{z}}_{it} \tilde{y}_{it},$$ with $\tilde{a}_{it} = a_{it} - T^{-1} \sum_{t = 1}^T a_{it}$, $a = \{y, \bold{z}\}$ to concentrate out the fixed effect $\gamma_i^0$ (within-transformation). Lastly, $\hat{\bold{\alpha}}_k (t/T) = \hat{\bold{\Xi}}_k^{\prime} \bold{b} (t/T)$. We refer to Haimerl et al. (2025, sec. 2) for more details.

In case of an unbalanced panel data set, the earliest and latest available observations per group define the start and end-points of the interval on which the group-specific time-varying coefficients are defined.

References

Haimerl, P., Smeekes, S., & Wilms, I. (2025). Estimation of latent group structures in time-varying panel data models. arXiv preprint arXiv:2503.23165. tools:::Rd_expr_doi("10.48550/arXiv.2503.23165").

Examples

Run this code

# Simulate a time-varying panel with a trend and a group pattern
set.seed(1)
sim <- sim_tv_DGP(N = 10, n_periods = 50, intercept = TRUE, p = 2)
df <- data.frame(y = c(sim$y), X = sim$X)
groups <- sim$groups

# Estimate the time-varying grouped panel data model
estim <- grouped_tv_plm(y ~ ., data = df, n_periods = 50, groups = groups)
summary(estim)

Run the code above in your browser using DataLab