Estimate a panel data model subject to an observed group structure. Slope parameters are homogeneous within groups but heterogeneous across groups. This function supports both static and dynamic panel data models, with or without endogenous regressors.
grouped_plm(
  formula,
  data,
  groups,
  index = NULL,
  n_periods = NULL,
  method = "PLS",
  Z = NULL,
  bias_correc = FALSE,
  rho = 0.07 * log(N * n_periods)/sqrt(N * n_periods),
  verbose = TRUE,
  parallel = TRUE,
  ...
)
# S3 method for gplm
print(x, ...)
# S3 method for gplm
formula(x, ...)
# S3 method for gplm
df.residual(object, ...)
# S3 method for gplm
summary(object, ...)
# S3 method for gplm
coef(object, ...)
# S3 method for gplm
residuals(object, ...)
# S3 method for gplm
fitted(object, ...)
An object of class gplm holding
model: a data.frame containing the dependent and explanatory variables as well as cross-sectional and time indices,
coefficients: a \(K \times p\) matrix of the group-specific parameter estimates,
groups: a list containing (i) the total number of groups \(K\) and (ii) a vector of group memberships \((g_1, \dots, g_N)\), where \(g_i = k\) if \(i\) is assigned to group \(k\),
residuals: a vector of residuals of the demeaned model,
fitted: a vector of fitted values of the demeaned model,
args: a list of additional arguments,
IC: a list containing (i) the value of the IC and (ii) the MSE,
call: the function call.
A gplm object has print, summary, fitted, residuals, formula, df.residual, and coef S3 methods.
formula: a formula object describing the model to be estimated.
data: a data.frame or matrix holding a panel data set. If no index variables are provided, the panel must be balanced and ordered in the long format \(\bold{Y}=(Y_1^\prime, \dots, Y_N^\prime)^\prime\), \(Y_i = (Y_{i1}, \dots, Y_{iT})^\prime\) with \(Y_{it} = (y_{it}, x_{it}^\prime)^\prime\). Conversely, if data is not ordered or not balanced, it must include two index variables that declare the cross-sectional unit \(i\) and the time period \(t\) of each observation.
groups: a numerical or character vector of length \(N\) that indicates the group membership of each cross-sectional unit \(i\).
index: a character vector holding two strings. The first string is the name of the index variable identifying the cross-sectional unit \(i\) and the second string is the name of the variable declaring the time period \(t\). The data is automatically sorted according to the variables in index, which may produce errors when the time index is a character variable. For a balanced panel data set that is ordered in the long format, index can be left empty if the number of time periods n_periods is supplied.
n_periods: the number of observed time periods \(T\). If an index is passed, this argument can be left empty. Default is NULL.
method: the estimation method. Options are
"PLS": uses the penalized least squares (PLS) algorithm. We recommend PLS in case of (weakly) exogenous regressors (Mehrabani, 2023, sec. 2.2).
"PGMM": uses the penalized Generalized Method of Moments (PGMM). PGMM is required when instrumenting endogenous regressors, in which case a matrix \(\bold{Z}\) containing the necessary exogenous instruments must be supplied (Mehrabani, 2023, sec. 2.3).
Default is "PLS".
Z: an \(NT \times q\) matrix or data.frame of exogenous instruments, where \(q \geq p\), \(\bold{Z}=(z_1^\prime, \dots, z_N^\prime)^\prime\), \(z_i = (z_{i1}, \dots, z_{iT})^\prime\) and \(z_{it}\) is a \(q \times 1\) vector. Z is only required when method = "PGMM" is selected. When method = "PLS", the argument can be left empty; any supplied value is disregarded. Default is NULL.
bias_correc: logical. If TRUE, a split-panel jackknife bias correction following Dhaene and Jochmans (2015) is applied to the slope parameters. We recommend using the correction when working with dynamic panels. Default is FALSE.
rho: a tuning parameter balancing the fit and penalty terms in the information criterion (IC). If left unspecified, the heuristic \(\rho = 0.07 \frac{\log(NT)}{\sqrt{NT}}\) of Mehrabani (2023, sec. 6) is used. We recommend the default.
verbose: logical. If TRUE, helpful warning messages are shown. Default is TRUE.
parallel: logical. If TRUE, certain operations are parallelized across multiple cores. Default is TRUE.
...: ellipsis
x: of class gplm.
object: of class gplm.
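As a rough illustration of the conventions described for data, n_periods and rho, the base R sketch below sorts a shuffled balanced panel into the long format (unit by unit, periods ascending within each unit) and evaluates the default rho heuristic. The object names (panel, panel_long, rho_default) and the toy dimensions are made up for this example, and nothing here calls the package itself.

```r
# Hypothetical balanced panel with N = 4 units and T = 3 periods, rows shuffled
set.seed(1)
N <- 4
n_periods <- 3
panel <- expand.grid(t_index = 1:n_periods, i_index = 1:N)
panel$y <- rnorm(N * n_periods)
panel <- panel[sample(nrow(panel)), ]

# Long format: all periods of unit 1, then all periods of unit 2, and so on
panel_long <- panel[order(panel$i_index, panel$t_index), ]
rownames(panel_long) <- NULL

# A balanced panel has exactly n_periods observations for every unit
is_balanced <- all(table(panel_long$i_index) == n_periods)

# Default tuning parameter heuristic: rho = 0.07 * log(NT) / sqrt(NT)
rho_default <- 0.07 * log(N * n_periods) / sqrt(N * n_periods)
```

A data set arranged like panel_long (without the index columns) matches the layout that the data argument expects when only n_periods is supplied.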
Paul Haimerl
Consider the grouped panel data model
$$y_{it} = \gamma_i^0 + \bold{\beta}^{0 \prime}_{i} \bold{x}_{it} + \epsilon_{it}, \quad i = 1, \dots, N, \; t = 1, \dots, T,$$
where \(y_{it}\) is the scalar dependent variable, \(\gamma_i^0\) is an individual fixed effect, \(\bold{x}_{it}\) is a \(p \times 1\) vector of (weakly) exogenous explanatory variables, and \(\epsilon_{it}\) denotes a zero mean error.
The coefficient vector \(\bold{\beta}_i^0\) follows the observed group pattern
$$\bold{\beta}_i^0 = \sum_{k = 1}^K \bold{\alpha}_k^0 \bold{1} \{i \in G_k \},$$
with \(\cup_{k = 1}^K G_k = \{1, \dots, N\}\), \(G_k \cap G_j = \emptyset\) and \(\| \bold{\alpha}_k^0 - \bold{\alpha}_j^0 \| \neq 0\) for any \(k \neq j\), \(k,j = 1, \dots, K\). The group structure \(G_1, \dots, G_K\) is determined by the argument groups.
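The group pattern above simply says that every unit inherits the coefficient vector of its group. A minimal base R sketch, with made-up values for the \(K \times p\) matrix alpha and the membership vector groups, could look as follows; it assumes integer group labels \(1, \dots, K\) (character labels would need a lookup via match() first).

```r
# Hypothetical group-specific coefficients: K = 3 groups, p = 2 regressors
alpha <- matrix(c( 1.0, -0.5,
                   0.2,  0.8,
                  -1.0,  0.3), nrow = 3, byrow = TRUE)

# Hypothetical membership vector (g_1, ..., g_N) for N = 6 units
groups <- c(1, 1, 2, 3, 2, 3)

# beta_i = alpha_k for the group k that contains unit i; row i holds beta_i'
beta <- alpha[groups, , drop = FALSE]

# The sets G_1, ..., G_K partition the units: each unit appears in one set only
G <- split(seq_along(groups), groups)
```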
Using PLS, the group-specific coefficients of group \(k, \, k = 1, \dots, K\), are obtained by OLS $$\hat{\bold{\alpha}}_k = \left( \sum_{i \in G_k} \sum_{t = 1}^T \tilde{\bold{x}}_{it} \tilde{\bold{x}}_{it}^\prime \right)^{-1} \sum_{i \in G_k} \sum_{t = 1}^T \tilde{\bold{x}}_{it} \tilde{y}_{it},$$ where \(\tilde{a}_{it} = a_{it} - T^{-1} \sum_{s=1}^T a_{is}\) for \(a \in \{y, \bold{x}\}\). This within-transformation concentrates out the individual fixed effects \(\gamma_i^0\).
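The group-wise OLS formula can be sketched in base R as below. This is an illustration of the formula under simulated data, not the package's internal code; all object names (alpha_true, demean, alpha_hat) and the simulation settings are assumptions made for the example.

```r
set.seed(42)
N <- 6; n_periods <- 50; p <- 2
groups <- rep(1:2, each = 3)                 # assumed known group memberships
alpha_true <- rbind(c(1, -1), c(-2, 0.5))    # made-up K x p coefficient matrix
i_index <- rep(1:N, each = n_periods)        # unit of each row (long format)

X <- matrix(rnorm(N * n_periods * p), ncol = p)
gamma <- rnorm(N)                            # individual fixed effects
y <- gamma[i_index] + rowSums(X * alpha_true[groups[i_index], ]) +
  rnorm(N * n_periods, sd = 0.1)

# Within-transformation: subtract each unit's time average
demean <- function(v) v - ave(v, i_index)
y_tilde <- demean(y)
X_tilde <- apply(X, 2, demean)

# OLS on the demeaned data, pooled over the units of each group
alpha_hat <- t(sapply(sort(unique(groups)), function(k) {
  rows <- groups[i_index] == k
  Xk <- X_tilde[rows, , drop = FALSE]
  solve(crossprod(Xk), crossprod(Xk, y_tilde[rows]))
}))
```

Row \(k\) of alpha_hat holds the estimate for group \(k\). The same numbers result from regressing y on X and unit dummies within each group, since demeaning and dummy variables are equivalent ways of removing the fixed effects.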
In case of PGMM, the slope coefficients are derived as $$ \hat{\bold{\alpha}}_k = \left( \left[ \sum_{i \in G_k} T^{-1} \sum_{t = 1}^T \bold{z}_{it} \Delta \bold{x}_{it}^\prime \right]^\prime \bold{W}_k \left[ \sum_{i \in G_k} T^{-1} \sum_{t = 1}^T \bold{z}_{it} \Delta \bold{x}_{it}^\prime \right] \right)^{-1} $$ $$ \quad \quad \left[ \sum_{i \in G_k} T^{-1} \sum_{t = 1}^T \bold{z}_{it} \Delta \bold{x}_{it}^\prime \right]^\prime \bold{W}_k \left[ \sum_{i \in G_k} T^{-1} \sum_{t = 1}^T \bold{z}_{it} \Delta y_{it} \right], $$ where \(\bold{W}_k\) is a \(q \times q\) symmetric positive definite weight matrix and \(\Delta\) denotes the first difference operator \(\Delta \bold{x}_{it} = \bold{x}_{it} - \bold{x}_{i,t-1}\) (first-difference transformation).
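For intuition, the GMM formula for a single group can be written out in base R as follows. This is a sketch under stated assumptions rather than the package's implementation: the data are simulated with one endogenous regressor (\(p = 1\)) and two instruments (\(q = 2\)), the weight matrix is set to the identity for simplicity, and every object name is made up for the example.

```r
set.seed(7)
N_k <- 10; n_periods <- 200; q <- 2      # units in group k, periods, instruments
alpha_k <- 1.5                           # made-up slope coefficient (p = 1)
i_index <- rep(1:N_k, each = n_periods)

Z <- matrix(rnorm(N_k * n_periods * q), ncol = q)
eps <- rnorm(N_k * n_periods)
# x is endogenous: it depends on the instruments and on the error term
x <- drop(Z %*% c(1, 1)) + 0.5 * eps + rnorm(N_k * n_periods)
y <- rnorm(N_k)[i_index] + alpha_k * x + eps

# First differences within each unit; the first period of every unit is lost
fd <- function(v) ave(v, i_index, FUN = function(u) c(NA, diff(u)))
keep <- !is.na(fd(y))
dy <- fd(y)[keep]
dX <- cbind(fd(x)[keep])
Zk <- Z[keep, , drop = FALSE]

# Moment matrices: sums of z_it * dx_it' and z_it * dy_it, scaled by 1/T
Q_zx <- crossprod(Zk, dX) / n_periods    # q x p
Q_zy <- crossprod(Zk, dy) / n_periods    # q x 1
W <- diag(q)                             # assumed weight matrix (identity)

alpha_hat <- solve(t(Q_zx) %*% W %*% Q_zx, t(Q_zx) %*% W %*% Q_zy)
```

First-differencing removes the fixed effect, and the instruments identify the slope even though x is correlated with the error. The \(T^{-1}\) scaling cancels in the estimator and is kept only to mirror the formula.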
Dhaene, G., & Jochmans, K. (2015). Split-panel jackknife estimation of fixed-effect models. The Review of Economic Studies, 82(3), 991-1030. doi:10.1093/restud/rdv007.
Mehrabani, A. (2023). Estimation and identification of latent group structures in panel data. Journal of Econometrics, 235(2), 1464-1482. doi:10.1016/j.jeconom.2022.12.002.
# Simulate a panel with a group structure
set.seed(1)
sim <- sim_DGP(N = 20, n_periods = 80, p = 2, n_groups = 3)
y <- sim$y
X <- sim$X
groups <- sim$groups
df <- cbind(y = c(y), X)
# Estimate the grouped panel data model
estim <- grouped_plm(y ~ ., data = df, groups = groups, n_periods = 80, method = "PLS")
summary(estim)
# Pass a panel data set with explicit cross-sectional and time indicators
i_index <- rep(1:20, each = 80)
t_index <- rep(1:80, 20)
df <- data.frame(y = c(y), X, i_index = i_index, t_index = t_index)
estim <- grouped_plm(
  y ~ .,
  data = df, index = c("i_index", "t_index"), groups = groups, method = "PLS"
)
summary(estim)