Internal plain (non-bootstrap) routine for computing the four Westerlund (2007) ECM-based panel cointegration test statistics \(G_t\), \(G_a\), \(P_t\), and \(P_a\). The function estimates unit-specific ECM regressions to form the mean-group statistics and then constructs pooled (panel) statistics using cross-unit aggregation and partialling-out steps. Time indexing is handled strictly via gap-aware lag/difference helpers.
WesterlundPlain(
  data,
  touse,
  idvar,
  timevar,
  yvar,
  xvars,
  constant = FALSE,
  trend = FALSE,
  lags,
  leads = NULL,
  lrwindow = 2,
  westerlund = FALSE,
  aic = TRUE,
  bootno = FALSE,
  indiv.ecm = FALSE,
  verbose = FALSE
)
A nested list containing:
stats: A list of the four raw Westerlund test statistics:
  Gt: Mean-group tau statistic.
  Ga: Mean-group alpha statistic.
  Pt: Pooled tau statistic.
  Pa: Pooled alpha statistic.
indiv_data: A named list where each element corresponds to a cross-sectional unit (ID), containing:
  ai: The estimated speed of adjustment (alpha).
  seai: The standard error of alpha (adjusted for degrees of freedom).
  betai: Vector of long-run coefficients (\(\beta = -\lambda / \alpha\)).
  blag, blead: The lags and leads selected for that specific unit.
  ti: Raw observation count for the unit.
  tnorm: Degrees of freedom used for normalization.
  reg_coef: If indiv.ecm = TRUE, the full coefficient matrix from westerlund_test_reg.
results_df: A summary data.frame containing all unit-level results in vectorized format.
settings: A list of routine metadata:
  meanlag, meanlead: Integer averages of the selected unit lags/leads.
  realmeanlag, realmeanlead: Numeric averages of the selected unit lags/leads.
  auto: Logical; TRUE if automatic selection (ranges) was used.
data: A data.frame containing panel data.
touse: Logical vector of length nrow(data) indicating rows eligible for estimation. Rows with missing values in yvar or xvars are dropped as well.
idvar: String. Column identifying cross-sectional units.
timevar: String. Column identifying time.
yvar: String. Name of the dependent variable (levels).
xvars: Character vector. Names of regressors in the long-run relationship (levels).
constant: Logical. If TRUE, includes a constant term in the ECM design matrix.
trend: Logical. If TRUE, includes a linear time trend in the ECM design matrix.
lags: Integer or length-2 integer vector. Fixed lag order or range c(min,max) for short-run dynamics. If a range is supplied, the routine performs an information-criterion search over candidate lag/lead combinations.
leads: Integer or length-2 integer vector, or NULL. Fixed lead order or range c(min,max). If NULL, defaults to 0.
lrwindow: Integer. Bartlett kernel window (maximum lag) used in long-run variance calculations via calc_lrvar_bartlett.
westerlund: Logical. If TRUE, uses a Westerlund-specific information criterion and trimming logic for variance estimation.
aic: Logical. If TRUE, uses AIC for lag/lead selection when ranges are supplied. If FALSE, uses BIC.
bootno: Logical. If TRUE, prints a short header and progress dots (intended for higher-level routines).
indiv.ecm: Logical. If TRUE, returns the output of the individual ECM regressions.
verbose: Logical. If TRUE, prints additional output.
Loop 1 (mean-group) estimates unit-specific ECMs. Each unit produces an estimated error-correction coefficient on \(y_{t-1}\) and an associated standard error. These are aggregated into \(G_t\) and \(G_a\).
Loop 2 (pooled) fixes a common short-run structure based on the average selected lag/lead orders and constructs pooled residual products to obtain \(P_t\) and \(P_a\).
All lags and differences are computed using strict time-based helpers
(get_lag, get_diff). This ensures that gaps in the
time index propagate as missing values rather than shifting across gaps.
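The gap-aware behavior can be illustrated with a minimal sketch. The helpers lag_by_time and diff_by_time below are hypothetical stand-ins written for this illustration; they are not the package's get_lag and get_diff, whose signatures are not shown here. The sketch assumes a single unit with an integer-valued time index.

```r
# Hypothetical helpers (NOT the package's get_lag / get_diff).
# Lag x by k periods using the time index itself, so that a missing
# period produces NA instead of silently picking up an older value.
lag_by_time <- function(x, time, k = 1) {
  # For each t, locate the observation dated t - k (NA if it does not exist)
  x[match(time - k, time)]
}

# First difference built on the time-based lag
diff_by_time <- function(x, time) {
  x - lag_by_time(x, time, 1)
}

time <- c(1, 2, 3, 5, 6)      # period 4 is missing
x    <- c(10, 11, 13, 16, 20)

lag_by_time(x, time)   # NA 10 11 NA 16
diff_by_time(x, time)  # NA  1  2 NA  4
```

A position-based lag would instead report 13 as the lag of the observation at time 5, which mixes values that are two periods apart.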
Purpose and status.
WesterlundPlain() is typically called internally by westerlund_test.
It returns the four raw test statistics and lag/lead diagnostics needed
for printing and standardization.
Workflow overview. The routine proceeds in two main stages:
Unit-specific ECM regressions (Loop 1): For each cross-sectional unit, it constructs an ECM with
\(\Delta y_t\) as the dependent variable and includes deterministic terms (optional), \(y_{t-1}\),
\(x_{t-1}\), lagged \(\Delta y_t\), and leads/lags of \(\Delta x_t\). Lags and leads are computed using
strict time-indexed helpers (get_lag, get_diff), which respect gaps in the time index.
If lags and/or leads are provided as ranges, an information-criterion search selects the
lag/lead orders for each unit. The routine stores the unit-level error-correction estimate \(\hat{\alpha}_i\)
and its standard error.
Pooled (panel) aggregation (Loop 2): Using the mean of selected lag/lead orders across units, the routine constructs pooled quantities needed for \(P_t\) and \(P_a\) via partialling-out regressions and cross-unit aggregation of residual products.
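The unit-level regression in Loop 1 can be sketched for a single unit as follows. This is an illustration only, under stated assumptions: one regressor, a fixed lag order of 1, no leads, no gaps in the time index, and a constant included (lm's default). The variable names and the simple position-based lag1 helper are hypothetical and are not the package's internals.

```r
set.seed(1)
n <- 50
x <- cumsum(rnorm(n))   # random-walk regressor
y <- x + rnorm(n)       # cointegrated with x by construction

# Position-based lag; valid only because this toy series has no gaps
lag1 <- function(v) c(NA, v[-length(v)])

dy <- c(NA, diff(y))    # Delta y_t
dx <- c(NA, diff(x))    # Delta x_t

d <- data.frame(
  dy    = dy,
  y_l1  = lag1(y),      # y_{t-1}: its coefficient is alpha_i
  x_l1  = lag1(x),      # x_{t-1}: its coefficient is lambda_i
  dy_l1 = lag1(dy),     # lagged Delta y_t
  dx    = dx,           # contemporaneous Delta x_t
  dx_l1 = lag1(dx)      # lagged Delta x_t
)

# Rows containing NA are dropped by lm's default na.action
ecm <- lm(dy ~ y_l1 + x_l1 + dy_l1 + dx + dx_l1, data = d)

alpha_hat <- coef(ecm)[["y_l1"]]
se_alpha  <- summary(ecm)$coefficients["y_l1", "Std. Error"]
beta_hat  <- -coef(ecm)[["x_l1"]] / alpha_hat   # long-run coefficient, -lambda / alpha
t_ratio   <- alpha_hat / se_alpha
```

When lags or leads are given as ranges, a regression of this form would be refitted for each candidate combination and the one minimizing the chosen information criterion retained.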
Long-run variance calculations.
Long-run variances are computed using calc_lrvar_bartlett with
maxlag = lrwindow. In westerlund=TRUE mode, the routine applies
Stata-like trimming at the start/end of the differenced series based on selected
lags/leads prior to long-run variance estimation.
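For orientation, a textbook Bartlett-kernel long-run variance estimator can be sketched as below. The function bartlett_lrvar is hypothetical and written for this illustration; the demeaning step, the 1/n scaling of the autocovariances, and the weight 1 - j/(maxlag + 1) are assumptions, and calc_lrvar_bartlett may differ in any of them.

```r
# Hypothetical sketch (NOT calc_lrvar_bartlett): Bartlett-kernel
# long-run variance of a series u with window maxlag.
bartlett_lrvar <- function(u, maxlag = 2, demean = TRUE) {
  stopifnot(!anyNA(u))                 # assumes a complete series
  if (demean) u <- u - mean(u)
  n <- length(u)
  lrv <- sum(u^2) / n                  # gamma_0
  for (j in seq_len(maxlag)) {
    if (j >= n) break                  # no observations left at this lag
    w <- 1 - j / (maxlag + 1)          # Bartlett weight
    gamma_j <- sum(u[(j + 1):n] * u[1:(n - j)]) / n
    lrv <- lrv + 2 * w * gamma_j
  }
  lrv
}

u <- c(1, -1, 1, -1)
bartlett_lrvar(u, maxlag = 0)  # 1    (plain variance, no autocovariance terms)
bartlett_lrvar(u, maxlag = 1)  # 0.25 (negative first autocovariance lowers it)
```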
Returned statistics. Let \(\hat{\alpha}_i\) denote the unit-specific error-correction coefficient on \(y_{t-1}\) (as constructed in the ECM), with standard error \(\widehat{\mathrm{se}}(\hat{\alpha}_i)\). The routine computes:
\(G_t\): the mean of the individual t-ratios \(\hat{\alpha}_i/\widehat{\mathrm{se}}(\hat{\alpha}_i)\),
\(G_a\): a scaled mean-group statistic using a unit-specific normalization factor derived from long-run variances,
\(P_t\): a pooled t-type statistic based on a pooled \(\hat{\alpha}\) and its pooled standard error,
\(P_a\): a pooled scaled statistic using an average effective time dimension.
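As a small illustration of the first item, \(G_t\) can be recomputed from unit-level output shaped like indiv_data. The list below is made up for the example (the numbers are illustrative inputs, not results from any data set), and only the ai and seai fields described above are used.

```r
# Illustrative unit-level output shaped like indiv_data (made-up values)
indiv <- list(
  "1" = list(ai = -0.50, seai = 0.10),
  "2" = list(ai = -0.30, seai = 0.15),
  "3" = list(ai = -0.20, seai = 0.10)
)

# Individual t-ratios alpha_i / se(alpha_i)
t_ratios <- vapply(indiv, function(u) u$ai / u$seai, numeric(1))

# G_t is their cross-unit mean
Gt <- mean(t_ratios)
```

The other three statistics additionally require the long-run variance terms and pooled quantities, so they cannot be rebuilt from ai and seai alone.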
Westerlund, J. (2007). Testing for error correction in panel data. Oxford Bulletin of Economics and Statistics, 69(6), 709-748.
westerlund_test,
WesterlundBootstrap,
get_lag,
get_diff,
calc_lrvar_bartlett
# \donttest{
set.seed(123)
N <- 5
T <- 20
df <- data.frame(
  id = rep(1:N, each = T),
  t = rep(1:T, N),
  y = rnorm(N * T),
  x1 = rnorm(N * T),
  x2 = rnorm(N * T)
)
touse <- rep(TRUE, nrow(df))
plain_res <- WesterlundPlain(
  data = df,
  touse = touse,
  idvar = "id",
  timevar = "t",
  yvar = "y",
  xvars = c("x1", "x2"),
  lags = 1,
  leads = 0
)
# Accessing results from the nested structure:
stats <- plain_res$stats
print(c(Gt = stats$Gt, Ga = stats$Ga, Pt = stats$Pt, Pa = stats$Pa))
# Checking unit-specific coefficients for ID '1' (IDs in df run from 1 to N)
unit_1 <- plain_res$indiv_data[["1"]]
print(unit_1$ai)
# }