Internal plain (non-bootstrap) routine for computing the four Westerlund (2007) ECM-based panel cointegration test statistics \(G_t\), \(G_a\), \(P_t\), and \(P_a\). The function estimates unit-specific ECM regressions to form the mean-group statistics and then constructs pooled (panel) statistics using cross-unit aggregation and partialling-out steps. Time indexing is handled strictly via gap-aware lag/difference helpers.
WesterlundPlain(
  data,
  touse,
  idvar,
  timevar,
  yvar,
  xvars,
  constant = FALSE,
  trend = FALSE,
  lags,
  leads = NULL,
  lrwindow = 2,
  westerlund = FALSE,
  bootno = FALSE,
  indiv.ecm = FALSE,
  debug = FALSE,
  verbose = FALSE
)
A nested list containing:
stats: A list of the four raw Westerlund test statistics:
  Gt: Mean-group tau statistic.
  Ga: Mean-group alpha statistic.
  Pt: Pooled tau statistic.
  Pa: Pooled alpha statistic.
indiv_data: A named list where each element corresponds to a cross-sectional unit (ID), containing:
  ai: The estimated speed of adjustment (alpha).
  seai: The standard error of alpha (adjusted for degrees of freedom).
  betai: Vector of long-run coefficients (\(\beta = -\lambda / \alpha\)).
  blag, blead: The lags and leads selected for that specific unit.
  ti: Raw observation count for the unit.
  tnorm: Degrees of freedom used for normalization.
  reg_coef: If indiv.ecm = TRUE, the full coefficient matrix from westerlund_test_reg.
results_df: A summary data.frame containing all unit-level results in vectorized format.
settings: A list of routine metadata:
  meanlag, meanlead: Integer averages of the selected unit lags/leads.
  realmeanlag, realmeanlead: Numeric averages of the selected unit lags/leads.
  auto: Logical; TRUE if automatic selection (ranges) was used.
data: A data.frame containing panel data.
touse: Logical vector of length nrow(data) indicating rows eligible for estimation. Rows are further filtered to remove missing yvar and xvars.
idvar: String. Column identifying cross-sectional units.
timevar: String. Column identifying time.
yvar: String. Name of the dependent variable (levels).
xvars: Character vector. Names of regressors in the long-run relationship (levels).
constant: Logical. If TRUE, includes a constant term in the ECM design matrix.
trend: Logical. If TRUE, includes a linear time trend in the ECM design matrix.
lags: Integer or length-2 integer vector. Fixed lag order or range c(min,max) for short-run dynamics. If a range is supplied, the routine performs an information-criterion search over candidate lag/lead combinations.
leads: Integer or length-2 integer vector, or NULL. Fixed lead order or range c(min,max). If NULL, defaults to 0.
lrwindow: Integer. Bartlett kernel window (maximum lag) used in long-run variance calculations via calc_lrvar_bartlett.
westerlund: Logical. If TRUE, uses a Westerlund-specific information criterion and trimming logic for variance estimation.
bootno: Logical. If TRUE, prints a short header and progress dots (intended for higher-level routines).
indiv.ecm: Logical. If TRUE, returns the output of the individual ECM regressions (stored as reg_coef in indiv_data).
debug: Logical. If TRUE, suppresses progress dots in some branches and can be used for debugging prints.
verbose: Logical. If TRUE, prints additional output.
Loop 1 (mean-group) estimates unit-specific ECMs. Each unit produces an estimated error-correction coefficient on \(y_{t-1}\) and an associated standard error. These are aggregated into \(G_t\) and \(G_a\).
Loop 2 (pooled) fixes a common short-run structure based on the average selected lag/lead orders and constructs pooled residual products to obtain \(P_t\) and \(P_a\).
All lags and differences are computed using strict time-based helpers
(get_lag, get_diff). This ensures that gaps in the
time index propagate as missing values rather than shifting across gaps.
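The difference between positional and time-based lagging can be illustrated with a minimal sketch. The helpers `lag_by_time` and `diff_by_time` below are hypothetical stand-ins written for this illustration; they are not the package's get_lag and get_diff, whose signatures are not shown here, and they handle a single unit only.

```r
# Hypothetical helper (not the package's get_lag): lag x by k periods using
# the time index itself, so that a missing period yields NA.
lag_by_time <- function(x, time, k = 1L) {
  # For each t, look up the position of the observation dated t - k.
  x[match(time - k, time)]
}

# Hypothetical helper: first difference built on the time-based lag.
diff_by_time <- function(x, time) {
  x - lag_by_time(x, time, 1L)
}

# One unit observed in periods 1, 2, 3, 5, 6 (period 4 is missing).
time <- c(1, 2, 3, 5, 6)
y    <- c(10, 11, 13, 18, 20)

lag_by_time(y, time)   # NA 10 11 NA 18
diff_by_time(y, time)  # NA  1  2 NA  2
```

A purely positional lag would pair period 5 with period 3 and report a difference of 5 there; the time-based version returns NA because period 4 is not observed.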
Purpose and status.
WesterlundPlain() is typically called internally by westerlund_test.
It returns the four raw test statistics and lag/lead diagnostics needed
for printing and standardization.
Workflow overview. The routine proceeds in two main stages:
Unit-specific ECM regressions (Loop 1): For each cross-sectional unit, it constructs an ECM with
\(\Delta y_t\) as the dependent variable and includes deterministic terms (optional), \(y_{t-1}\),
\(x_{t-1}\), lagged \(\Delta y_t\), and leads/lags of \(\Delta x_t\). Lags and leads are computed using
strict time-indexed helpers (get_lag, get_diff), which respect gaps in the time index.
If lags and/or leads are provided as ranges, an information-criterion search selects the
lag/lead orders for each unit. The routine stores the unit-level error-correction estimate \(\hat{\alpha}_i\)
and its standard error.
Pooled (panel) aggregation (Loop 2): Using the mean of selected lag/lead orders across units, the routine constructs pooled quantities needed for \(P_t\) and \(P_a\) via partialling-out regressions and cross-unit aggregation of residual products.
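The first stage can be sketched on simulated data as follows. This is an illustration under stated assumptions, not the routine's implementation: `simulate_unit` and `fit_unit_ecm` are hypothetical names, the panel is balanced and gap-free (so positional lags are safe here), the lag order is fixed at one with no leads, and lm() adds a constant by default.

```r
set.seed(1)

# Simulate one unit in which y error-corrects towards x.
simulate_unit <- function(n_obs, alpha = -0.4) {
  x <- cumsum(rnorm(n_obs))            # I(1) regressor
  y <- numeric(n_obs)
  for (t in 2:n_obs) {
    y[t] <- y[t - 1] + alpha * (y[t - 1] - x[t - 1]) + rnorm(1, sd = 0.5)
  }
  data.frame(t = seq_len(n_obs), y = y, x = x)
}

# Hypothetical helper: fit one unit's ECM with one lag of dy and no leads.
fit_unit_ecm <- function(d) {
  n  <- nrow(d)
  dy <- c(NA, diff(d$y))
  dx <- c(NA, diff(d$x))
  reg <- data.frame(
    dy     = dy,
    y_lag  = c(NA, d$y[-n]),   # y_{t-1}
    x_lag  = c(NA, d$x[-n]),   # x_{t-1}
    dy_lag = c(NA, dy[-n]),    # lagged difference of y
    dx     = dx                # contemporaneous difference of x
  )
  fit <- lm(dy ~ y_lag + x_lag + dy_lag + dx, data = reg)
  est <- summary(fit)$coefficients["y_lag", ]
  c(ai = unname(est["Estimate"]), seai = unname(est["Std. Error"]))
}

units    <- lapply(1:5, function(i) simulate_unit(60))
unit_est <- t(sapply(units, fit_unit_ecm))

# Mean-group tau statistic: average of the unit-level t-ratios.
Gt <- mean(unit_est[, "ai"] / unit_est[, "seai"])
Gt
```

The coefficient on `y_lag` plays the role of \(\hat{\alpha}_i\); the real routine additionally handles gaps, deterministic terms, and lag/lead selection, and adjusts the standard error for degrees of freedom.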
Long-run variance calculations.
Long-run variances are computed using calc_lrvar_bartlett with
maxlag = lrwindow. In westerlund=TRUE mode, the routine applies
Stata-like trimming at the start/end of the differenced series based on selected
lags/leads prior to long-run variance estimation.
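For orientation, a Bartlett-kernel long-run variance can be sketched as below. The function `lrvar_bartlett_sketch` is a hypothetical illustration, not the package's calc_lrvar_bartlett; in particular, the demeaning step and the division by n (rather than a degrees-of-freedom correction) are assumptions, and the trimming applied in westerlund=TRUE mode is not reproduced.

```r
# Hypothetical sketch (not the package's calc_lrvar_bartlett): Bartlett-kernel
# long-run variance of a series u with bandwidth maxlag.
lrvar_bartlett_sketch <- function(u, maxlag = 2L, demean = TRUE) {
  u <- u[!is.na(u)]
  if (demean) u <- u - mean(u)
  n <- length(u)
  # Lag-0 autocovariance.
  lrv <- sum(u * u) / n
  for (j in seq_len(maxlag)) {
    w       <- 1 - j / (maxlag + 1)                   # Bartlett weight
    gamma_j <- sum(u[(j + 1):n] * u[1:(n - j)]) / n   # lag-j autocovariance
    lrv     <- lrv + 2 * w * gamma_j
  }
  lrv
}

# A strongly negatively autocorrelated series has a long-run variance
# below its ordinary variance.
u <- c(1, -1, 1, -1)
lrvar_bartlett_sketch(u, maxlag = 0)  # 1
lrvar_bartlett_sketch(u, maxlag = 1)  # 0.25
```

With maxlag = 0 the estimate reduces to the ordinary (population) variance; larger windows add down-weighted autocovariances.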
Returned statistics. Let \(\hat{\alpha}_i\) denote the unit-specific error-correction coefficient on \(y_{t-1}\) (as constructed in the ECM), with standard error \(\widehat{\mathrm{se}}(\hat{\alpha}_i)\). The routine computes:
\(G_t\): the mean of the individual t-ratios \(\hat{\alpha}_i/\widehat{\mathrm{se}}(\hat{\alpha}_i)\),
\(G_a\): a scaled mean-group statistic using a unit-specific normalization factor derived from long-run variances,
\(P_t\): a pooled t-type statistic based on a pooled \(\hat{\alpha}\) and its pooled standard error,
\(P_a\): a pooled scaled statistic using an average effective time dimension.
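In the notation of Westerlund (2007), these statistics are commonly written as shown below. The display follows the paper rather than the routine's source code, so the mapping is an assumption: \(N\) is the number of units, \(T_i\) is the unit's effective time dimension, \(\bar{T}\) is its cross-unit average, \(\hat{\alpha}_i(1)\) is the unit-specific normalization factor derived from long-run variances (the ratio \(\hat{\omega}_{ui}/\hat{\omega}_{yi}\) in the paper), and \(\hat{\alpha}\) is the pooled error-correction estimate.

\[
G_t = \frac{1}{N}\sum_{i=1}^{N} \frac{\hat{\alpha}_i}{\widehat{\mathrm{se}}(\hat{\alpha}_i)},
\qquad
G_a = \frac{1}{N}\sum_{i=1}^{N} \frac{T_i\,\hat{\alpha}_i}{\hat{\alpha}_i(1)},
\]

\[
P_t = \frac{\hat{\alpha}}{\widehat{\mathrm{se}}(\hat{\alpha})},
\qquad
P_a = \bar{T}\,\hat{\alpha}.
\]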
Westerlund, J. (2007). Testing for error correction in panel data. Oxford Bulletin of Economics and Statistics, 69(6), 709–748.
westerlund_test,
WesterlundBootstrap,
get_lag,
get_diff,
calc_lrvar_bartlett
if (FALSE) {
  # `df` (a panel data.frame) and `touse` (a logical vector of length
  # nrow(df)) are assumed to exist already.
  plain_res <- WesterlundPlain(
    data    = df,
    touse   = touse,
    idvar   = "id",
    timevar = "t",
    yvar    = "y",
    xvars   = c("x1", "x2"),
    lags    = 1,
    leads   = 0
  )

  # Accessing results from the nested structure:
  stats <- plain_res$stats
  print(c(Gt = stats$Gt, Ga = stats$Ga, Pt = stats$Pt, Pa = stats$Pa))

  # Checking unit-specific coefficients for ID '101'
  unit_101 <- plain_res$indiv_data[["101"]]
  print(unit_101$ai)
}