WesterlundPlain: Compute Raw Westerlund ECM Panel Cointegration Statistics (Plain Routine)

Description

Internal plain (non-bootstrap) routine for computing the four Westerlund (2007) ECM-based panel cointegration test statistics \(G_t\), \(G_a\), \(P_t\), and \(P_a\). The function estimates unit-specific ECM regressions to form the mean-group statistics and then constructs pooled (panel) statistics using cross-unit aggregation and partialling-out steps. Time indexing is handled strictly via gap-aware lag/difference helpers.

Usage

WesterlundPlain(
  data,
  touse,
  idvar,
  timevar,
  yvar,
  xvars,
  constant = FALSE,
  trend = FALSE,
  lags,
  leads = NULL,
  lrwindow = 2,
  westerlund = FALSE,
  aic = TRUE,
  bootno = FALSE,
  indiv.ecm = FALSE,
  verbose = FALSE
)

Value

A nested list containing:

stats: A list of the four raw Westerlund test statistics:
- Gt: Mean-group tau statistic.
- Ga: Mean-group alpha statistic.
- Pt: Pooled tau statistic.
- Pa: Pooled alpha statistic.
indiv_data: A named list where each element corresponds to a cross-sectional unit (ID), containing:
- ai: The estimated speed of adjustment (alpha).
- seai: The standard error of alpha (adjusted for degrees of freedom).
- betai: Vector of long-run coefficients (\(\beta = -\lambda / \alpha\)).
- blag, blead: The lags and leads selected for that specific unit.
- ti: Raw observation count for the unit.
- tnorm: Degrees of freedom used for normalization.
- reg_coef: If indiv.ecm = TRUE, the full coefficient matrix from westerlund_test_reg.
results_df: A summary data.frame containing all unit-level results in vectorized format.
settings: A list of routine metadata:
- meanlag, meanlead: Integer averages of the selected unit lags/leads.
- realmeanlag, realmeanlead: Numeric averages of the selected unit lags/leads.
- auto: Logical; TRUE if automatic selection (ranges) was used.

Arguments

data: A data.frame containing panel data.
touse: Logical vector of length nrow(data) indicating rows eligible for estimation. Rows are further filtered to remove missing yvar and xvars.
idvar: String. Column identifying cross-sectional units.
timevar: String. Column identifying time.
yvar: String. Name of the dependent variable (levels).
xvars: Character vector. Names of regressors in the long-run relationship (levels).
constant: Logical. If TRUE, includes a constant term in the ECM design matrix.
trend: Logical. If TRUE, includes a linear time trend in the ECM design matrix.
lags: Integer or length-2 integer vector. Fixed lag order or range c(min,max) for short-run dynamics. If a range is supplied, the routine performs an information-criterion search over candidate lag/lead combinations.
leads: Integer or length-2 integer vector, or NULL. Fixed lead order or range c(min,max). If NULL, defaults to 0.
lrwindow: Integer. Bartlett kernel window (maximum lag) used in long-run variance calculations via calc_lrvar_bartlett.
westerlund: Logical. If TRUE, uses a Westerlund-specific information criterion and trimming logic for variance estimation.
aic: Logical. If TRUE, uses AIC for lag/lead selection when ranges are supplied. If FALSE, uses BIC.
bootno: Logical. If TRUE, prints a short header and progress dots (intended for higher-level routines).
indiv.ecm: Logical. If TRUE, returns the output of the individual ECM regressions (see reg_coef under Value).
verbose: Logical. If TRUE, prints additional output.

Internal Logic

Two-stage structure

Loop 1 (mean-group) estimates unit-specific ECMs. Each unit produces an estimated error-correction coefficient on \(y_{t-1}\) and an associated standard error. These are aggregated into \(G_t\) and \(G_a\).
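As a rough illustration of what a single unit-level regression in Loop 1 looks like, the sketch below fits one ECM with base R's lm() on simulated data. It is not the package's code: the variable names (y_lag, x_lag, dy_l1, dx_l1) and the fixed choice of one lag and no leads are assumptions made for the example, the series has no time gaps (so plain shifting is safe here), and the default lm() intercept corresponds to constant = TRUE.

```r
set.seed(1)
n <- 60
x <- cumsum(rnorm(n))        # I(1) regressor (random walk)
y <- 0.5 * x + rnorm(n)      # cointegrated with x by construction

dy <- c(NA, diff(y))         # first differences, aligned with levels
dx <- c(NA, diff(x))

unit <- data.frame(
  dy    = dy,
  y_lag = c(NA, y[-n]),      # y_{t-1}
  x_lag = c(NA, x[-n]),      # x_{t-1}
  dy_l1 = c(NA, dy[-n]),     # one lag of Delta y
  dx    = dx,                # contemporaneous Delta x
  dx_l1 = c(NA, dx[-n])      # one lag of Delta x
)

# lm() drops rows with missing values by default
fit <- lm(dy ~ y_lag + x_lag + dy_l1 + dx + dx_l1, data = unit)
co  <- summary(fit)$coefficients

alpha_hat  <- co["y_lag", "Estimate"]     # error-correction coefficient
se_alpha   <- co["y_lag", "Std. Error"]
lambda_hat <- co["x_lag", "Estimate"]
beta_hat   <- -lambda_hat / alpha_hat     # long-run coefficient, beta = -lambda / alpha
t_ratio    <- alpha_hat / se_alpha        # the unit's contribution to G_t
```

Averaging t_ratio across units gives the \(G_t\)-type quantity. Note that the routine reports a degrees-of-freedom-adjusted standard error (seai), which this sketch does not attempt to reproduce.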

Loop 2 (pooled) fixes a common short-run structure based on the average selected lag/lead orders and constructs pooled residual products to obtain \(P_t\) and \(P_a\).

Strict time indexing and gaps

All lags and differences are computed using strict time-based helpers (get_lag, get_diff). This ensures that gaps in the time index propagate as missing values rather than shifting across gaps.
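The difference between a time-based lag and a positional shift can be illustrated with a minimal base-R sketch. The helpers lag_by_time and diff_by_time below are hypothetical stand-ins written for this example; they are not the package's get_lag and get_diff, whose signatures are not documented here. The sketch assumes time values are unique within a unit.

```r
# Hypothetical helper: value observed exactly k periods earlier, NA if absent
lag_by_time <- function(x, time, k = 1L) {
  x[match(time - k, time)]
}

# Hypothetical helper: first difference defined only across adjacent periods
diff_by_time <- function(x, time) {
  x - lag_by_time(x, time, 1L)
}

time <- c(1, 2, 3, 5, 6)       # period 4 is missing
y    <- c(10, 12, 15, 21, 22)

lag_y  <- lag_by_time(y, time)   # NA 10 12 NA 21
diff_y <- diff_by_time(y, time)  # NA  2  3 NA  1
```

A positional shift would instead pair period 5 with period 3 and report a difference of 6 across the gap.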

Details

Purpose and status. WesterlundPlain() is typically called internally by westerlund_test. It returns the four raw test statistics and lag/lead diagnostics needed for printing and standardization.

Workflow overview. The routine proceeds in two main stages:

Unit-specific ECM regressions (Loop 1): For each cross-sectional unit, it constructs an ECM with \(\Delta y_t\) as the dependent variable and includes deterministic terms (optional), \(y_{t-1}\), \(x_{t-1}\), lagged \(\Delta y_t\), and leads/lags of \(\Delta x_t\). Lags and leads are computed using strict time-indexed helpers (get_lag, get_diff), which respect gaps in the time index. If lags and/or leads are provided as ranges, an information-criterion search selects the lag/lead orders for each unit. The routine stores the unit-level error-correction estimate \(\hat{\alpha}_i\) and its standard error.
Pooled (panel) aggregation (Loop 2): Using the mean of selected lag/lead orders across units, the routine constructs pooled quantities needed for \(P_t\) and \(P_a\) via partialling-out regressions and cross-unit aggregation of residual products.
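The information-criterion search in the first stage can be sketched as a loop over candidate lag orders, fitting each candidate on a common sample and keeping the order with the smallest criterion. This is a simplified illustration under stated assumptions: it searches only over lags of \(\Delta y_t\) (no leads), uses base R's AIC() and BIC(), and the names shift, base, and use_aic are hypothetical. The routine's own criterion, particularly with westerlund = TRUE, may be defined differently.

```r
set.seed(2)
n <- 80
x <- cumsum(rnorm(n))
y <- 0.5 * x + rnorm(n)

# Positional shift; adequate here because the simulated series has no gaps
shift <- function(v, k) {
  if (k == 0) v else c(rep(NA, k), v[seq_len(length(v) - k)])
}

dy <- c(NA, diff(y))
dx <- c(NA, diff(x))

max_p   <- 2      # upper end of the lag range, as in lags = c(0, 2)
use_aic <- TRUE   # mirrors the aic argument: TRUE for AIC, FALSE for BIC

base <- data.frame(dy = dy, y_lag = shift(y, 1), x_lag = shift(x, 1), dx = dx)
for (j in seq_len(max_p)) base[[paste0("dy_l", j)]] <- shift(dy, j)
base <- base[complete.cases(base), ]   # same sample for every candidate

candidates <- 0:max_p
ic <- sapply(candidates, function(p) {
  rhs <- c("y_lag", "x_lag", "dx", if (p > 0) paste0("dy_l", seq_len(p)))
  fit <- lm(reformulate(rhs, response = "dy"), data = base)
  if (use_aic) AIC(fit) else BIC(fit)
})

best_p <- candidates[which.min(ic)]    # selected lag order for this unit
```

Holding the estimation sample fixed across candidates keeps the criteria comparable, since each extra lag would otherwise shorten the sample.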

Long-run variance calculations. Long-run variances are computed using calc_lrvar_bartlett with maxlag = lrwindow. In westerlund=TRUE mode, the routine applies Stata-like trimming at the start/end of the differenced series based on selected lags/leads prior to long-run variance estimation.
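For orientation, a Bartlett-kernel long-run variance estimator can be sketched in a few lines of base R. The function lrvar_bartlett_sketch is a hypothetical stand-in, not calc_lrvar_bartlett itself. The demeaning step, the divisor n, and the weights \(1 - j/(\text{maxlag}+1)\) are common textbook choices assumed here and may differ from the package's implementation. The trimming applied in westerlund = TRUE mode is not shown.

```r
# Hypothetical sketch of a Bartlett-kernel long-run variance estimator
lrvar_bartlett_sketch <- function(u, maxlag = 2L) {
  u <- u[!is.na(u)]
  n <- length(u)
  u <- u - mean(u)                 # centering is an assumption of this sketch
  lrv <- sum(u^2) / n              # lag-0 autocovariance
  for (j in seq_len(maxlag)) {
    if (j >= n) break              # no overlapping pairs left
    gamma_j <- sum(u[(j + 1):n] * u[1:(n - j)]) / n
    w_j <- 1 - j / (maxlag + 1)    # Bartlett weight
    lrv <- lrv + 2 * w_j * gamma_j
  }
  lrv
}

# Alternating series: negative first-order autocovariance lowers the estimate
lrvar_bartlett_sketch(c(1, -1, 1, -1), maxlag = 2)   # approximately 0.333
lrvar_bartlett_sketch(c(1, -1, 1, -1), maxlag = 0)   # 1 (plain variance)
```

A larger lrwindow brings in more autocovariances, each with a smaller weight.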

Returned statistics. Let \(\hat{\alpha}_i\) denote the unit-specific error-correction coefficient on \(y_{t-1}\) (as constructed in the ECM), with standard error \(\widehat{\mathrm{se}}(\hat{\alpha}_i)\). The routine computes:

\(G_t\): the mean of the individual t-ratios \(\hat{\alpha}_i/\widehat{\mathrm{se}}(\hat{\alpha}_i)\),
\(G_a\): a scaled mean-group statistic using a unit-specific normalization factor derived from long-run variances,
\(P_t\): a pooled t-type statistic based on a pooled \(\hat{\alpha}\) and its pooled standard error,
\(P_a\): a pooled scaled statistic using an average effective time dimension.
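For reference, the four statistics in Westerlund (2007) are commonly written as below, for a balanced panel with \(N\) units and \(T\) periods. This is a textbook rendering rather than a transcription of the routine: \(\hat{\alpha}_i(1)\) denotes the long-run-variance-based normalization factor for unit \(i\), \(\hat{\alpha}\) the pooled error-correction estimate, and the routine replaces \(T\) with effective sample sizes (tnorm, and an average across units for \(P_a\)).

```latex
\[
G_t = \frac{1}{N}\sum_{i=1}^{N}
      \frac{\hat{\alpha}_i}{\widehat{\mathrm{se}}(\hat{\alpha}_i)},
\qquad
G_a = \frac{1}{N}\sum_{i=1}^{N}
      \frac{T\,\hat{\alpha}_i}{\hat{\alpha}_i(1)},
\]
\[
P_t = \frac{\hat{\alpha}}{\widehat{\mathrm{se}}(\hat{\alpha})},
\qquad
P_a = T\,\hat{\alpha}.
\]
```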

References

Westerlund, J. (2007). Testing for error correction in panel data. Oxford Bulletin of Economics and Statistics, 69(6), 709–748.

Examples


# \donttest{
set.seed(123)
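N <- 5         # number of cross-sectional units
n_time <- 20   # periods per unit (named n_time to avoid masking T, shorthand for TRUE)
df <- data.frame(
  id = rep(1:N, each = n_time),
  t  = rep(1:n_time, N),
  y  = rnorm(N * n_time),
  x1 = rnorm(N * n_time),
  x2 = rnorm(N * n_time)
)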

touse <- rep(TRUE, nrow(df))

plain_res <- WesterlundPlain(
  data       = df,
  touse      = touse,
  idvar      = "id",
  timevar    = "t",
  yvar       = "y",
  xvars      = c("x1","x2"),
  lags       = 1,
  leads      = 0
)

# Accessing results from the nested structure:
stats <- plain_res$stats
print(c(Gt = stats$Gt, Ga = stats$Ga, Pt = stats$Pt, Pa = stats$Pa))

# Checking unit-specific coefficients for the unit with ID 1
# (IDs in this example run from 1 to N, so "101" would return NULL)
unit_1 <- plain_res$indiv_data[["1"]]
print(unit_1$ai)
# }
