signal_analysis: Comprehensive Signal Analysis for Panel Data

Description

Master function that orchestrates the complete signal extraction pipeline, integrating spectral decomposition (wavelets, EMD, HP-GC), Bayesian variable selection (regularized Horseshoe), dimensionality reduction (PCA, DFM), and stationarity testing into a unified analytical framework.

The function constructs a target signal Y from candidate variables X in panel data and applies multiple complementary methodologies to extract the latent structure from phenomenological dynamics.

Usage

signal_analysis(
  data,
  y_formula,
  time_var = NULL,
  group_var = NULL,
  methods = "all",
  filter_config = list(),
  horseshoe_config = list(),
  pca_config = list(),
  dfm_config = list(),
  unitroot_tests = "all",
  na_action = c("interpolate", "omit", "fail"),
  standardize = TRUE,
  first_difference = FALSE,
  verbose = TRUE,
  seed = NULL
)

Value

An S3 object of class "signal_analysis" containing:

call: The matched function call
data: Processed input data
Y: The constructed target signal
X: The predictor matrix
filters: Results from spectral decomposition methods
horseshoe: Results from Bayesian variable selection
pca: Results from PCA with bootstrap
dfm: Results from Dynamic Factor Model
unitroot: Results from unit root tests
interpretation: Automated technical interpretation
config: Configuration parameters used

Arguments

data

A data.frame or matrix containing the panel data. For data.frames, time should be in rows and variables in columns.

y_formula

Formula specifying how to construct Y from X variables, or a character string naming the pre-constructed Y column in data.

time_var

Character string naming the time variable. Optional; if NULL, rows are assumed to be ordered by time.

group_var

Character string naming the group/panel variable for panel data (optional for single time series).

methods

Character vector specifying which methods to apply. Options: "wavelet", "emd", "hpgc", "horseshoe", "pca", "dfm", "unitroot", or "all" (default).

filter_config

List of configuration options for filtering methods:

wavelet_filter

Wavelet filter type (default: "la8")

wavelet_levels

Which detail levels to combine (default: c(3,4))

emd_max_imf

Maximum IMFs for EMD (default: 10)

hpgc_prior

Prior configuration: "weak", "informative", "empirical" (default: "weak")

hpgc_chains

Number of MCMC chains (default: 4)

hpgc_iterations

Total iterations per chain (default: 20000)

horseshoe_config

List of configuration for Horseshoe regression:

p0

Expected number of relevant predictors (default: NULL for auto)

chains

Number of MCMC chains (default: 4)

iter_sampling

Sampling iterations per chain (default: 2000)

iter_warmup

Warmup iterations (default: 1000)

adapt_delta

Target acceptance rate (default: 0.95)

use_qr

Use QR decomposition (default: TRUE)

kappa_threshold

Shrinkage threshold for selection (default: 0.5)
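
To make the roles of p0 and kappa_threshold more concrete, here is a minimal base R sketch built on the formulas in Piironen and Vehtari (2017): the prior global scale implied by an expected number of relevant predictors, and the per-coefficient shrinkage factor kappa. The helper names, the assumption of unit-variance predictors, the omission of the slab term, and the reading that a kappa below the threshold marks a predictor as relevant are all illustrative assumptions, not the package's internals.

```r
# Hypothetical helpers; not part of the documented API.

# Prior guess for the global scale given p0 expected relevant predictors
# out of D, residual sd sigma, and n observations.
tau0_from_p0 <- function(p0, D, n, sigma = 1) {
  stopifnot(p0 > 0, p0 < D)
  p0 / (D - p0) * sigma / sqrt(n)
}

# Shrinkage factor per coefficient, assuming standardized predictors.
# kappa near 1: coefficient shrunk towards zero; near 0: left almost untouched.
shrinkage_kappa <- function(lambda, tau, n, sigma = 1) {
  1 / (1 + n * tau^2 * lambda^2 / sigma^2)
}

tau0 <- tau0_from_p0(p0 = 3, D = 10, n = 50)
lambda <- c(50, 20, 0.1, 0.01)            # illustrative local scales
kappa <- shrinkage_kappa(lambda, tau0, n = 50)
selected <- kappa < 0.5                    # assumed reading of kappa_threshold
```

In a fitted model, lambda and tau would be posterior draws, so kappa would typically be summarized (for example by its posterior mean) before thresholding.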

pca_config

List of configuration for PCA:

n_components

Number of components (default: NULL for auto)

rotation

Rotation method: "none", "varimax", "oblimin" (default: "none")

n_boot

Bootstrap replications (default: 1000)

block_length

Block length for bootstrap (default: NULL for auto)

alpha

Significance level for bootstrap tests (default: 0.05)
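
The n_boot, block_length, and alpha options suggest a block bootstrap, which resamples contiguous runs of rows so that serial dependence is preserved. The sketch below shows one common variant, a moving-block bootstrap of the variance share of the first principal component, using only base R. The function name and the choice of statistic are assumptions for illustration; the package's actual resampling scheme may differ.

```r
# Hypothetical sketch of a moving-block bootstrap for PCA.
block_bootstrap_pc1 <- function(X, n_boot = 200, block_length = 5, alpha = 0.05) {
  n <- nrow(X)
  n_blocks <- ceiling(n / block_length)
  last_start <- n - block_length + 1
  share <- numeric(n_boot)
  for (b in seq_len(n_boot)) {
    # Draw block starting rows, expand each into a run of consecutive rows,
    # and trim the concatenated index to the original sample length.
    starts <- sample.int(last_start, n_blocks, replace = TRUE)
    idx <- as.vector(sapply(starts, function(s) s:(s + block_length - 1)))
    idx <- idx[seq_len(n)]
    pc <- prcomp(X[idx, , drop = FALSE], center = TRUE, scale. = TRUE)
    share[b] <- pc$sdev[1]^2 / sum(pc$sdev^2)
  }
  # Percentile interval for the PC1 variance share
  quantile(share, c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

set.seed(1)
common <- rnorm(60)
X <- sapply(1:6, function(j) common + rnorm(60, sd = 0.5))
ci <- block_bootstrap_pc1(X)
```

Longer blocks retain more of the time dependence but leave fewer distinct blocks to draw from, which is presumably why the block length is exposed as an option.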

dfm_config

List of configuration for Dynamic Factor Models:

r

Number of factors (default: NULL for auto via IC)

max_factors

Maximum factors to consider (default: 10)

VAR lags for factor dynamics (default: 1)

Information criterion: "IC1", "IC2", "IC3" (default: "bai_ng_2")
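
When r is NULL, the number of factors is chosen by an information criterion. As a rough illustration of how such a criterion works, the sketch below implements the second panel criterion of Bai and Ng (2002), IC_p2, with base R. The function name is hypothetical, the candidate range starts at one factor, and the input is only centered (any standardization is left to the caller); none of this is confirmed to match the package's implementation.

```r
# Hypothetical sketch of the Bai-Ng (2002) IC_p2 criterion.
# X is a (time x variables) numeric matrix.
bai_ng_ic2 <- function(X, max_factors = 10) {
  X <- scale(as.matrix(X), center = TRUE, scale = FALSE)
  n_obs <- nrow(X)
  n_var <- ncol(X)
  k_max <- min(max_factors, n_obs - 1, n_var - 1)
  # Squared singular values: the residual sum of squares after removing
  # k principal components is the sum of the remaining ones.
  d2 <- svd(X, nu = 0, nv = 0)$d^2
  penalty <- (n_var + n_obs) / (n_var * n_obs) * log(min(n_var, n_obs))
  ic <- sapply(seq_len(k_max), function(k) {
    v_k <- sum(d2[-seq_len(k)]) / (n_var * n_obs)
    log(v_k) + k * penalty
  })
  list(ic = ic, r = which.min(ic))
}

# Simulated panel with two common factors
set.seed(42)
n_obs <- 100
n_var <- 20
factors <- matrix(rnorm(n_obs * 2), n_obs, 2)
loadings <- matrix(runif(n_var * 2, -1, 1), n_var, 2)
X <- factors %*% t(loadings) + matrix(rnorm(n_obs * n_var, 0, 0.3), n_obs, n_var)
fit <- bai_ng_ic2(X)
fit$r
```

The penalty term grows with the number of factors, so an extra factor is kept only if it reduces the residual variance by more than the penalty.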

unitroot_tests

Character vector of unit root tests to apply. Options: "adf", "ers", "kpss", "pp", or "all" (default).

na_action

How to handle missing values: "interpolate", "omit", "fail" (default: "interpolate").
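
The documentation does not say which interpolation scheme "interpolate" uses. As one plausible reading, the sketch below fills gaps in a single series by linear interpolation with base R's approx(), carrying the nearest observed value outward at the ends. The helper name and the end-point rule are assumptions.

```r
# Hypothetical helper: linear interpolation of missing values in one series.
interpolate_na <- function(x) {
  ok <- !is.na(x)
  if (sum(ok) < 2) stop("need at least two observed values to interpolate")
  # rule = 2 extends the first/last observed value to leading/trailing gaps
  approx(x = which(ok), y = x[ok], xout = seq_along(x), rule = 2)$y
}

x <- c(NA, 1, NA, 3, NA, NA, 6, NA)
filled <- interpolate_na(x)
filled
# 1 1 2 3 4 5 6 6
```

By contrast, "omit" would correspond to dropping incomplete rows (for example with complete.cases()), and "fail" to stopping with an error as soon as a missing value is found.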

standardize

Logical, whether to standardize variables before analysis (default: TRUE).

first_difference

Logical, whether to first-difference data (default: FALSE).
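
The effect of the standardize and first_difference switches can be sketched with base R as follows. The helper name and the order of operations (difference first, then standardize) are assumptions; the function's own preprocessing may be ordered differently.

```r
# Hypothetical sketch of the two preprocessing switches on a numeric matrix.
preprocess <- function(X, standardize = TRUE, first_difference = FALSE) {
  X <- as.matrix(X)
  if (first_difference) X <- diff(X)   # row-wise differences; drops the first row
  if (standardize) X <- scale(X)       # zero mean and unit sd per column
  X
}

set.seed(1)
X <- apply(matrix(rnorm(200), 50, 4), 2, cumsum)   # four random walks
Z <- preprocess(X, standardize = TRUE, first_difference = TRUE)
dim(Z)
```

Differencing shortens the sample by one observation, which matters for short panels such as the 50-period example used later on this page.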

verbose

Logical, whether to print progress messages (default: TRUE).

seed

Random seed for reproducibility (default: NULL).

Details

Methodological Framework

The signal extraction pipeline distinguishes between latent structure (the underlying data-generating process) and phenomenological dynamics (observed variability). This is achieved through:

Spectral Decomposition: Separates signal frequencies
- Wavelets: Multi-resolution analysis via MODWT
- EMD: Data-adaptive decomposition into intrinsic modes
- HP-GC: Bayesian unobserved components (trend + cycle)
Sparse Regression: Identifies relevant predictors
- Regularized Horseshoe: Adaptive shrinkage with slab regularization
- Shrinkage factors (kappa) quantify predictor relevance
Dimensionality Reduction: Extracts common factors
- PCA: Static factor structure with bootstrap significance
- DFM: Dynamic factors with VAR transition dynamics
Stationarity Testing: Characterizes persistence properties
- Integrated battery of ADF, ERS, KPSS, PP tests
- Synthesized conclusion on stationarity type
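
One way to picture the synthesized stationarity conclusion is as a decision table that exploits the opposite null hypotheses of the tests: ADF, ERS, and PP take a unit root as the null, while KPSS takes stationarity as the null. The sketch below is a hypothetical rule operating on two p-values; the labels and the rule itself are assumptions, not the package's documented logic.

```r
# Hypothetical decision rule combining a unit-root test with KPSS.
synthesize_stationarity <- function(p_unit_root, p_kpss, alpha = 0.05) {
  reject_unit_root <- p_unit_root < alpha     # ADF / ERS / PP: H0 = unit root
  reject_stationary <- p_kpss < alpha         # KPSS: H0 = stationarity
  if (reject_unit_root && !reject_stationary) {
    "stationary"
  } else if (!reject_unit_root && reject_stationary) {
    "unit root"
  } else if (reject_unit_root && reject_stationary) {
    "conflicting evidence"
  } else {
    "inconclusive"
  }
}

synthesize_stationarity(p_unit_root = 0.01, p_kpss = 0.30)
synthesize_stationarity(p_unit_root = 0.40, p_kpss = 0.01)
```

Agreement between the two families of tests gives a clear verdict; disagreement is reported rather than resolved, since it can arise from low power, structural breaks, or long memory.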

Interpretation Framework

The automated interpretation assesses:

Signal Smoothness: Variance of second differences
Trend Persistence: Deterministic vs. stochastic via unit roots
Information Topology: Entropy of PC1 loadings (concentrated vs. diffuse)
Sparsity Ratio: Proportion of predictors shrunk to zero
Factor Structure: Number of significant common factors
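
Two of these diagnostics are simple enough to sketch in base R. The helpers below are hypothetical: the smoothness measure follows the description above directly, while the entropy measure assumes squared PC1 loadings as weights and normalizes by the log of the number of variables so that the value lies between 0 (concentrated) and 1 (diffuse).

```r
# Hypothetical helpers for two of the listed diagnostics.

# Signal smoothness: variance of second differences (smaller = smoother).
signal_smoothness <- function(y) var(diff(y, differences = 2))

# Information topology: normalized Shannon entropy of squared PC1 loadings.
pc1_loading_entropy <- function(X) {
  w <- prcomp(X, center = TRUE, scale. = TRUE)$rotation[, 1]^2  # sums to 1
  w <- w[w > 0]
  -sum(w * log(w)) / log(ncol(X))
}

set.seed(1)
t_grid <- seq(0, 1, length.out = 100)
smooth <- sin(2 * pi * t_grid)
noisy <- smooth + rnorm(100, sd = 0.2)
signal_smoothness(smooth) < signal_smoothness(noisy)

X <- matrix(rnorm(100 * 5), 100, 5)
h <- pc1_loading_entropy(X)
```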

References

Piironen, J., & Vehtari, A. (2017). Sparsity information and regularization in the horseshoe and other shrinkage priors. Electronic Journal of Statistics, 11(2), 5018-5051. doi:10.1214/17-EJS1337SI

Bai, J., & Ng, S. (2002). Determining the Number of Factors in Approximate Factor Models. Econometrica, 70(1), 191-221. doi:10.1111/1468-0262.00273

Examples

# \donttest{
# Generate example panel data
set.seed(42)
n_time <- 50   # number of time periods
n_vars <- 10   # number of candidate predictors

# Create correlated predictors with common factor structure
factors <- matrix(rnorm(n_time * 2), n_time, 2)
loadings <- matrix(runif(n_vars * 2, -1, 1), n_vars, 2)
X <- factors %*% t(loadings) + matrix(rnorm(n_time * n_vars, 0, 0.5), n_time, n_vars)
colnames(X) <- paste0("X", 1:n_vars)

# True signal depends on only 3 predictors
true_beta <- c(rep(1, 3), rep(0, 7))
Y <- as.numeric(X %*% true_beta + rnorm(n_time, 0, 0.5))

# Combine into data frame
data <- data.frame(Y = Y, X)

# Run comprehensive analysis
# We pass specific configs to make MCMC very fast just for the example
result <- signal_analysis(
  data = data,
  y_formula = "Y",
  methods = "all",
  verbose = TRUE,
  # Configuration for speed (CRAN policy < 5s preferred)
  filter_config = list(
    hpgc_chains = 1,
    hpgc_iterations = 50,
    hpgc_burnin = 10
  ),
  horseshoe_config = list(
    chains = 1,
    iter_sampling = 50,
    iter_warmup = 10
  ),
  pca_config = list(
    n_boot = 50
  )
)

# View interpretation
print(result)

# Plot results
plot(result)
# }