SignalY (version 1.1.1)

pca_bootstrap: Principal Component Analysis with Bootstrap Significance Testing

Description

Performs PCA on panel data with bootstrap-based significance testing for factor loadings. Identifies which variables load significantly on each principal component using a null distribution constructed via block bootstrapping.

Usage

pca_bootstrap(
  X,
  n_components = NULL,
  center = TRUE,
  scale = TRUE,
  n_boot = 200,
  block_length = NULL,
  alpha = 0.05,
  use_fdr = FALSE,
  rotation = c("varimax", "none", "oblimin"),
  verbose = FALSE
)

Value

A list of class "signaly_pca" containing:

loadings

Matrix of factor loadings (rotated if specified)

scores

Matrix of component scores

eigenvalues

Vector of eigenvalues

variance_explained

Proportion of variance explained by each component

cumulative_variance

Cumulative proportion of variance explained

significant_loadings

Matrix of logical values indicating significance

p_values

Matrix of bootstrap p-values for loadings

thresholds

Cutoff values for significance by component

entropy

Shannon entropy of loadings for each component

summary_by_component

Data frame summarizing each component

assignments

Data frame mapping variables to their dominant component
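
As a rough illustration of how the eigenvalues, variance_explained, and cumulative_variance elements relate to one another, the base-R sketch below derives them from a correlation matrix. It does not call pca_bootstrap. The variable names are illustrative, and the loading convention (eigenvector scaled by the square root of its eigenvalue) is an assumption, not something confirmed from the package source.

```r
set.seed(1)
X <- matrix(rnorm(100 * 5), ncol = 5)

# Eigendecomposition of the correlation matrix (the center = TRUE, scale = TRUE case)
eig <- eigen(cor(X), symmetric = TRUE)
eigenvalues <- eig$values

# Share of total variance per component, and its running total
variance_explained <- eigenvalues / sum(eigenvalues)
cumulative_variance <- cumsum(variance_explained)

# Assumed convention: loadings = eigenvectors scaled by sqrt(eigenvalue)
k <- 2
loadings <- eig$vectors[, 1:k, drop = FALSE] %*% diag(sqrt(eigenvalues[1:k]), k)

# Scores = standardized data projected onto the retained eigenvectors
scores <- scale(X) %*% eig$vectors[, 1:k, drop = FALSE]
```

With a correlation matrix the eigenvalues sum to the number of variables, so variance_explained always sums to 1.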

Arguments

X

Matrix or data frame where rows are observations (time points) and columns are variables.

n_components

Number of principal components to extract. If NULL, determined by eigenvalue threshold or explained variance.

center

Logical. Center variables before PCA. Default TRUE.

scale

Logical. Scale variables to unit variance. Default TRUE.

n_boot

Number of bootstrap replications for significance testing. Default 200.

block_length

Block length for block bootstrap. If NULL, defaults to ceiling(sqrt(nrow(X))).

alpha

Significance level for loading tests. Default 0.05.

use_fdr

Logical. Apply Benjamini-Hochberg FDR correction. Default FALSE.

rotation

Character string specifying the rotation method: "varimax", "none", or "oblimin". Default "varimax".

verbose

Logical. Print progress messages. Default FALSE.
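
To make the block_length default and the effect of use_fdr concrete, here is a small base-R sketch. The p-values are made up for illustration, and whether the package applies the Benjamini-Hochberg correction per component or across the whole loading matrix is not stated here, so the single-vector call to p.adjust is an assumption.

```r
# Documented default when block_length = NULL
n_obs <- 100
block_length <- ceiling(sqrt(n_obs))

# Hypothetical raw bootstrap p-values for the loadings on one component
p_raw <- c(0.001, 0.008, 0.020, 0.041, 0.300)
alpha <- 0.05

# Benjamini-Hochberg adjustment, as use_fdr = TRUE is described
p_bh <- p.adjust(p_raw, method = "BH")

significant_raw <- p_raw < alpha   # no correction
significant_bh  <- p_bh < alpha    # after FDR correction
```

In this made-up example the fourth loading passes the raw threshold but not the adjusted one, which is the kind of borderline case the correction is meant to filter out.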

Interpretation in Signal Analysis

  • High PC1 entropy: "Maximum entropy systemic stochasticity" - the dominant factor captures undifferentiated movement, suggesting noise rather than latent structure.

  • Low PC1 entropy: "Differentiated latent structure" - specific variables dominate, indicating meaningful groupings.

  • Significant loadings: Variables with p < alpha after bootstrap testing reliably load on that component.
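
The contrast between high and low entropy can be seen in a small sketch. The helper loading_entropy is hypothetical; normalizing the squared loadings to sum to 1 and using the natural logarithm are assumptions about how the entropy is computed.

```r
# Shannon entropy of a loading vector, based on squared loadings
# normalized to sum to 1 (normalization and log base are assumptions)
loading_entropy <- function(l) {
  w <- l^2 / sum(l^2)
  w <- w[w > 0]            # treat 0 * log(0) as 0
  -sum(w * log(w))
}

# Diffuse: all ten variables load equally on the component
diffuse <- rep(1 / sqrt(10), 10)

# Concentrated: two variables dominate, the rest barely load
concentrated <- c(0.95, 0.30, rep(0.03, 8))

h_diffuse <- loading_entropy(diffuse)            # the maximum, log(10)
h_concentrated <- loading_entropy(concentrated)  # well below the maximum
```

Equal loadings give the maximum possible entropy for ten variables, log(10), while the concentrated vector scores far lower.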

Details

The analysis proceeds in several stages:

1. Standard PCA: Eigendecomposition of the correlation (if scaled) or covariance matrix to extract principal components.

2. Rotation (optional): Varimax rotation maximizes the variance of squared loadings within components, producing cleaner simple structure. Oblimin allows correlated factors.

3. Bootstrap Significance Testing: For each bootstrap replicate:

  1. Resample rows using block bootstrap (preserving temporal dependence)

  2. Perform PCA on resampled data

  3. Apply Procrustes rotation to align the resampled loadings with the original loadings

  4. Record absolute loadings

The empirical p-value for each loading is the proportion of bootstrap replicates in which the aligned loading exceeds the original loading in absolute value.

4. Entropy Calculation: Shannon entropy of squared loadings indicates whether explanatory power is concentrated (low entropy) or diffuse (high entropy). High entropy on PC1 suggests systemic co-movement rather than differentiated structure.
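
The bootstrap stage described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the helpers pca_loadings, block_indices, and procrustes_align are hypothetical names, the resampling is a moving-block scheme, rotation is omitted, and the p-value follows the literal description (share of replicates whose aligned absolute loading exceeds the original).

```r
set.seed(42)
n <- 120; p <- 6; k <- 2
X <- matrix(rnorm(n * p), ncol = p)

# Loadings from the correlation matrix: eigenvectors scaled by sqrt(eigenvalue)
pca_loadings <- function(X, k) {
  eig <- eigen(cor(X), symmetric = TRUE)
  eig$vectors[, 1:k, drop = FALSE] %*% diag(sqrt(eig$values[1:k]), k)
}

# Moving-block bootstrap: concatenate randomly chosen contiguous row blocks
block_indices <- function(n, block_length) {
  n_blocks <- ceiling(n / block_length)
  starts <- sample.int(n - block_length + 1, n_blocks, replace = TRUE)
  idx <- unlist(lapply(starts, function(s) s:(s + block_length - 1)))
  idx[1:n]                       # trim to the original sample size
}

# Orthogonal Procrustes: rotate A so it is as close as possible to target B
procrustes_align <- function(A, B) {
  s <- svd(crossprod(A, B))      # SVD of t(A) %*% B
  A %*% s$u %*% t(s$v)
}

L0 <- pca_loadings(X, k)         # original loadings
block_length <- ceiling(sqrt(n)) # documented default
n_boot <- 50

exceed <- matrix(0, p, k)
for (b in seq_len(n_boot)) {
  Xb <- X[block_indices(n, block_length), , drop = FALSE]
  Lb <- procrustes_align(pca_loadings(Xb, k), L0)
  exceed <- exceed + (abs(Lb) > abs(L0))   # count exceedances per loading
}
p_values <- exceed / n_boot
```

The Procrustes step matters because principal components are only defined up to sign and, when eigenvalues are close, up to rotation; without alignment, loadings from different replicates would not be comparable entry by entry.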

References

Jolliffe, I. T. (2002). Principal Component Analysis (2nd ed.). Springer.

Kaiser, H. F. (1958). The varimax criterion for analytic rotation in factor analysis. Psychometrika, 23(3), 187-200.

Examples

set.seed(123)
n <- 100
p <- 10
X <- matrix(rnorm(n * p), ncol = p)
colnames(X) <- paste0("V", 1:p)
result <- pca_bootstrap(X, n_components = 3, n_boot = 50)
print(result$summary_by_component)