TVMVP (version 1.0.5)

determine_factors: Determine the Optimal Number of Factors via an Information Criterion

Description

This function selects the optimal number of factors for a local principal component analysis (PCA) model of asset returns. It computes a BIC-type information criterion (IC) for each candidate number of factors, based on the sum of squared residuals (SSR) from the PCA reconstruction and a penalty term that increases with the number of factors. The optimal number of factors is the one that minimizes the IC. The procedure is available either as a stand-alone function or as a method of the `TVMVP` R6 class.

Usage

determine_factors(returns, max_m, bandwidth = silverman(returns))

Value

A list with:

  • optimal_m: Integer. The optimal number of factors.

  • IC_values: Numeric vector of IC values for each candidate \(m\).

Arguments

returns

A numeric matrix of asset returns with dimensions \(T \times p\).

max_m

Integer. The maximum number of factors to consider.

bandwidth

Numeric. Kernel bandwidth for local PCA. Default is Silverman's rule of thumb.

Details

Two usage styles:


# Function interface
determine_factors(returns, max_m = 5)

# R6 method interface
tv <- TVMVP$new()
tv$set_data(returns)
tv$determine_factors(max_m = 5)
tv$get_optimal_m()
tv$get_IC_values()

When using the method form, an omitted `max_m` or `bandwidth` defaults to the value stored in the object. Results are cached in the object and can be retrieved with `get_optimal_m()` and `get_IC_values()`.

For each candidate number of factors \(m\) (from 1 to max_m), the function:

  1. Performs a local PCA on the returns at each time point \(r = 1,\dots,T\) using \(m\) factors.

  2. Computes a reconstruction of the returns and the corresponding residuals: $$\text{Residual}_r = R_r - F_r \Lambda_r,$$ where \(R_r\) is the return at time \(r\), and \(F_r\) and \(\Lambda_r\) are the local factors and loadings, respectively.

  3. Computes the average sum of squared residuals (SSR) as: $$V(m) = \frac{1}{pT} \sum_{r=1}^{T} \| \text{Residual}_r \|^2.$$

  4. Adds a penalty term that increases with \(m\): $$\text{Penalty}(m) = m \times \frac{p + T \times \text{bandwidth}}{pT \times \text{bandwidth}} \log\left(\frac{pT \times \text{bandwidth}}{p + T \times \text{bandwidth}}\right).$$

  5. The information criterion is defined as: $$\text{IC}(m) = \log\big(V(m)\big) + \text{Penalty}(m).$$

The optimal number of factors is then chosen as the value of \(m\) that minimizes \(\text{IC}(m)\).
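
The selection rule above can be illustrated with a minimal sketch. This is not the package's implementation: the helper names `ic_penalty`, `avg_ssr`, and `select_m` are hypothetical, the bandwidth value 0.2 is chosen arbitrarily, and an ordinary (global) PCA computed via `svd()` stands in for the kernel-weighted local PCA of steps 1 and 2. Only the penalty and the IC formula follow the definitions given here.

```r
# Penalty(m) as defined above; T_obs is the number of time points T.
# Hypothetical helper, not exported by TVMVP.
ic_penalty <- function(m, p, T_obs, bandwidth) {
  Th <- T_obs * bandwidth  # T x bandwidth
  m * (p + Th) / (p * Th) * log((p * Th) / (p + Th))
}

# Average SSR V(m). ASSUMPTION: a global PCA via SVD replaces the local PCA,
# so this only approximates the shape of the real computation.
avg_ssr <- function(returns, m) {
  X <- scale(returns, center = TRUE, scale = FALSE)
  s <- svd(X)
  fitted <- s$u[, 1:m, drop = FALSE] %*%
    diag(s$d[1:m], nrow = m) %*%
    t(s$v[, 1:m, drop = FALSE])
  sum((X - fitted)^2) / (nrow(X) * ncol(X))
}

# IC(m) = log(V(m)) + Penalty(m), minimized over m = 1, ..., max_m.
select_m <- function(returns, max_m, bandwidth) {
  p <- ncol(returns)
  T_obs <- nrow(returns)
  ic <- sapply(seq_len(max_m), function(m) {
    log(avg_ssr(returns, m)) + ic_penalty(m, p, T_obs, bandwidth)
  })
  list(optimal_m = which.min(ic), IC_values = ic)
}

set.seed(123)
returns <- matrix(rnorm(100 * 30), nrow = 100, ncol = 30)
res <- select_m(returns, max_m = 5, bandwidth = 0.2)
res$optimal_m
res$IC_values
```

Note that the penalty is positive, and therefore increasing in \(m\), only when \(pT \times \text{bandwidth} > p + T \times \text{bandwidth}\), so that the logarithm is positive. With \(p = 30\), \(T = 100\), and a bandwidth of 0.2, the per-factor penalty is \(\frac{50}{600}\log(12) \approx 0.207\).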

References

Su, L., & Wang, X. (2017). On time-varying factor models: Estimation and testing. Journal of Econometrics, 198(1), 84–101.

Examples

set.seed(123)
returns <- matrix(rnorm(100 * 30), nrow = 100, ncol = 30)

# Function usage
result <- determine_factors(returns, max_m = 5)
print(result$optimal_m)
print(result$IC_values)

# R6 usage
tv <- TVMVP$new()
tv$set_data(returns)
tv$determine_factors(max_m = 5)
tv$get_optimal_m()
tv$get_IC_values()