Two usage styles:
# Function interface
determine_factors(returns, max_m = 5)
# R6 method interface
tv <- TVMVP$new()
tv$set_data(returns)
tv$determine_factors(max_m = 5)
tv$get_optimal_m()
tv$get_IC_values()
When using the R6 method interface, if `max_m` or `bandwidth` are omitted,
they default to the values stored in the object. Results are cached in the
object and can be retrieved with `get_optimal_m()` and `get_IC_values()`.
For each candidate number of factors \(m\) (from 1 to `max_m`), the function:
Performs a local PCA on the returns at each time point \(r = 1,\dots,T\) using \(m\) factors.
Computes a reconstruction of the returns and the corresponding residuals:
$$\text{Residual}_r = R_r - F_r \Lambda_r,$$
where \(R_r\) is the return at time \(r\), and \(F_r\) and \(\Lambda_r\) are the local factors and loadings, respectively.
Computes the average sum of squared residuals (SSR) as:
$$V(m) = \frac{1}{pT} \sum_{r=1}^{T} \| \text{Residual}_r \|^2.$$
Adds a penalty term that increases with \(m\):
$$\text{Penalty}(m) = m \times \frac{p + T \times \text{bandwidth}}{pT \times \text{bandwidth}} \log\left(\frac{pT \times \text{bandwidth}}{p + T \times \text{bandwidth}}\right).$$
The information criterion is defined as:
$$\text{IC}(m) = \log\big(V(m)\big) + \text{Penalty}(m).$$
The optimal number of factors is then chosen as the value of \(m\) that minimizes \(\text{IC}(m)\).
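The penalty and selection steps above can be sketched in a few lines of base R. This is an illustrative sketch only, not the package's implementation: the helper names `ic_penalty()` and `select_m()` are hypothetical, the local PCA step is skipped, and the \(V(m)\) values are made up to show how the criterion trades fit against the number of factors.

```r
# Hypothetical helper (not part of the package): penalty for m factors,
# given p assets, T_len time points, and a kernel bandwidth.
ic_penalty <- function(m, p, T_len, bandwidth) {
  Th <- T_len * bandwidth  # effective local sample size, T x bandwidth
  m * (p + Th) / (p * Th) * log((p * Th) / (p + Th))
}

# Hypothetical helper: V[m] is assumed to hold the average SSR for m factors.
select_m <- function(V, p, T_len, bandwidth) {
  m_grid <- seq_along(V)
  ic <- log(V) + ic_penalty(m_grid, p, T_len, bandwidth)
  list(optimal_m = which.min(ic), IC_values = ic)
}

# Made-up average SSR values for m = 1, ..., 5 (illustration only)
V <- c(0.90, 0.50, 0.45, 0.44, 0.435)
res <- select_m(V, p = 50, T_len = 200, bandwidth = 0.1)

res$optimal_m   # index of the smallest IC value
res$IC_values   # IC(m) for each candidate m
```

With these inputs the fit improves sharply from one to two factors and only marginally afterwards, so the linear penalty outweighs the later gains and the minimum of \(\text{IC}(m)\) falls at \(m = 2\).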