speci.factors: Criteria on the number of common factors

Description

Determines the number of factors in an approximate factor model for a data panel, where both dimensions $(T \times KN)$ are large, and calculates the factor time series and corresponding list of $N$ idiosyncratic components. See Corona et al. (2017) for an overview and further details.

Usage

speci.factors(
  L.data,
  k_max = 20,
  n.iterations = 4,
  differenced = FALSE,
  centered = FALSE,
  scaled = FALSE,
  n.factors = NULL
)

Value

A list of class 'speci', which contains the elements:

eigenvals: Data frame. The eigenvalues of the PCA, which have been used to calculate the criteria, and their respective share of the total variance in the data panel.
Ahn: Matrix. The eigenvalue ratio $ER(k)$ and growth rate $GR(k)$ of Ahn and Horenstein (2013) for $k=0,\ldots,$k_max factors.
Onatski: Matrix. The calibrated threshold $\delta$ and suggested number of factors $\hat{r}(\delta)$ by Onatski (2010) for each iteration.
Bai: Array. The values of the criteria $PC(k)$, $IC(k)$, and $IPC(k)$ with penalty weights $p1$, $p2$, and $p3$ for $k=0,\ldots,$k_max factors.
selection: List of the optimal number of common factors: (1) A matrix of $k^*$ which minimizes each information criterion with each penalty weight. (2) A vector of $k^*$ which maximizes ER and GR respectively. ED denotes the result of Onatski's (2010) "edge distribution" estimator after convergence.
Ft: Matrix. The common factors of dimension $(T \times$ n.factors) estimated by PCA.
LAMBDA: Matrix. The loadings of dimension $(KN \times$ n.factors) estimated by OLS.
L.idio: List of $N$ data.frame objects each collecting the $K_i$ idiosyncratic series $\hat{e}_{it}$ along the rows $t=1,\ldots,T$. The series $\hat{e}_{it}$ are given in levels and may contain a deterministic component with (1) the initial $\hat{e}_{i1}$ being non-zero and (2) re-accumulated means of the first-differenced series.
args_speci: List of characters and integers indicating the specifications that have been used.
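
To make the Ahn element more concrete, the following base-R sketch computes the eigenvalue ratio $ER(k)$ and growth rate $GR(k)$ of Ahn and Horenstein (2013) on a simulated panel. It is an illustration under stated assumptions, not the package's internal code: the simulated data, the variable names, and the normalization by $N \cdot T$ are choices made here, and the sketch starts at $k=1$, omitting the $k=0$ case that requires a mock eigenvalue.

```r
# Simulated T x N panel with r = 2 strong common factors (illustrative data only)
set.seed(42)
T_obs <- 200; N <- 50; r <- 2; k_max <- 8
Fac <- matrix(rnorm(T_obs * r), T_obs, r)   # stationary factors
Lam <- matrix(rnorm(N * r), N, r)           # loadings
X   <- Fac %*% t(Lam) + matrix(rnorm(T_obs * N), T_obs, N)

# Eigenvalues of X'X / (N * T), sorted from largest to smallest
mu <- eigen(crossprod(X) / (N * T_obs), symmetric = TRUE,
            only.values = TRUE)$values

# V[j] = sum of eigenvalues j, ..., m, i.e. V(k) in the paper equals V[k + 1]
V <- rev(cumsum(rev(mu)))

k  <- 1:k_max
ER <- mu[k] / mu[k + 1]                                  # eigenvalue ratio
GR <- log(V[k] / V[k + 1]) / log(V[k + 1] / V[k + 2])    # growth rate

# Suggested number of factors: the k that maximizes each criterion
k_star <- c(ER = which.max(ER), GR = which.max(GR))
print(k_star)
```

Both criteria pick the $k$ at which the eigenvalues drop most sharply, which is why they are maximized rather than minimized.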

Arguments

L.data: List of $N$ data.frame objects each collecting the $K_i$ time series along the rows $t=1,\ldots,T$. The $\sum_{i=1}^{N} K_i = KN$ time series are immediately combined into the $T \times KN$ data panel X.
k_max: Integer. The maximum number of factors to consider.
n.iterations: Integer. Number of iterations for the Onatski criterion.
differenced: Logical. If TRUE, each time series of the panel X is first-differenced prior to any further transformation. Thereby, all criteria are calculated as outlined by Corona et al. (2017).
centered: Logical. If TRUE, each time series of the panel X is centered.
scaled: Logical. If TRUE, each time series of the panel X is scaled. Thereby, the PCA is applied via the correlation matrix instead of the covariance matrix of X.
n.factors: Integer. A presumed number of factors under which the idiosyncratic component L.idio is calculated. Deactivated if NULL (the default).
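
The remark on scaled can be checked with a short base-R sketch. It assumes that the panel is both centered and scaled and uses simulated data with hypothetical variable names; it shows only the general equivalence between PCA on standardized series and the eigendecomposition of the correlation matrix, not how speci.factors implements it.

```r
# Simulated panel with correlated columns (illustrative data only)
set.seed(1)
X <- matrix(rnorm(120 * 6), 120, 6) %*% matrix(runif(36), 6, 6)

# Center and scale each series, then take the eigenvalues of Z'Z / (T - 1)
Z <- scale(X, center = TRUE, scale = TRUE)
ev_scaled <- eigen(crossprod(Z) / (nrow(Z) - 1), symmetric = TRUE)$values

# Eigenvalues of the correlation matrix of the raw panel
ev_cor <- eigen(cor(X), symmetric = TRUE)$values

# Both routes give the same eigenvalues
print(all.equal(ev_scaled, ev_cor))

# Share of total variance per component; the eigenvalues sum to ncol(X)
share <- ev_cor / sum(ev_cor)
```

With standardized series every variable contributes unit variance, so no single series with a large scale dominates the leading components.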

Details

If differenced is TRUE, the approximate factor model is estimated as proposed by Bai and Ng (2004). If all data transformations are selected (differenced, centered, and scaled all TRUE), the estimation results are identical to the objects in $CSD for PANIC analyses in 'pcoint' objects.
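
The estimation in first differences can be sketched in base R as follows. This is a minimal illustration of the approach of Bai and Ng (2004) on simulated data, with all names hypothetical. It returns $T-1$ rows and does not reproduce how the package treats the initial observation or the re-accumulated means, so it should not be read as the implementation behind speci.factors.

```r
# Simulated T x N panel driven by r = 2 random-walk factors (illustrative only)
set.seed(1)
T_obs <- 100; N <- 30; r <- 2
F_lev <- apply(matrix(rnorm(T_obs * r), T_obs, r), 2, cumsum)
Lam   <- matrix(rnorm(N * r), N, r)
X     <- F_lev %*% t(Lam) + matrix(rnorm(T_obs * N), T_obs, N)

# 1) First-difference, then center and scale each series
dX <- scale(diff(X), center = TRUE, scale = TRUE)        # (T-1) x N

# 2) PCA: differenced factors are sqrt(T-1) times the leading eigenvectors
eig   <- eigen(tcrossprod(dX), symmetric = TRUE)
f_hat <- sqrt(nrow(dX)) * eig$vectors[, 1:r, drop = FALSE]

# 3) Loadings by OLS of dX on f_hat (f_hat'f_hat / (T-1) is the identity)
lam_hat <- crossprod(dX, f_hat) / nrow(dX)               # N x r

# 4) Differenced idiosyncratic components
z_hat <- dX - f_hat %*% t(lam_hat)

# 5) Re-accumulate to obtain factors and idiosyncratic components in levels
F_hat <- apply(f_hat, 2, cumsum)
e_hat <- apply(z_hat, 2, cumsum)
```

Differencing before the PCA keeps the estimates consistent whether the idiosyncratic components are stationary or not, which is the point of the PANIC decomposition.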

References

Ahn, S., and Horenstein, A. (2013): "Eigenvalue Ratio Test for the Number of Factors", Econometrica, 81, pp. 1203-1227.

Bai, J. (2004): "Estimating Cross-Section Common Stochastic Trends in Nonstationary Panel Data", Journal of Econometrics, 122, pp. 137-183.

Bai, J., and Ng, S. (2002): "Determining the Number of Factors in Approximate Factor Models", Econometrica, 70, pp. 191-221.

Bai, J., and Ng, S. (2004): "A PANIC Attack on Unit Roots and Cointegration", Econometrica, 72, pp. 1127-1177.

Corona, F., Poncela, P., and Ruiz, E. (2017): "Determining the Number of Factors after Stationary Univariate Transformations", Empirical Economics, 53, pp. 351-372.

Onatski, A. (2010): "Determining the Number of Factors from Empirical Distribution of Eigenvalues", Review of Economics and Statistics, 92, pp. 1004-1016.

Examples

### reproduce Oersal,Arsova 2017:67, Ch.5 ###
data("MERM")
names_k = colnames(MERM)[-(1:2)] # variable names
names_i = levels(MERM$id_i)      # country names
L.data  = sapply(names_i, FUN=function(i) 
   ts(MERM[MERM$id_i==i, names_k], start=c(1995, 1), frequency=12), 
   simplify=FALSE)

R.fac1 = speci.factors(L.data, k_max=20, n.iterations=4)
R.fac0 = speci.factors(L.data, k_max=20, n.iterations=4, 
   differenced=TRUE, centered=TRUE, scaled=TRUE, n.factors=8)
   
# scree plot #
library("ggplot2")
pal = c("#999999", RColorBrewer::brewer.pal(n=8, name="Spectral"))
lvl = levels(R.fac0$eigenvals$scree)
F.scree = ggplot(R.fac0$eigenvals[1:20, ]) +
  geom_col(aes(x=n, y=share, fill=scree), color="black", width=0.75) +
  scale_fill_manual(values=pal, breaks=lvl, guide="none") +
  labs(x="Component number", y="Share of total variance", title=NULL) +
  theme_bw()
plot(F.scree)

# factor plot (comp. Oersal,Arsova 2017:71, Fig.4) #
library("ggfortify")
Ft = ts(R.fac0$Ft, start=c(1995, 1), frequency=12)
F.factors = autoplot(Ft, facets=FALSE, size=1.5) + 
  scale_color_brewer(palette="Spectral") +
  labs(x="Year", y=NULL, color="Factor", title=NULL) +
  theme_bw()
plot(F.factors)
