EKC: Empirical Kaiser Criterion

Description

The empirical Kaiser criterion (EKC) extends the Kaiser-Guttman criterion (KGC) by accounting for random sampling variation in the eigenvalues (see Auerswald & Moshagen, 2019; Braeken & van Assen, 2017). The code is based on Braeken and van Assen (2017) and on Auerswald and Moshagen (2019).

Usage

EKC(
  x,
  N = NA,
  use = c("pairwise.complete.obs", "all.obs", "complete.obs", "everything",
    "na.or.complete"),
  cor_method = c("pearson", "spearman", "kendall"),
  type = "BvA2017"
)

Value

A list of class EKC containing

eigenvalues: A vector containing the eigenvalues of the correlation matrix of the entered data.
n_factors_BvA2017: The number of factors to retain according to the original empirical Kaiser criterion by Braeken and van Assen (2017).
n_factors_AM2019: The number of factors to retain according to the adapted empirical Kaiser criterion by Auerswald and Moshagen (2019).
references: The reference eigenvalues.
settings: A list with the settings used.

Arguments

x: data.frame or matrix. Raw data, or a matrix of correlations.
N: numeric. The number of observations. Only needed if x is a correlation matrix.
use: character. Passed to stats::cor if raw data is given as input. Default is "pairwise.complete.obs".
cor_method: character. Passed to stats::cor. Default is "pearson".
type: character. The calculation of EKC. type "BvA2017" is the original implementation; type "AM2019" differs from the original implementation but was used in simulation studies (Auerswald & Moshagen, 2019; Caron, 2025). See details. Use type = c("BvA2017", "AM2019") for both implementations. Make sure to report which version you used.
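
As a rough illustration of the two input modes described above, the following sketch builds a correlation matrix from raw data with stats::cor, using the same use and cor_method defaults. The simulated data and the object names (raw, R, N) are made up for this example, and the commented EKC calls assume that EFAtools is installed.

```r
# Simulated raw data with a few missing values (illustrative only)
set.seed(42)
raw <- matrix(rnorm(200 * 5), nrow = 200, ncol = 5)
raw[sample(length(raw), 20)] <- NA

# What the `use` and `cor_method` arguments correspond to in stats::cor
R <- stats::cor(raw, use = "pairwise.complete.obs", method = "pearson")
N <- nrow(raw)

# With EFAtools installed, either input form could then be used:
# EFAtools::EKC(raw)       # raw data, N is not needed
# EFAtools::EKC(R, N = N)  # correlation matrix, N must be supplied
```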

Details

The Kaiser-Guttman criterion was defined with the intent that a factor should only be extracted if it explains at least as much variance as a single variable (see KGC). However, this reasoning only applies to population-level correlation matrices. Due to sampling variation, the KGC strongly overestimates the number of factors to retain (e.g., Zwick & Velicer, 1986). To account for this, and to provide a factor retention method that performs well with a small number of indicators and with correlated factors (cases in which the performance of parallel analysis, see PARALLEL, is known to deteriorate), Braeken and van Assen (2017) introduced the empirical Kaiser criterion. It creates a series of reference eigenvalues as a function of the variables-to-sample-size ratio and the observed eigenvalues.

Braeken and van Assen (2017) showed that "(a) EKC performs about as well as parallel analysis for data arising from the null, 1-factor, or orthogonal factors model; and (b) clearly outperforms parallel analysis for the specific case of oblique factors, particularly whenever factor intercorrelation is moderate to high and the number of variables per factor is small, which is characteristic of many applications these days" (pp. 463-464).

In EFAtools versions <= 0.5.0, only the adaptation by Auerswald and Moshagen (2019) was available (now accessible with type = "AM2019"). However, this implementation, which was probably also used in Caron (2025), differs from the original implementation by Braeken and van Assen (2017): it corrects by the reference values rather than by the empirical eigenvalues used in the original. Thanks to Luis Eduardo Garrido for pointing this out and to Johan Braeken for sharing sample code, on which the original version, now the default (type = "BvA2017"), is based.
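
The difference between the two variants can be sketched in a few lines of base R. This is a minimal illustration, not the package source: ekc_sketch is a hypothetical helper, the reference-eigenvalue formula is our reading of Braeken and van Assen (2017), and the "AM2019" branch follows the description above (preceding reference values are subtracted instead of preceding empirical eigenvalues).

```r
# Hypothetical helper for illustration; not the EFAtools implementation
ekc_sketch <- function(R, N, type = c("BvA2017", "AM2019")) {
  type <- match.arg(type)
  p <- ncol(R)
  lambda <- eigen(R, symmetric = TRUE, only.values = TRUE)$values

  # Inflation term driven by the variables-to-sample-size ratio
  inflate <- (1 + sqrt(p / N))^2

  refs <- numeric(p)
  for (j in seq_len(p)) {
    # Original: subtract the preceding empirical eigenvalues.
    # Adapted (assumed): subtract the preceding reference values.
    prev <- if (type == "BvA2017") lambda[seq_len(j - 1)] else refs[seq_len(j - 1)]
    # Reference eigenvalues are bounded below by 1, as in the KGC
    refs[j] <- max((p - sum(prev)) / (p - j + 1) * inflate, 1)
  }

  # Retain factors up to the first eigenvalue that does not exceed its reference
  below <- which(lambda <= refs)
  n_factors <- if (length(below) == 0) p else below[1] - 1

  list(eigenvalues = lambda, references = refs, n_factors = n_factors)
}

# Usage on a toy correlation matrix with six equally correlated variables
R_toy <- matrix(0.5, nrow = 6, ncol = 6)
diag(R_toy) <- 1
ekc_sketch(R_toy, N = 500, type = "BvA2017")$references
ekc_sketch(R_toy, N = 500, type = "AM2019")$references
```

In this sketch only the reference values after the first one can differ between the two types, because the first reference depends solely on the number of variables and N.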

While the adapted version performed relatively well in the simulation studies by Auerswald and Moshagen (2019) and Caron (2025), the theoretical derivations of the EKC as introduced by Braeken and van Assen (2017) may no longer hold. Currently we are unaware of studies comparing the two implementations, but based on our own brief comparisons across multiple datasets, the two implementations appear to often differ substantially regarding the number of factors suggested.

As both implementations exist in different packages and studies, we provide both versions here. Be sure to state clearly which version you use when reporting your results to avoid confusion and ensure reproducibility.

The EKC function can also be called together with other factor retention criteria in the N_FACTORS function.

Examples

# original implementation
EKC(test_models$baseline$cormat, N = 500)

# original and adapted implementation
EKC(test_models$baseline$cormat, N = 500, type = c("BvA2017", "AM2019"))