
psych (version 2.6.1)

CFA: Confirmatory Factor Analysis Without Iteration

Description

Factor analysis is a procedure for identifying latent variables thought to account for the correlations or covariances between observed variables. There are two approaches to factor analysis: Exploratory Factor Analysis (EFA, e.g., using the fa function) and Confirmatory Factor Analysis (CFA). Perhaps the best way to do confirmatory factor analysis is with the lavaan package's cfa function. CFA in psych is a simpler and more limited version for those who want to stay within the psych package and take advantage of various psych package options. CFA uses the direct approach (the multiple group method) based on the Spearman/Guttman procedure discussed by Dhaene and Rosseel (2024).

Usage

CFA(model=NULL, r=NULL, all=FALSE, cor="cor", use="pairwise", n.obs=NA,
    orthog=FALSE, weight=NULL, correct=0, method="regression",
    missing=FALSE, impute="none", Grice=FALSE)

CFA.bifactor(model=NULL, r, all=FALSE, g=FALSE, cor="cor", use="pairwise", n.obs=NA,
    orthog=FALSE, weight=NULL, correct=0, method="regression",
    missing=FALSE, impute="none", Grice=FALSE)

Arguments

Value

loadings

Factor (Structure) Loadings

Pattern

Factor Pattern coefficients

Phi

Factor correlations

communalities

As estimated using the Spearman/Guttman procedure

dof

Degrees of freedom: the number of original correlations minus the number of loadings and the number of between factor correlations (e.g., for a 9 variable, 3 factor model, 36 correlations - 9 loadings - 3 factor correlations = 24 df).

stats

As found by fa.stats

scores

Factor scores.

...

Many other statistics as reported by fa

Call

Echoes the call to the function

Details

Most EFA and CFA functions use maximum likelihood procedures to estimate the coefficients. However, as MacCallum et al. (2007) and Dhaene and Rosseel (2024) point out, ML approaches are not necessarily optimal for finite (e.g., small) samples. MacCallum et al. (2007) discuss why ML fails on some problems that minres procedures do not.

Confirmatory factor analysis may be done without iteration (and thus without Maximum Likelihood procedures) by using some very old techniques. The algorithm follows that of Dhaene and Rosseel (2024), using the "Spearman" Multiple Group Method to estimate the communalities. This method was introduced by Guttman (1952) and is discussed by Harman (1967).

CFA follows the Spearman approach to communalities discussed by Dhaene and Rosseel (2024) and described as the "Multiple Group Method". I use the upper case name (CFA) to avoid conflicts with lavaan's cfa function. Following Harman (1967, Chapter 7, pp. 115-117), the communality of each variable is estimated from the sums of its correlations and of its squared correlations with the other variables (see the formula below). The square root of the communality is the factor loading.

Guttman (1952) points out that a weighting matrix of -1, 0, and 1 is essentially a regression model in which the use of differential weights does not make much difference.

CFA.bifactor first does a CFA on all of the variables, and then does another CFA using the model matrix or keys list on the residual correlation matrix. The results are in relatively close agreement with those from lavaan, but are not identical.
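
As a rough conceptual sketch of that two-step idea (this is not the package's internal code, and the treatment of the residual diagonal is glossed over), using the 9-variable example from the Examples section below:

library(psych)
r9 <- sim.hierarchical()              #a 9-variable correlation matrix
g.fit <- CFA(r9)                      #step 1: a single general factor for all variables
g.load <- as.matrix(g.fit$loadings)
r.resid <- r9 - g.load %*% t(g.load)  #step 2: residualize the correlations
group.model <- 'F1 =~ V1 + V2 + V3
                F2 =~ V4 + V5 + V6
                F3 =~ V7 + V8 + V9'
#group.fit <- CFA(group.model, r.resid)  #step 3: fit the group factors to the residual matrix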

To do an "S-1" solution (Eid et al., 2017; Li and Savalei, 2025), just specify a model in which not all of the variables are assigned to group factors.
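
For example, a hypothetical S-1 version of the 9-variable model leaves the third set of variables out of the group factors, so that V7-V9 are defined only by the general factor:

library(psych)
model.s1 <- 'F1 =~ V1 + V2 + V3
             F2 =~ V4 + V5 + V6'     #V7-V9 are not assigned to any group factor
CFA.bifactor(model.s1, sim.hierarchical())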

Options for CFA.bifactor include solving the correlations as a simple bifactor model or as a hierarchical/higher order model using the g=TRUE option. Graphical output in the examples shows the difference between the two approaches.

Guttman and Harman's original method seems to be restricted to positive manifolds and finds the communalities from the correlations:

\(h_i^2 = \frac{(\Sigma r_i)^2 - \Sigma r_i^2}{2(\Sigma r_{i<j} - \Sigma r_i)}\)

CFA estimates the communalities using the absolute values of r when finding these sums. This allows the method to be applied to personality data sets such as the bfi or sapa data sets, as well as to mood data such as the msqR data set (sapa and msqR are in psychTools).
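
As a minimal illustration of the formula (h2.spearman is a hypothetical helper, not the function's internal code), applying it to a one-factor correlation matrix recovers the communalities implied by the loadings used to generate it:

library(psych)
h2.spearman <- function(R) {       #implements the communality formula given above
  r <- abs(R)                      #CFA sums the absolute values of r
  diag(r) <- 0
  row.sum <- rowSums(r)            #Sigma r_i : correlations of variable i with the others
  row.sumsq <- rowSums(r^2)        #Sigma r_i^2 : squared correlations of variable i
  total <- sum(r[lower.tri(r)])    #Sigma r_{i<j} : all correlations below the diagonal
  (row.sum^2 - row.sumsq)/(2*(total - row.sum))
}
R1 <- sim.congeneric(loads=c(.8,.7,.6,.5))  #a one-factor matrix with loadings .8, .7, .6, .5
round(h2.spearman(R1), 2)                   #recovers the communalities .64, .49, .36, .25
round(sqrt(h2.spearman(R1)), 2)             #their square roots are the factor loadings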

If the g parameter is set to TRUE, a hierarchical or second order solution is found by first factoring all of the variables as a one factor (g) model and then factoring the residualized matrix using the model-based factors. This is shown in the test.hi example.

References

Dhaene, Sara and Rosseel, Yves (2024) An evaluation of non-iterative estimators in confirmatory factor analysis. Structural Equation Modeling, 31 (1), 1-13. doi: 10.1080/10705511.2023.2187285

Eid, Michael, Geiser, Christian, Koch, Tobias and Heene, Moritz (2017) Anomalous Results in G-Factor Models: Explanations and Alternatives. Psychological Methods, 22, 541-562.

Guttman, L. (1952) Multiple group methods for common-factor analysis: their basis, computation, and interpretation. Psychometrika, 17 (2), 209-222.

Harman, H. H. (1967) Modern Factor Analysis. University of Chicago Press.

Li, Sijia and Savalei, Victoria (2025) Evaluating Statistical Fit of Confirmatory Bifactor Models: Updated Recommendations and a Review of Current Practice. Psychological Methods. doi: 10.1037/met0000730

MacCallum, Robert C. and Browne, Michael W. and Cai, Li (2007) Factor analysis models as approximations. In Cudeck, Robert and MacCallum, Robert C. (Eds). Factor analysis at 100: Historical developments and future directions. Lawrence Erlbaum Associates Publishers.

See Also

fa for exploratory analysis and more discussion of factor analysis in general. omegaStats to allow quick comparisons with other functions.

Examples

 #test set from Harman Table 7.1 P 116
har5 <- structure(c(1, 0.485, 0.4, 0.397, 0.295, 0.485, 1, 0.397, 0.397, 
0.247, 0.4, 0.397, 1, 0.335, 0.275, 0.397, 0.397, 0.335, 1, 0.195, 
0.295, 0.247, 0.275, 0.195, 1), dim = c(5L, 5L), dimnames = list(
    c("V1", "V2", "V3", "V4", "V5"), c("V1", "V2", "V3", "V4", 
    "V5")))

CFA(har5)   #The Harman example.  Note that the model is not necessary for the one factor case.

CFA(Harman_5)  #the Harman example of a Heywood case

v9 <- sim.hierarchical()  #Create a 3 correlated factor model using default values
model <- 'F1 =~ V1 + V2 + V3
          F2 =~ V4 + V5 + V6
          F3 =~ V7 + V8 + V9'
CFA(model,v9)


model9 <- 'F1 =~ .9*V1 + .8*V2 + .7*V3
           F2 =~ .8*V4 + .7*V5 +.6*V6
           F3 =~ .7*V7 + .6*V8 +.5*V9
           F1 ~ .6*F2 + .5*F3
           F2 ~  .4*F3'
#An alternative way to create 3 correlated factors
#note that CFA drops the coefficients, the model is for generating the data
 #lavaan does not drop coefficients
v9s <- sim(model9,n=500)
 test <- CFA(model, v9s$observed)  #do a CFA using lavaan syntax
 test.bi <- CFA.bifactor(model9,v9)
 test.hi <- CFA.bifactor(model9,v9,g=TRUE)

#graphic displays make the output more understandable.
diagram(test)   #show three correlated factors
diagram(test.bi) #show the bifactor solution
diagram(test.hi) #show the hierarchical/higher order solution

#this next example requires psychTools and is not run
#for a four factor model using keys

#CFA(psychTools::ability.keys[-1],psychTools::ability,  cor="tet")

CFA(bfi.keys,bfi)   # a five factor model of the bfi items

colnames(Thurstone) <- rownames(Thurstone) <- paste0("x",1:9  )   #to match lavaan syntax
model <- HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
c3 <- CFA(model,Thurstone,n.obs=213)  #compare with the lavaan solution which has a smaller chi^2 

c3  #show the result	
diagram(c3)  #graphically display the result

c3.hi <- CFA.bifactor(model,Thurstone,n.obs=213)

#do not run the next examples, they require lavaan
#They compare lavaan cfa solutions to CFA

if(FALSE) {
#
#The next examples require lavaan and are thus not run
library(lavaan)               
#The basic lavaan example 
fit <- cfa(model,sample.cov=Thurstone,sample.nobs=213,std.lv=TRUE, estimator="ML")
factor.congruence(fit,c3)  #identical loadings to 2 decimals
round(fit@Model@GLIST$lambda-c3$loadings,4)   
#add the g factor
HS.model <- ' general =~  x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 
              visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
g.fit <- cfa(HS.model,sample.cov=Thurstone,sample.nobs=213,std.lv=TRUE,orthogonal=TRUE)
fa.congruence(g.fit,c3.hi) #identical congruence to 2 decimals
round(g.fit@Model@GLIST$lambda-c3.hi$loadings,2)  #loadings with ULS are identical

#All 24 variables from Harman 
harman24 <- psychTools::holzinger.raw[157:301,8:31]
colnames(harman24) <- paste0("v",1:24)
mod.24 <- 'g =~ v1 + v2 + v3 + v4 + v5 + v6 + v7 + v8 + v9 + v10 + v11 + v12 + v13 + v14 + v15 + v16 + v17 + v18 + v19 + v20 + v21 + v22 + v23 + v24
           spatial =~ v1 + v2 + v3 + v4
           verbal =~ v5 + v6 + v7 + v8 + v9
           perceptual =~ v10 + v11 + v12 + v13
           recognition =~ v14 + v15 + v16 + v17
           memory =~ v18 + v19 + v20'
lav.har.uls <- cfa(mod.24, data=harman24,std.lv=TRUE,std.ov=TRUE, orthogonal=TRUE, estimator="ULS")

lav.har.ml <-cfa(mod.24, data=harman24,std.lv=TRUE,std.ov=TRUE,orthogonal=TRUE)                

model.har24.5 <- 'spatial =~ v1 + v2 + v3 + v4
                  verbal =~ v5 + v6 + v7 + v8 + v9
                  perceptual =~ v10 + v11 + v12 + v13
                  recognition =~ v14 + v15 + v16 + v17
                  memory =~ v18 + v19 + v20'
 cfa.har24 <- CFA(model.har24.5,harman24) 
 cfa.har.bi <- CFA.bifactor(model.har24.5,harman24) 

 factor.congruence(list(lav.har.uls, lav.har.ml, cfa.har.bi))  #g is very good, f1-f4 are very good

round(lav.har.uls@Model@GLIST$lambda - cfa.har.bi$loadings, 2)   #not the same
    }           
