fa.parallel(x, n.obs=NULL, fm="minres", fa="both",
main="Parallel Analysis Scree Plots", n.iter=20, error.bars=FALSE,
SMC=FALSE, ylabel=NULL, show.legend=TRUE, sim=TRUE, cor="cor", use="pairwise")
fa.parallel.poly(x, n.iter=10, SMC=TRUE, fm="minres", correct=TRUE, sim=FALSE,
fa="both", global=TRUE)
## S3 method for class 'poly.parallel':
plot(x, show.legend=TRUE, fa="both", ...)
See fa for details. Alternative criteria include the Very Simple Structure criterion (VSS) and Velicer's MAP procedure (included in VSS). Both the VSS and the MAP criteria are included in the nfactors function, which also reports the mean item complexity and the BIC for each of multiple solutions.

fa.parallel plots the eigen values for a principal components solution and for a factor solution (minres by default) and does the same for random matrices of the same size as the original data matrix. For raw data, the random matrices are 1) a matrix of univariate normal data and 2) random samples (randomized across rows) of the original data.

fa.parallel.poly will do parallel analysis for polychoric and tetrachoric factors. If the data are dichotomous, fa.parallel.poly will find tetrachoric correlations for the real and simulated data; otherwise, if the number of categories is less than 10, it will find polychoric correlations. Note that fa.parallel.poly is slower than fa.parallel because of the complexity of calculating the tetrachoric/polychoric correlations.

fa.parallel will now do tetrachorics or polychorics directly if the cor option is set to "tet" or "poly". As with fa.parallel.poly, this will take longer.
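A rough sketch of both routes (the ability and bfi data sets that ship with psych are used here purely for illustration; run times may be long):

library(psych)
fa.parallel(ability, cor="tet")      #tetrachoric-based parallel analysis of dichotomous items
fa.parallel(bfi[1:25], cor="poly")   #polychoric-based analysis of polytomous items (slower still)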
The means of the n.iter random solutions are shown. Error bars are usually very small and are suppressed by default, but they can be shown if requested. If the sim option is set to TRUE (the default), parallel analyses are done on resampled data as well as on random normal data. In the interest of speed, the parallel analyses are done just on resampled data if sim=FALSE. Both procedures tend to agree.
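A minimal sketch of these options (again using the bfi items purely as illustrative data):

library(psych)
fa.parallel(bfi[1:25], n.iter=20, sim=FALSE, error.bars=TRUE)   #resampled data only, with error bars shown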
Alternative ways to estimate the number of factors are discussed in the Very Simple Structure (Revelle and Rocklin, 1979) documentation (VSS) and include Wayne Velicer's MAP algorithm (Velicer, 1976).
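For comparison, a short sketch of the VSS/MAP route on the same illustrative items (nfactors summarizes both criteria, as noted above):

library(psych)
VSS(bfi[1:25])       #Very Simple Structure and Velicer's MAP
nfactors(bfi[1:25])  #VSS, MAP, BIC, and mean complexity for multiple solutions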
Parallel analysis for factors is actually harder than it seems, for the question is what are the appropriate communalities to use. If communalities are estimated by the Squared Multiple Correlation (SMC, see smc), then the eigen values of the original data will reflect major as well as minor factors (see sim.minor to simulate such data). Random data will not, of course, have any structure, and thus the number of factors will tend to be biased upwards by the presence of the minor factors.
By default, fa.parallel estimates the communalities based upon a one factor minres solution. Although this will underestimate the communalities, it does seem to lead to better solutions on simulated or real (e.g., the bfi or Harman74) data sets.
For comparability with other algorithms (e.g., the paran function in the paran package), setting SMC=TRUE will use SMCs as the estimates of communalities. This will tend towards identifying more factors than the default option.
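A sketch of the effect of the communality estimate (the bfi items are again assumed as example data):

library(psych)
fa.parallel(bfi[1:25])             #default: communalities from a one factor minres solution
fa.parallel(bfi[1:25], SMC=TRUE)   #SMC communalities, which tend to suggest more factors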
Printing the results will show the eigen values of the original data that are greater than the simulated values.

A sad observation about parallel analysis is that it is sensitive to sample size. That is, for large data sets, the eigen values of random data are very close to 1. This will lead to different estimates of the number of factors as a function of sample size. Consider the factor structure of the bfi data set (the first 25 items are meant to represent a five factor model). For samples of 200 or less, parallel analysis suggests 5 factors, but for 1000 or more, 6 factors and components are indicated. This is not due to an instability of the eigen values of the real data, but rather to the closer approximation to 1 of the random data as n increases.

When simulating dichotomous data in fa.parallel.poly, the simulated data have the same difficulties as the original data. This functionally means that the simulated and the resampled results will be very similar.

As with many psych functions, fa.parallel has been changed to allow for multicore processing. When running a large number of iterations, it is obviously faster to increase the number of cores to the maximum possible, using the options(mc.cores=n) command, where n can be determined from detectCores().
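A sketch of the sample size effect and of the multicore option (the subsample size is arbitrary; the core count is taken from parallel::detectCores()):

library(psych)
fa.parallel(bfi[1:200, 1:25])                 #small sample: fewer factors tend to be indicated
fa.parallel(bfi[, 1:25])                      #full sample (2800 cases)
options(mc.cores=parallel::detectCores())     #use all available cores
fa.parallel(bfi[, 1:25], n.iter=100)          #many iterations benefit from multicore processing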
Horn, John (1965) A rationale and test for the number of factors in factor analysis. Psychometrika, 30, 179-185.
Humphreys, Lloyd G. and Montanelli, Richard G. (1975) An investigation of the parallel analysis criterion for determining the number of common factors. Multivariate Behavioral Research, 10, 193-205.
Revelle, William and Rocklin, Tom (1979) Very simple structure: alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4), 403-414.
Velicer, Wayne (1976) Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3), 321-327.
fa, VSS, VSS.plot, VSS.parallel, sim.minor
#test.data <- Harman74.cor$cov #The 24 variable Holzinger - Harman problem
#fa.parallel(test.data,n.obs=145)
fa.parallel(Thurstone,n.obs=213) #the 9 variable Thurstone problem
#set.seed(123)
#minor <- sim.minor(24,4,400) #4 large and 12 minor factors
#fa.parallel(minor$observed) #shows 5 factors and 4 components -- compare with
#fa.parallel(minor$observed,SMC=FALSE) #which shows 6 factors and 4 components