Factanal: Estimate Factor Analysis Models

Description

This user-level function estimates a factor analysis model that has previously been set up with calls to make_manifest and make_restrictions.

Usage

Factanal(manifest, restrictions, scores = "none", seeds = 12345, 
lower = sqrt(.Machine$double.eps), analytic = TRUE, reject = TRUE, 
NelderMead = TRUE, impatient = FALSE, ...)

Arguments

manifest

An object that inherits from manifest-class and is typically produced by make_manifest.

restrictions

An object that inherits from restrictions-class and is typically produced by make_restrictions.

scores

Type of factor scores to produce, if any. The default is "none". Other valid choices (which can be partially matched) are "regression", "Bartlett", "Thurstone", "Ledermann", "Anderson-Rubin", "McDonald", "Krinjen", "Takeuchi", and "Harman". See Beauducel (2007) for the formulae for these factor scores, as well as proofs that all of them except "regression" and "Harman" produce the same correlation matrix.
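To make the difference between two of these choices concrete, the sketch below computes regression and Bartlett score weights from a two-factor fit. It is a minimal illustration using base R's stats::factanal with orthogonal (unrotated) factors, not Factanal itself, and it uses the model-implied correlation matrix for the regression weights; the variable names are illustrative.

```r
## Sketch: regression vs. Bartlett factor score weights (base R only).
fit    <- factanal(factors = 2, covmat = ability.cov, rotation = "none")
Lambda <- unclass(loadings(fit))        # p x m matrix of loadings
Psi    <- diag(fit$uniquenesses)        # p x p diagonal matrix of uniquenesses
Sigma  <- Lambda %*% t(Lambda) + Psi    # model-implied correlation matrix

## Regression weights for orthogonal factors: Sigma^{-1} Lambda
W_reg <- solve(Sigma, Lambda)

## Bartlett weights: Psi^{-1} Lambda (Lambda' Psi^{-1} Lambda)^{-1}
PsiInv_Lambda <- solve(Psi, Lambda)
W_bart <- PsiInv_Lambda %*% solve(crossprod(Lambda, PsiInv_Lambda))

## Bartlett weights are conditionally unbiased: W' Lambda is the identity
round(crossprod(W_bart, Lambda), 10)
```

The final line should print a 2 x 2 identity matrix (up to rounding), a property the regression weights do not share.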

seeds

A vector of length one or two giving the random number generator seeds passed to the unif.seed and int.seed arguments of genoud, respectively. If seeds is a single number, it is used for both unif.seed and int.seed. These seeds override the defaults in genoud and make it easier to replicate an analysis exactly. If NULL, the default values of unif.seed and int.seed documented in genoud are used. Always use NULL in simulations; otherwise the simulation results will be badly wrong.
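The mapping from seeds to the two genoud arguments can be sketched as follows. expand_seeds is a hypothetical helper written only to illustrate the rules above; it is not a function in the package.

```r
## Hypothetical helper: translate `seeds` into genoud's seed arguments.
expand_seeds <- function(seeds) {
  if (is.null(seeds)) return(list())             # NULL: keep genoud's defaults
  stopifnot(length(seeds) %in% 1:2)
  if (length(seeds) == 1) seeds <- rep(seeds, 2) # one seed serves both roles
  list(unif.seed = seeds[1], int.seed = seeds[2])
}

str(expand_seeds(12345))   # both seeds equal 12345
str(expand_seeds(c(1, 2))) # unif.seed = 1, int.seed = 2
```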

lower

A lower bound. In exploratory factor analysis, lower is the minimum uniqueness and corresponds to the 'lower' element of the control list passed to factanal. Otherwise, lower is the lower bound applied to singular values when checking matrices for positive definiteness and rank. In the unlikely event that you get errors referencing positive definiteness, try increasing lower slightly.
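The sketch below shows how a tolerance like lower typically enters such checks. It is an assumption about the general technique, not Factanal's internal code: numerical rank counts singular values above lower, while the positive-definiteness test uses eigenvalues, because the singular values of a symmetric matrix are the absolute values of its eigenvalues and cannot reveal a negative one.

```r
## Sketch of tolerance-based rank and positive-definiteness checks.
numerical_rank <- function(A, lower = sqrt(.Machine$double.eps)) {
  sum(svd(A, nu = 0, nv = 0)$d > lower)          # singular values above lower
}
is_numerically_pd <- function(S, lower = sqrt(.Machine$double.eps)) {
  ev <- eigen(S, symmetric = TRUE, only.values = TRUE)$values
  all(ev > lower)                                # every eigenvalue above lower
}

S <- cov2cor(ability.cov$cov)   # 6 x 6 sample correlation matrix
numerical_rank(S)
is_numerically_pd(S)

singular <- matrix(1, 2, 2)     # rank-one matrix: eigenvalues 2 and 0
numerical_rank(singular)
is_numerically_pd(singular)
```

Raising lower makes both checks stricter, which is why a slightly larger value can help when near-singular matrices trigger errors.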

analytic

A logical (defaulting to TRUE) indicating whether analytic gradients should be used as much as possible. If FALSE, numeric gradients are calculated instead; these are slower and slightly less accurate but are necessary in some situations and useful for debugging analytic gradients.
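Debugging an analytic gradient usually means comparing it with a central-difference approximation. The sketch below does this for the standard maximum-likelihood discrepancy of a one-factor model, F(psi) = log|Sigma| + tr(S Sigma^{-1}) - log|S| - p with Sigma = lambda lambda' + diag(psi), whose gradient with respect to psi is diag(Sigma^{-1} - Sigma^{-1} S Sigma^{-1}). The fixed loadings and the choice of objective are illustrative assumptions, not the objective or parameterization Factanal uses internally.

```r
## Sketch: check an analytic gradient against central differences.
S <- cov2cor(ability.cov$cov)
p <- nrow(S)
lambda <- rep(0.6, p)                   # illustrative fixed loadings

ml_discrepancy <- function(psi) {
  Sigma <- tcrossprod(lambda) + diag(psi)
  as.numeric(determinant(Sigma)$modulus) + sum(diag(solve(Sigma, S))) -
    as.numeric(determinant(S)$modulus) - p
}
ml_gradient <- function(psi) {          # analytic gradient w.r.t. psi
  Sigma_inv <- solve(tcrossprod(lambda) + diag(psi))
  diag(Sigma_inv - Sigma_inv %*% S %*% Sigma_inv)
}
numeric_gradient <- function(f, x, h = 1e-6) {
  vapply(seq_along(x), function(i) {
    e <- replace(numeric(length(x)), i, h)       # step in coordinate i only
    (f(x + e) - f(x - e)) / (2 * h)
  }, numeric(1))
}

psi <- rep(0.5, p)
max(abs(ml_gradient(psi) - numeric_gradient(ml_discrepancy, psi)))
```

The numeric version costs two objective evaluations per parameter, which illustrates why analytic gradients are preferred when available.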

reject

Logical indicating whether to reject starting values that fail the constraints required by the model; see create_start.

NelderMead

Logical indicating whether to call optim with method = "Nelder-Mead" when the genetic algorithm has finished to further polish the solution. This option is not relevant or necessary for exploratory factor analysis models.
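The polishing step amounts to handing the best solution found so far to optim with method = "Nelder-Mead". In the sketch below, a crude vector of loadings stands in for the genetic algorithm's best individual, and the one-factor maximum-likelihood objective is an illustrative assumption rather than Factanal's own objective.

```r
## Sketch: polish a rough solution with Nelder-Mead.
S <- cov2cor(ability.cov$cov)
p <- nrow(S)

objective <- function(lambda) {
  if (any(abs(lambda) >= 1)) return(Inf)  # inadmissible: uniqueness <= 0
  Sigma <- tcrossprod(lambda)
  diag(Sigma) <- 1                        # uniquenesses are 1 - lambda^2
  as.numeric(determinant(Sigma)$modulus) + sum(diag(solve(Sigma, S))) -
    as.numeric(determinant(S)$modulus) - p
}

rough <- rep(0.5, p)                      # stand-in for the GA's best individual
polished <- optim(rough, objective, method = "Nelder-Mead",
                  control = list(maxit = 2000))
c(before = objective(rough), after = polished$value)
```

Nelder-Mead needs no gradient and tolerates the Inf returned for inadmissible points, which makes it a convenient final refinement step.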

impatient

Logical that defaults to FALSE. If restrictions is of restrictions.factanal-class, setting it to TRUE will cause factanal to be used for optimization instead of genoud. In all other situations, setting it to TRUE will use factanal to generate initial communality estimates instead of the slower default mechanism.
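Communality estimates of the kind factanal can provide are simply one minus its estimated uniquenesses. The sketch below computes them for the ability data with base R; that Factanal derives its initial estimates in exactly this way is an assumption.

```r
## Sketch: quick communality estimates from stats::factanal.
fit <- factanal(factors = 2, covmat = ability.cov)
communalities <- 1 - fit$uniquenesses   # one value per manifest variable
round(communalities, 3)
```

The result has one element per manifest variable, the same shape described under Details for supplying communalities as starting.values.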

...

Further arguments passed to genoud. The following arguments to genoud are hard-coded and cannot be changed because they are logically required by the factor analysis estimator:

argument	value	why?
`nvars`	`restrictions@nvars`	
`max`	`FALSE`	minimizing the objective
`hessian`	`FALSE`	we roll our own
`lexical`	`TRUE` (usually)	for restricted optimization
`Domains`	`restrictions@Domains`	
`data.type.int`	`FALSE`	parameters are doubles
`fn`	wrapper around `fitS4`	
`BFGSfn`	wrapper around `bfgs_fitS4`	
`BFGShelp`	wrapper around `bfgs_helpS4`	
`gr`	various	it is complicated
`unif.seed`	taken from `seeds`	replicability

The following arguments to genoud default to values that differ from those documented in genoud; they can be overridden by specifying them explicitly via ...:

argument	value	why?
`boundary.enforcement`	`1` (usually)	`2` can cause problems
`MemoryMatrix`	`FALSE`	runs faster
`print.level`	`1`	output is not that helpful for values `>= 2`
`P9mix`	`1`	to always accept the BFGS result
`BFGSburnin`	`-1`	to delay the gradient check
`max.generations`	`1000`	a big number is often necessary
`project.path`	contains `"Factanal.txt"`	

Other arguments to genoud take their documented default values unless explicitly specified. In particular, you may want to change wait.generations and solution.tolerance. Also, if informative bounds were placed on any of the parameters in the call to make_restrictions, it is usually preferable to specify boundary.enforcement = 2 so that the internal calls to optim use constrained optimization. However, the "L-BFGS-B" optimizer used in that case is less robust than the default "BFGS" optimizer and occasionally causes fatal errors, largely as a matter of bad luck.
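The override rule can be pictured with base R's modifyList(): arguments supplied through ... replace the Factanal-specific defaults from the table above, and any argument not named keeps its default. This is a hypothetical sketch; genoud_defaults and merge_genoud_args are illustrative names, project.path is omitted, and Factanal's actual merging code may differ.

```r
## Hypothetical sketch of how user arguments override genoud defaults.
genoud_defaults <- list(boundary.enforcement = 1, MemoryMatrix = FALSE,
                        print.level = 1, P9mix = 1, BFGSburnin = -1,
                        max.generations = 1000)
merge_genoud_args <- function(...) modifyList(genoud_defaults, list(...))

## e.g. constrained optimization plus a longer wait for convergence
args <- merge_genoud_args(boundary.enforcement = 2, wait.generations = 20)
str(args)
```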

Value

An object that inherits from FA-class.

Details

The call to Factanal is somewhat of a formality in the sense that most of the difficult decisions were already made in the call to make_restrictions and the call to make_manifest. The most important remaining detail is the specification of the values for the starting population in the genetic algorithm.

It is not necessary to provide starting values, since there are methods for this purpose; see create_start. Also, if starting.values = NA, then a population of starting values will be created using the typical mechanism in genoud, namely random uniform draws from the domain of the parameter.

Otherwise, if reject = TRUE, starting values that fail one or more constraints are rejected, and new vectors of starting values are generated until the population is filled with admissible starting values. In some cases, the constraints are quite difficult to satisfy by chance, and it may be more practical to specify reject = FALSE or to supply starting values explicitly. If starting values are supplied, it is helpful if at least one member of the genetic population satisfies all the constraints imposed on the model. Note the rownames of restrictions@Domains, which indicate the proper order of the free parameters.
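The rejection mechanism can be sketched as rejection sampling within the parameter domains. In this hypothetical illustration, domains is a two-column (lower, upper) matrix with one row per free parameter, like the Domains argument of genoud; fill_population and pd_correlation are illustrative names, and the constraint (three free correlations must form a positive-definite correlation matrix) is chosen because uniform draws fail it fairly often.

```r
## Hypothetical sketch of reject = TRUE: keep only admissible draws.
fill_population <- function(domains, pop.size, admissible, max.tries = 1e5) {
  pop <- matrix(NA_real_, nrow = pop.size, ncol = nrow(domains),
                dimnames = list(NULL, rownames(domains)))
  filled <- 0
  tries <- 0
  while (filled < pop.size) {
    tries <- tries + 1
    if (tries > max.tries) stop("constraints too hard to satisfy by chance")
    x <- runif(nrow(domains), min = domains[, 1], max = domains[, 2])
    if (admissible(x)) {                  # reject draws that fail constraints
      filled <- filled + 1
      pop[filled, ] <- x
    }
  }
  pop
}

domains <- cbind(lower = rep(-1, 3), upper = rep(1, 3))
rownames(domains) <- c("Phi_21", "Phi_31", "Phi_32")
pd_correlation <- function(x) {           # build a symmetric 3 x 3 Phi
  Phi <- diag(3)
  Phi[lower.tri(Phi)] <- x
  Phi[upper.tri(Phi)] <- t(Phi)[upper.tri(Phi)]
  all(eigen(Phi, symmetric = TRUE, only.values = TRUE)$values > 0)
}

set.seed(1)
pop <- fill_population(domains, pop.size = 10, admissible = pd_correlation)
all(apply(pop, 1, pd_correlation))
```

The max.tries guard stands in for the practical limit mentioned above: when admissible draws are rare, supplying starting values or setting reject = FALSE is the better option.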

A matrix (or vector) of starting values can be passed as starting.values. (It is also possible to pass an object of FA-class to starting.values, in which case the estimates from the previous call to Factanal are used as the starting values.) If a matrix, it should have one column for each row of restrictions@Domains, in that order, and between one row and as many rows as there are genetic individuals in the population.

If starting.values is a vector, its length can be equal to the number of rows in restrictions@Domains, in which case it is treated as a one-row matrix, or its length can be equal to the number of manifest variables, in which case it is passed to the start argument of create_start as a vector of initial communality estimates, thus avoiding the sometimes time-consuming process of generating good initial communality estimates. This process can also be accelerated by specifying impatient = TRUE.
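The rules for interpreting starting.values can be summarized as a small dispatch function. classify_starts is hypothetical and only restates the documented cases (the FA-class case is not covered); the sizes below are illustrative, loosely following the example with 6 manifest variables and 19 free parameters. Note that if the two lengths happened to coincide, this sketch would treat the vector as parameter values.

```r
## Hypothetical helper restating how starting.values is interpreted.
classify_starts <- function(starting.values, n_free, n_manifest) {
  if (length(starting.values) == 1 && is.na(starting.values))
    return("genoud default")                 # uniform draws from Domains
  if (is.matrix(starting.values)) {
    stopifnot(ncol(starting.values) == n_free)
    return("population")                     # one row per individual
  }
  if (length(starting.values) == n_free) return("one-row population")
  if (length(starting.values) == n_manifest) return("communalities")
  stop("length matches neither the free parameters nor the manifest variables")
}

classify_starts(NA, n_free = 19, n_manifest = 6)
classify_starts(matrix(0, 3, 19), n_free = 19, n_manifest = 6)
classify_starts(numeric(19), n_free = 19, n_manifest = 6)
classify_starts(rep(0.5, 6), n_free = 19, n_manifest = 6)
```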

References

Bartholomew, D. J. and Knott, M. (1999) Latent Variable Models and Factor Analysis. Second Edition, Arnold.

Beauducel, A. (2007) In spite of indeterminacy, many common factor score estimates yield an identical reproduced covariance matrix. Psychometrika, 72, 437--441.

Smith, G. A. and Stanley, G. (1983) Clocking g: relating intelligence and measures of timed performance. Intelligence, 7, 353--368.

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.

Examples


## Example from Venables and Ripley (2002, p. 323)
## Previously from Bartholomew and Knott (1999, p. 68--72)
## Originally from Smith and Stanley (1983)
## Replicated from example(ability.cov)

library(FAiR) # assumed: the package that provides Factanal, make_manifest, etc.
man <- make_manifest(covmat = ability.cov)

## Not run: 
# ## Here is the easy way to set up a SEFA model, which uses pop-up menus
# res <- make_restrictions(manifest = man, factors = 2, model = "SEFA")
# ## End(Not run)

## This is the hard way to set up a restrictions object without pop-up menus
beta <- matrix(NA_real_, nrow = nrow(cormat(man)), ncol = 2)
rownames(beta) <- rownames(cormat(man))
free <- is.na(beta)
beta <- new("parameter.coef.SEFA", x = beta, free = free, num_free = sum(free))

Phi  <- diag(2)
free <- lower.tri(Phi)
Phi  <- new("parameter.cormat", x = Phi, free = free, num_free = sum(free))
res  <- make_restrictions(manifest = man, beta = beta, Phi = Phi)

# This is how to make starting values where Phi is the correlation matrix 
# among factors, beta is the matrix of coefficients, and the scales are
# the logarithm of the sample standard deviations. It is also the MLE.
starts <- c( 4.46294498156615e-01, #  Phi_{21}
             4.67036349420035e-01, # beta_{11}
             6.42220238211291e-01, # beta_{21}
             8.88564379236454e-01, # beta_{31}
             4.77779639176941e-01, # beta_{41}
            -7.13405536379741e-02, # beta_{51}
            -9.47782525342137e-08, # beta_{61}
             4.04993872375487e-01, # beta_{12}
            -1.04604290549591e-08, # beta_{22}
            -9.44950629176182e-03, # beta_{32}
             2.63078925240678e-04, # beta_{42}
             9.38038168787216e-01, # beta_{52}
             8.43618801925473e-01, # beta_{62}
             log(man@sds))         # log manifest standard deviations

sefa <- Factanal(manifest = man, restrictions = res, 
                 # NOTE: Do NOT specify any of the following tiny values in a  
                 # real research situation; it is done here solely for speed
                 starting.values = starts, pop.size = 2, max.generations = 6,
                 wait.generations = 1)
nsim <- 101 # number of simulations, also too small for real work
show(sefa)
summary(sefa, nsim = nsim)
model_comparison(sefa, nsim = nsim)

stuff <- list() # output list for various methods
stuff$model.matrix <- model.matrix(sefa) # sample correlation matrix
stuff$fitted <- fitted(sefa, reduced = TRUE) # reduced covariance matrix
stuff$residuals <- residuals(sefa) # difference between model.matrix and fitted
stuff$rstandard <- rstandard(sefa) # normalized residual matrix
stuff$weights <- weights(sefa) # (scaled) approximate weights for residuals
stuff$influence <- influence(sefa) # weights * residuals
stuff$cormat <- cormat(sefa,  matrix = "RF") # reference factor correlations
stuff$uniquenesses <- uniquenesses(sefa, standardized = FALSE) # uniquenesses
stuff$FC <- loadings(sefa, matrix = "FC") # factor contribution matrix
stuff$draws <- FA2draws(sefa, nsim = nsim) # draws from sampling distribution

if(require(nFactors)) screeplot(sefa)  # Enhanced scree plot
profile(sefa) # profile plots of non-free parameters
pairs(sefa) # Thurstone-style plot
if(require(Rgraphviz)) plot(sefa) # DAG
