AMFA.inc: Incremental Automated Mixtures of Factor Analyzers

Description

An alternative implementation of the AMFA algorithm [WangWan-Lun2020AlomautoMFA]. The number of factors, q, is estimated while each MFA model is fitted. Instead of a grid search over the number of components, g, as in the AMFA method, this method starts with a one-component MFA model and splits components according to their multivariate kurtosis, the same approach used by amofa [kaya2015adaptiveautoMFA]. Once a component has been selected for splitting, the new components are initialised in the same manner as vbmfa [ghahramani2000variationalautoMFA]. The method keeps trying to split components until every component has had numTries split attempts with no decrease in BIC, at which point the current model is returned.
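To give a feel for the kurtosis-based split criterion, the following minimal sketch computes Mardia's multivariate kurtosis in base R. The function name mardia_kurtosis is hypothetical and is not part of autoMFA; the exact statistic used inside AMFA.inc may differ (for example, it may weight observations by their responsibilities).

```r
# Mardia's multivariate kurtosis for an n by p matrix X (illustrative only;
# mardia_kurtosis is a hypothetical helper, not an autoMFA function).
# Under multivariate normality the expected value is p * (p + 2).
mardia_kurtosis <- function(X) {
  X <- as.matrix(X)
  n <- nrow(X)
  S <- cov(X) * (n - 1) / n             # maximum-likelihood covariance estimate
  d2 <- mahalanobis(X, colMeans(X), S)  # squared Mahalanobis distances
  mean(d2^2)
}

set.seed(1)
X <- matrix(rnorm(2000 * 3), ncol = 3)  # simulated Gaussian data, p = 3
k <- mardia_kurtosis(X)
c(observed = k, expected = 3 * (3 + 2))
```

A component whose kurtosis departs strongly from p * (p + 2) is poorly described by a single Gaussian, which is what makes it a natural candidate for splitting.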

Usage

AMFA.inc(
  Y,
  numTries = 2,
  eta = 0.005,
  itmax = 500,
  tol = 1e-05,
  conv_measure = "diff",
  nkmeans = 1,
  nrandom = 1,
  varimax = FALSE
)

Arguments

Y

An n by p data matrix, where n is the number of observations and p is the number of dimensions of the data.

numTries

The number of attempts that should be made to split each component.

eta

The smallest possible entry in any of the error matrices D_i [Jian-HuaZhao2008FMEfautoMFA].
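As a rough illustration of what this lower bound means, the sketch below floors a vector of error variances at eta. The helper floor_error_variances is hypothetical; how AMFA.inc enforces the bound internally is not shown in this documentation.

```r
# Illustrative only: floor_error_variances is a hypothetical helper.
# Any diagonal entry of D_i smaller than eta is raised to eta.
floor_error_variances <- function(d, eta = 0.005) {
  pmax(d, eta)
}

d <- c(0.2, 0.001, 0.05)          # example diagonal of one D_i
d_floored <- floor_error_variances(d)
d_floored
```

Keeping every entry at or above eta prevents error variances from collapsing to zero, which would make the component covariance singular.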

itmax

The maximum number of ECM iterations allowed for the estimation of each MFA model.

tol

The ECM algorithm terminates if the measure of convergence falls below this value.

conv_measure

The convergence criterion of the ECM algorithm. The default 'diff' stops the ECM iterations if |l^(k+1) - l^(k)| < tol where l^(k) is the log-likelihood at the kth ECM iteration. If 'ratio', then the convergence of the ECM iterations is measured using |(l^(k+1) - l^(k))/l^(k+1)|.
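The two criteria can be sketched as a small helper. The function has_converged is hypothetical and only mirrors the formulas stated above; it is not taken from the package source.

```r
# Illustrative only: has_converged is a hypothetical helper.
# l_new is the log-likelihood at iteration k + 1, l_old at iteration k.
has_converged <- function(l_new, l_old, tol = 1e-05, conv_measure = "diff") {
  measure <- switch(conv_measure,
    diff  = abs(l_new - l_old),
    ratio = abs((l_new - l_old) / l_new),
    stop("conv_measure must be 'diff' or 'ratio'")
  )
  measure < tol
}

# The same pair of log-likelihoods judged under each criterion
by_diff  <- has_converged(-1000, -1001, tol = 0.01, conv_measure = "diff")
by_ratio <- has_converged(-1000, -1001, tol = 0.01, conv_measure = "ratio")
```

Because 'ratio' scales the change by the size of the log-likelihood, the same tol is a looser requirement under 'ratio' than under 'diff' when the log-likelihood is large in absolute value.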

nkmeans

The number of times the k-means algorithm will be used to initialise the (single component) starting models.

nrandom

The number of randomly initialised (single component) starting models.

varimax

Boolean indicating whether the output factor loading matrices should be constrained using varimax rotation or not.

Value

A list containing the following elements:

model: A list specifying the final MFA model. This contains:
- B: A p by p by q array containing the factor loading matrices for each component.
- D: A p by p by g array of error variance matrices.
- mu: A p by g array containing the mean of each cluster.
- pivec: A 1 by g vector containing the mixing proportions for each FA in the mixture.
- numFactors: A 1 by g vector containing the number of factors for each FA.
clustering: A list specifying the clustering produced by the final model. This contains:
- responsibilities: An n by g matrix containing the probability that each point belongs to each FA in the mixture.
- allocations: An n by 1 matrix containing which FA in the mixture each point is assigned to based on the responsibilities.
diagnostics: A list containing various pieces of information related to the fitting process of the algorithm. This contains:
- bic: The BIC of the final model.
- logL: The log-likelihood of the final model.
- totalTime: The total time taken to fit the final model.
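
The relationship between responsibilities and allocations can be sketched with a toy matrix. This assumes that each point is assigned to the component with the largest responsibility, which is the usual convention but is not confirmed here; the matrix resp is made-up illustrative data.

```r
# Illustrative only: toy responsibilities for n = 3 points and g = 2 components.
resp <- matrix(c(0.9, 0.1,
                 0.2, 0.8,
                 0.6, 0.4), ncol = 2, byrow = TRUE)

# Assumed rule: assign each point to the component with the largest responsibility.
allocations <- matrix(max.col(resp, ties.method = "first"), ncol = 1)
allocations
```

For a fitted object such as MFA.fit from the Examples, the corresponding elements would be MFA.fit$clustering$responsibilities and MFA.fit$clustering$allocations.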

References

WangWan-Lun2020AlomautoMFA

kaya2015adaptiveautoMFA

ghahramani2000variationalautoMFA

Jian-HuaZhao2008FMEfautoMFA

Examples

# NOT RUN {
RNGversion('4.0.3'); set.seed(3) 
MFA.fit <- AMFA.inc(autoMFA::MFA_testdata, itmax = 1, numTries = 0)
# }