Functions for fitting mixtures of common factor analyzers (MCFA) models. MCFA models are mixtures of factor analyzers (and so belong to the class of multivariate finite mixture models) in which the components share a common matrix of factor loadings before the latent factors are transformed to white noise. They are designed specifically for displaying the observed data points in a lower (q-dimensional) space, where q is the number of factors adopted in the factor-analytic representation of the observed vector.
The mcfa function fits mixtures of common factor analyzers where the component distributions belong to the family of multivariate normal distributions.
The mctfa function fits mixtures of common t-factor analyzers where the component distributions are multivariate t distributions.
Maximum likelihood estimates of the model parameters are obtained
using the Expectation--Maximization algorithm.
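To make the model structure concrete, the following base R sketch simulates data from a two-component model with common loadings. It assumes the formulation of Baek et al. (2010), in which an observation from component i is A u + e, with u drawn from N(xi_i, omega_i) and e from N(0, D) with D diagonal. All variable names and numeric values are illustrative and are not taken from the package.

```r
set.seed(1)
n <- 200; p <- 5; q <- 2; g <- 2

# Common p x q loading matrix shared by all components (orthonormal columns)
A <- qr.Q(qr(matrix(rnorm(p * q), p, q)))

# Component-specific factor means (q x g) and factor covariances (q x q x g)
xi <- cbind(c(-2, 0), c(2, 1))
omega <- array(c(diag(q), 0.5 * diag(q)), dim = c(q, q, g))

# Diagonal error covariance (p x p) and mixing proportions
D <- diag(0.1, p)
pivec <- c(0.4, 0.6)

# Draw component labels, then latent factors, then observations
z <- sample(g, n, replace = TRUE, prob = pivec)
U <- t(sapply(z, function(i) xi[, i] + t(chol(omega[, , i])) %*% rnorm(q)))
E <- matrix(rnorm(n * p), n, p) %*% diag(sqrt(diag(D)))
Y <- U %*% t(A) + E

dim(Y)  # n x p data matrix
```

A matrix simulated this way could be passed as Y to mcfa with g = 2 and q = 2 to check how well the clusters are recovered.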
mcfa(Y, g, q, itmax = 500, nkmeans = 5, nrandom = 20,
tol = 1.e-5, init_clust = NULL, init_para = NULL,
init_method = NULL, conv_measure = 'diff',
warn_messages = TRUE, ...)
mctfa(Y, g, q, itmax = 500, nkmeans = 5, nrandom = 20,
tol = 1.e-5, df_init = rep(30, g), df_update = TRUE,
init_clust = NULL, init_para = NULL, init_method = NULL,
conv_measure = 'diff', warn_messages = TRUE, ...)
An object of class c("emmix", "mcfa") or c("emmix", "mctfa") containing the fitted model parameters is returned.
Details of the components are as follows:
Number of mixture components.
Number of factors.
Mixing proportions of the components.
Loading matrix. Size \(p \times q\).
Matrix containing factor means for components in columns. Size \(q \times g\).
Array containing factor covariance matrices for components. Size \(q \times q \times g\).
Error covariance matrix. Size \(p \times p\).
Estimated conditional expected component scores of the unobservable factors given the data and the component membership. Size \(n \times q \times g\).
Means of the estimated conditional expected factor scores over the estimated posterior distributions. Size \(n \times q\).
Alternative estimate of Umean in which the posterior probabilities for each sample are replaced by component indicator vectors that contain one in the element corresponding to the highest posterior probability and zero elsewhere. Size \(n \times q\).
Cluster labels.
Posterior probabilities.
Log-likelihood at convergence.
Bayesian information criterion.
Description of error messages, if any.
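The relationship between the three score outputs described above can be sketched in base R. The arrays below are random stand-ins with the documented sizes, and the names Uscores, tau, clust, and Uclust are labels chosen for this illustration (only Umean is named in the text above). This is not the package's internal code.

```r
set.seed(2)
n <- 6; q <- 2; g <- 3

# Stand-in for the n x q x g array of conditional expected component scores
Uscores <- array(rnorm(n * q * g), dim = c(n, q, g))

# Stand-in for the n x g matrix of posterior probabilities (rows sum to 1)
tau <- matrix(runif(n * g), n, g)
tau <- tau / rowSums(tau)

# Posterior-weighted mean of the component scores (n x q)
Umean <- matrix(0, n, q)
for (i in seq_len(g)) Umean <- Umean + tau[, i] * Uscores[, , i]

# Hard-assignment version: replace each row of tau by a 0/1 indicator
# of the component with the highest posterior probability
clust <- max.col(tau, ties.method = "first")
ind <- diag(g)[clust, , drop = FALSE]
Uclust <- matrix(0, n, q)
for (i in seq_len(g)) Uclust <- Uclust + ind[, i] * Uscores[, , i]
```

With the indicator weights, each row of Uclust is simply the score of that sample under its assigned component.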
Y: A matrix or a data frame in which rows correspond to observations and columns to variables.
g: Number of components.
q: Number of factors.
itmax: Maximum number of EM iterations.
nkmeans: The number of times the k-means algorithm is used to partition the data into g groups. These groupings are then used to initialize the parameters for the EM algorithm.
nrandom: The number of random g-group partitions of the data used to initialize the EM algorithm.
tol: The EM algorithm terminates if the measure of convergence falls below this value.
init_clust: A vector or matrix giving a partition of the samples to be used to initialize the EM algorithm. If a matrix of partitions is supplied, each column must correspond to an individual partition of the data. Optional.
init_para: A list containing model parameters to be used as initial parameter estimates for the EM algorithm. Optional.
init_method: Determines how the initial parameter values are computed. See Details.
conv_measure: The default 'diff' stops the EM iterations if \(|l^{(k+1)} - l^{(k)}| <\) tol, where \(l^{(k)}\) is the log-likelihood at the \(k\)th EM iteration. If 'ratio', the convergence of the EM steps is measured using \(|(l^{(k+1)} - l^{(k)})/l^{(k+1)}|\).
df_init: Initial values of the degrees of freedom parameters for mctfa.
df_update: If df_update = TRUE (default), the degrees of freedom parameters are updated during the EM iterations. Otherwise, if df_update = FALSE, they are fixed at the initial values specified in df_init.
warn_messages: With warn_messages = TRUE (default), the output includes a description of the reasons, if any, why the model fitting function failed to provide a fit for a given set of initial parameter values.
...: Not used.
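The two stopping rules selected by conv_measure ('diff' and 'ratio') can be written as a small helper. This is a sketch of the documented criteria only; has_converged is a hypothetical name, not a function exported by the package.

```r
# Returns TRUE when the chosen convergence measure is below tol.
# l_old and l_new are the log-likelihoods at EM iterations k and k + 1.
has_converged <- function(l_old, l_new, tol = 1e-5,
                          conv_measure = c("diff", "ratio")) {
  conv_measure <- match.arg(conv_measure)
  change <- switch(conv_measure,
    diff  = abs(l_new - l_old),
    ratio = abs((l_new - l_old) / l_new)
  )
  change < tol
}

has_converged(-1000.002, -1000.001)                          # FALSE
has_converged(-1000.002, -1000.001, conv_measure = "ratio")  # TRUE
```

Because 'ratio' divides by the log-likelihood, the same tol is a looser requirement than 'diff' whenever the log-likelihood is large in absolute value.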
Suren Rathnayake, Jangsun Baek, Geoff McLachlan
With init_method = NULL, the default, model parameters are initialized using all available methods.
With init_method = "rand-A", the parameters are initialized using the procedure in Baek et al. (2010), where initial values for the elements of \(A\) are drawn from the \(N(0, 1)\) distribution. This method is appropriate when the columns of the data are on the same scale.
With init_method = "eigen-A", the first \(q\) eigenvectors of \(Y\) are taken as the initial value for the loading matrix \(A\).
If init_method = "gmf", the data are factorized using gmf with \(q\) factors and the resulting loading matrix is used as the initial value for \(A\).
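As a rough illustration of two of these starting-value strategies, the base R sketch below builds a random and an eigenvector-based initial loading matrix. It assumes that the eigenvectors are taken from the sample covariance matrix of the data, and it omits any further steps the package may apply; it is not the package's implementation, and A_rand and A_eigen are illustrative names.

```r
set.seed(3)
Y <- as.matrix(iris[, -5])
p <- ncol(Y); q <- 2

# "rand-A"-style start: elements of A drawn from N(0, 1)
A_rand <- matrix(rnorm(p * q), nrow = p, ncol = q)

# "eigen-A"-style start: leading q eigenvectors
# (assumption: eigen-decomposition of the sample covariance of Y)
A_eigen <- eigen(cov(Y), symmetric = TRUE)$vectors[, seq_len(q), drop = FALSE]

dim(A_rand); dim(A_eigen)  # both p x q
```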
If specified, the optional argument init_para must be a list or an object of class mcfa or mctfa.
When fitting an mcfa model, only the model parameters q, g, pivec, A, xi, omega, and D are extracted from init_para, while one extra parameter, nu, is extracted when fitting mctfa.
Everything else in init_para is discarded.
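For example, the result of a short preliminary fit can be supplied as init_para in a second call. The sketch below assumes that mcfa is provided by the EMMIXmfa package and that the package is installed; the settings are arbitrary, and fit0 and fit1 are illustrative names.

```r
if (requireNamespace("EMMIXmfa", quietly = TRUE)) {
  library(EMMIXmfa)
  set.seed(4)
  Y <- iris[, -5]

  # Short preliminary run from a few k-means and random starts
  fit0 <- mcfa(Y, g = 3, q = 2, itmax = 25, nkmeans = 3, nrandom = 3)

  # Supply the preliminary estimates as starting values; per the text
  # above, only q, g, pivec, A, xi, omega, and D are read from fit0
  fit1 <- mcfa(Y, g = 3, q = 2, itmax = 500, init_para = fit0)
}
```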
Baek J, McLachlan GJ, and Flack LK (2010). Mixtures of factor analyzers with common factor loadings: applications to the clustering and visualisation of high-dimensional data. IEEE Transactions on Pattern Analysis and Machine Intelligence 32, 2089--2097.
Baek J and McLachlan GJ (2011). Mixtures of common t-factor analyzers for clustering high-dimensional microarray data. Bioinformatics 27, 1269--1276.
McLachlan GJ, Baek J, and Rathnayake SI (2011). Mixtures of factor analyzers for the analysis of high-dimensional data. In Mixture Estimation and Applications, KL Mengersen, CP Robert, and DM Titterington (Eds). Hoboken, New Jersey: Wiley, pp. 171--191.
mfa, plot_factors
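# Fit a three-component MCFA model with q = 3 factors to the iris measurements
mcfa_fit <- mcfa(iris[, -5], g = 3, q = 3, itmax = 25,
                 nkmeans = 5, nrandom = 5, tol = 1.e-5)
plot(mcfa_fit)
# Fit the t-distribution counterpart, updating the degrees of freedom
mctfa_fit <- mctfa(iris[, -5], g = 3, q = 3, itmax = 500,
                   nkmeans = 5, nrandom = 5, tol = 1.e-5, df_update = TRUE)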