Functions for fitting mixtures of factor analyzers (MFA) and mixtures of t-factor analyzers (MtFA) to data. Maximum likelihood estimates of the model parameters are obtained using the Alternating Expectation Conditional Maximization (AECM) algorithm.
For MFA, the component distributions are multivariate normal distributions; for MtFA, they are multivariate t distributions.
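For orientation, the usual formulation of the MFA model in the literature cited below can be written as follows. The symbols \(\mu_i\), \(B_i\), \(D_i\), \(\pi_i\), \(U_{ij}\) and \(e_{ij}\) are notation chosen here for illustration and are not taken from the package code.

```latex
% Observation j generated by component i (of g), p variables, q < p factors
Y_j = \mu_i + B_i U_{ij} + e_{ij}, \qquad
U_{ij} \sim N_q(0, I_q), \quad e_{ij} \sim N_p(0, D_i), \quad D_i \text{ diagonal},
% which implies the component covariance matrix
\Sigma_i = B_i B_i^{\top} + D_i,
% and the mixture density with mixing proportions \pi_i
f(y_j) = \sum_{i=1}^{g} \pi_i \, \phi_p(y_j; \mu_i, \Sigma_i).
```

For MtFA, the normal component density is replaced by a multivariate t density in which \(\Sigma_i\) acts as the scale matrix and each component has its own degrees of freedom.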
mfa(Y, g, q, itmax = 500, nkmeans = 20, nrandom = 20,
tol = 1.e-5, sigma_type = 'common', D_type = 'common', init_clust = NULL,
init_para = NULL, conv_measure = 'diff', warn_messages = TRUE, ...)
mtfa(Y, g, q, itmax = 500, nkmeans = 20, nrandom = 20,
tol = 1.e-5, df_init = rep(30, g), df_update = TRUE,
sigma_type = 'common', D_type = 'common', init_clust = NULL,
init_para = NULL, conv_measure = 'diff', warn_messages = TRUE, ...)
An object of class c("emmix", "mfa") or c("emmix", "mtfa") containing the fitted model parameters is returned.
Details of the components are as follows:
Number of mixture components.
Number of factors.
Mixing proportions of the components.
Matrix containing estimates of the component means, one column per mixture component. Size \(p \times g\).
Array containing component dependent loading matrices. Size \(p \times q \times g\).
Estimates of error covariance matrices. If D_type = "common" was used, then D is a \(p \times p\) matrix common to all components; if D_type = "unique", then D is a \(p \times p \times g\) array.
Degrees of freedom for each component.
Log-likelihood at convergence.
Bayesian information criterion.
Matrix of posterior probabilities for the data used, based on the fitted values. Size \(n \times g\).
Vector of integers 1 to g indicating cluster allocations of the observations.
Estimated conditional expected component scores of the unobservable factors given the data and the component membership. Size \(n \times q \times g\).
Means of the estimated conditional expected factor scores over the estimated posterior distributions. Size \(n \times q\).
Alternative estimate of Umean in which the posterior probabilities for each sample are replaced by component indicator vectors, which contain one in the element corresponding to the highest posterior probability and zero elsewhere. Size \(n \times q\).
Description of messages, if any.
Whether common or unique error covariance is used, as specified in model fitting.
Whether the degrees of freedom parameter (v) was fixed or estimated (only for mtfa).
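The relationship between the posterior probabilities, the cluster allocations, the component-wise factor scores and the two averaged score matrices can be sketched in base R. This is a minimal illustration on made-up numbers, not the package's internal code; the object names `tau`, `Uscores`, `clust` and `Uclust` are chosen here for illustration (only `Umean` is named in the text above).

```r
# Toy inputs (hypothetical values): n = 4 observations, q = 2 factors, g = 3 components.
n <- 4; q <- 2; g <- 3
tau <- matrix(c(0.70, 0.20, 0.10,
                0.10, 0.80, 0.10,
                0.30, 0.30, 0.40,
                0.05, 0.05, 0.90),
              nrow = n, ncol = g, byrow = TRUE)   # posterior probabilities, n x g
set.seed(1)
Uscores <- array(rnorm(n * q * g), dim = c(n, q, g))  # component-wise scores, n x q x g

# Hard cluster allocation: the component with the highest posterior probability.
clust <- apply(tau, 1, which.max)

# Posterior-weighted average of the component-wise scores (n x q).
Umean <- matrix(0, n, q)
for (i in seq_len(g)) Umean <- Umean + tau[, i] * Uscores[, , i]

# Same average, with each row of tau replaced by a 0/1 indicator vector.
indicator <- diag(g)[clust, , drop = FALSE]
Uclust <- matrix(0, n, q)
for (i in seq_len(g)) Uclust <- Uclust + indicator[, i] * Uscores[, , i]
```

With indicator weights, each row of `Uclust` is simply the score of that observation under its allocated component, whereas `Umean` blends the scores of all components.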
A matrix or a data frame whose rows correspond to observations and whose columns correspond to variables.
Number of components.
Number of factors.
Maximum number of EM iterations.
The number of times the k-means algorithm is used to partition the data into g groups. These groupings are then used to initialize the parameters for the EM algorithm.
The number of random g-group partitions of the data to be used in initializing the EM algorithm.
The EM algorithm terminates if the measure of convergence falls below this value.
Specifies whether the covariance matrices (for mfa) or the scale matrices (for mtfa) of the components are constrained to be the same (default, sigma_type = "common") or not (sigma_type = "unique").
Specifies whether the diagonal error covariance matrix is common to all the components or not. If sigma_type = "unique", then D_type can be either "common" (the default), giving one matrix shared by all components, or "unique". If sigma_type = "common", then D_type must also be "common".
A vector or matrix giving one or more partitions of the samples to be used to initialize the EM algorithm. For a matrix of partitions, each column must correspond to an individual partition of the data. Optional.
A list containing model parameters to be used as initial parameter estimates for the EM algorithm. Optional.
The default 'diff' stops the EM iterations if \(|l^{(k+1)} - l^{(k)}| <\) tol, where \(l^{(k)}\) is the log-likelihood at the \(k\)th EM iteration. If 'ratio', then the convergence of the EM steps is measured using \(|(l^{(k+1)} - l^{(k)}) / l^{(k+1)}|\).
Initial values of the degrees of freedom parameters for mtfa.
If df_update = TRUE (default), the values of the degrees of freedom parameters are updated during the EM iterations. Otherwise, if df_update = FALSE, they are fixed at the initial values specified in df_init.
With warn_messages = TRUE (default), the output includes a description of the reasons, if any, why the model fitting function failed to provide a fit for a given set of initial parameter values.
Not used.
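The two convergence measures described for conv_measure can be compared with a small base R helper. This is a sketch of the stated formulas only; the function `has_converged` and its argument names are hypothetical and are not part of the package.

```r
# Hypothetical helper: applies the 'diff' or 'ratio' stopping rule to two
# successive log-likelihood values.
has_converged <- function(loglik_new, loglik_old, tol = 1e-5,
                          conv_measure = c("diff", "ratio")) {
  conv_measure <- match.arg(conv_measure)
  change <- switch(conv_measure,
    diff  = abs(loglik_new - loglik_old),                 # absolute change
    ratio = abs((loglik_new - loglik_old) / loglik_new)   # relative change
  )
  change < tol
}

# The same pair of log-likelihood values judged under both rules.
has_converged(-180.0001, -180.0002, conv_measure = "diff")
has_converged(-180.0001, -180.0002, conv_measure = "ratio")
```

Because 'ratio' scales the change by the size of the log-likelihood, it is the more lenient rule whenever the log-likelihood is large in absolute value, so the same tol can stop the iterations earlier than under 'diff'.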
Suren Rathnayake, Geoffrey McLachlan
Cluster a given data set using mixtures of factor analyzers or mixtures of t-factor analyzers.
Ghahramani Z, and Hinton GE (1997). The EM algorithm for mixture of factor analyzers. Technical Report, CRG-TR-96-1, University of Toronto, Toronto.
McLachlan GJ, Bean RW, Ben-Tovim Jones L (2007). Extension of the mixture of factor analyzers model to incorporate the multivariate t distribution. Computational Statistics & Data Analysis, 51, 5327--5338.
McLachlan GJ, Baek J, and Rathnayake SI (2011). Mixtures of factor analyzers for the analysis of high-dimensional data. In Mixture Estimation and Applications, KL Mengersen, CP Robert, and DM Titterington (Eds). Hoboken, New Jersey: Wiley, pp. 171--191.
McLachlan GJ, Peel D, and Bean RW (2003). Modelling high-dimensional data by mixtures of factor analyzers. Computational Statistics & Data Analysis 41, 379--388.
mcfa
model <- mfa(iris[, -5], g=3, q=2, itmax=200, nkmeans=1, nrandom=5)
summary(model)
model <- mtfa(iris[, -5], g=3, q=2, itmax=200, nkmeans=1, nrandom=5)
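A further sketch, using only arguments listed in the usage above, shows how the covariance constraints and the degrees of freedom might be set. It assumes the functions come from the EMMIXmfa package, which this page does not name explicitly; the starting value of 10 for the degrees of freedom and the object names `model_u` and `model_t` are arbitrary illustrative choices.

```r
g <- 3
df_init <- rep(10, g)   # illustrative starting value, one per component

# The fits are only attempted if the (assumed) package is installed.
if (requireNamespace("EMMIXmfa", quietly = TRUE)) {
  library(EMMIXmfa)

  # Component-specific covariance matrices and error covariance matrices.
  model_u <- mfa(iris[, -5], g = g, q = 2, itmax = 200, nkmeans = 1, nrandom = 5,
                 sigma_type = "unique", D_type = "unique")

  # t-factor analyzers with the degrees of freedom held fixed at df_init.
  model_t <- mtfa(iris[, -5], g = g, q = 2, itmax = 200, nkmeans = 1, nrandom = 5,
                  df_init = df_init, df_update = FALSE)
}
```

Note that D_type = "unique" is only permitted together with sigma_type = "unique", as described in the arguments.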