Fit the Tensor Envelope Mixture Model (TEMM)
TEMM(Xn, u, K, initial = "kmeans", iter.max = 500,
stop = 1e-3, trueY = NULL, print = FALSE)
Xn: The tensor to be clustered. It should be an array whose last dimension is the sample size n.
u: A vector of envelope dimensions, one for each mode.
K: Number of clusters, greater than or equal to 2.
initial: Initialization method for the regularized EM algorithm. Default value is "kmeans".
iter.max: Maximum number of iterations. Default value is 500.
stop: Convergence threshold on the relative change in cluster means. Default value is 1e-3.
trueY: A vector of true cluster labels for each observation. Default value is NULL.
print: Whether to print information in each iteration, including the current iteration number, the relative change in cluster means, and the clustering error (%).
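As a quick illustration (assuming the package providing TEMM is attached; the simulated data and object names below are illustrative, not part of the package), a call that uses all of the arguments above and monitors progress against known labels might look like:

set.seed(1)
n <- 40
ysim <- rep(1:2, each = n / 2)                     # simulated true cluster labels
Xsim <- array(rnorm(3 * 4 * n), dim = c(3, 4, n))  # 3 x 4 matrix-valued observations
Xsim[, , ysim == 2] <- Xsim[, , ysim == 2] + 1     # mean shift for the second cluster
fit <- TEMM(Xsim, u = c(2, 2), K = 2, initial = "kmeans",
            iter.max = 500, stop = 1e-3, trueY = ysim, print = TRUE)

The fitted object returned by such a call contains the components described next.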
A vector of estimated labels.
A vector of estimated prior probabilities for clusters.
An n by K matrix of estimated membership weights.
A list of estimated cluster means.
A list of estimated covariance matrices.
Estimate of \(\mathbf{M}_m\) defined in the paper.
Estimate of \(\mathbf{N}_m\) defined in the paper.
A list of estimated envelope basis matrices.
A list of envelope projection matrices.
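Since the component names of the returned list are not spelled out above, one way to see them is to inspect the structure of a fitted object, for example the fit object from the snippet above:

str(fit, max.level = 1)   # top-level structure of the fitted TEMM object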
The TEMM function fits the Tensor Envelope Mixture Model (TEMM) through a subspace-regularized EM algorithm. For mode \(m\), let \((\bm{\Gamma}_m,\bm{\Gamma}_{0m})\in R^{p_m\times p_m}\) be an orthogonal matrix, where \(\bm{\Gamma}_{m}\in R^{p_{m}\times u_{m}}\), \(u_{m}\leq p_{m}\), represents the material part. Specifically, the material part \(\mathbf{X}_{\star,m}=\mathbf{X}\times_{m}\bm{\Gamma}_{m}^{T}\) follows a tensor normal mixture distribution, while the immaterial part \(\mathbf{X}_{\circ,m}=\mathbf{X}\times_{m}\bm{\Gamma}_{0m}^{T}\) is unimodal, independent of the material part, and hence can be eliminated without loss of clustering information. Dimension reduction is achieved by focusing on the material part. Collectively, the joint reduction from each mode is
$$
\mathbf{X}_{\star}=[\![\mathbf{X};\bm{\Gamma}_{1}^{T},\dots,\bm{\Gamma}_{M}^{T}]\!]\sim\sum_{k=1}^{K}\pi_{k}\mathrm{TN}(\bm{\alpha}_{k};\bm{\Omega}_{1},\dots,\bm{\Omega}_{M}),\quad \mathbf{X}_{\star}\perp\!\!\!\perp\mathbf{X}_{\circ,m},
$$
where \(\bm{\alpha}_{k}\in R^{u_{1}\times\cdots\times u_{M}}\) and \(\bm{\Omega}_m\in R^{u_m\times u_m}\) are the dimension-reduced clustering parameters and \(\mathbf{X}_{\circ,m}\) does not vary with cluster index \(Y\). In the E-step, the membership weights are evaluated as
$$
\widehat{\eta}_{ik}^{(s)}=\frac{\widehat{\pi}_{k}^{(s-1)}f_{k}(\mathbf{X}_i;\widehat{\bm{\theta}}^{(s-1)})}{\sum_{k'=1}^{K}\widehat{\pi}_{k'}^{(s-1)}f_{k'}(\mathbf{X}_i;\widehat{\bm{\theta}}^{(s-1)})},
$$
where \(f_k\) denotes the conditional probability density function of \(\mathbf{X}_i\) within the \(k\)-th cluster. In the subspace-regularized M-step, the envelope subspace is iteratively estimated through a Grassmann manifold optimization that minimizes the following log-likelihood-based objective function:
$$
G_m^{(s)}(\bm{\Gamma}_m) = \log|\bm{\Gamma}_m^T \mathbf{M}_m^{(s)} \bm{\Gamma}_m|+\log|\bm{\Gamma}_m^T (\mathbf{N}_m^{(s)})^{-1} \bm{\Gamma}_m|,
$$
where \(\mathbf{M}_{m}^{(s)}\) and \(\mathbf{N}_{m}^{(s)}\) are given by
$$
\mathbf{M}_m^{(s)} = \frac{1}{np_{-m}}\sum_{i=1}^{n} \sum_{k=1}^{K}\widehat{\eta}_{ik}^{(s)} (\bm{\epsilon}_{ik}^{(s)})_{(m)}(\widehat{\bm{\Sigma}}_{-m}^{(s-1)})^{-1} (\bm{\epsilon}_{ik}^{(s)})_{(m)}^T,
$$
$$
\mathbf{N}_m^{(s)} = \frac{1}{np_{-m}}\sum_{i=1}^{n} (\mathbf{X}_i)_{(m)}(\widehat{\bm{\Sigma}}_{-m}^{(s-1)})^{-1}(\mathbf{X}_i)_{(m)}^T.
$$
The intermediate estimator \(\mathbf{M}_{m}^{(s)}\) can be viewed as the mode-\(m\) conditional variation estimate of \(\mathbf{X}\mid Y\), while \(\mathbf{N}_{m}^{(s)}\) is the mode-\(m\) marginal variation estimate of \(\mathbf{X}\).
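To make \(\mathbf{M}_{m}^{(s)}\) and \(\mathbf{N}_{m}^{(s)}\) concrete, the sketch below (illustrative code, not the package's internal implementation) computes the mode-1 versions for matrix-valued observations, plugging in identity matrices for \(\widehat{\bm{\Sigma}}_{-m}^{(s-1)}\) and hard k-means assignments for \(\widehat{\eta}_{ik}^{(s)}\), as one might at a first iteration; all object names are assumptions.

set.seed(2)
p1 <- 3; p2 <- 4; n <- 60; K <- 2
X <- array(rnorm(p1 * p2 * n), dim = c(p1, p2, n))
X[, , 31:60] <- X[, , 31:60] + 1                       # two mean-shifted groups

km  <- kmeans(t(apply(X, 3, c)), centers = K)          # hard memberships from k-means
eta <- outer(km$cluster, 1:K, "==") * 1                # n x K matrix of 0/1 weights
mu  <- lapply(1:K, function(k)                         # mean of each hard-assigned group
  apply(X[, , km$cluster == k, drop = FALSE], c(1, 2), mean))

Sig2inv <- diag(p2)                                    # (Sigma_{-1})^{-1}, identity here
M1 <- matrix(0, p1, p1)                                # conditional variation, mode 1
N1 <- matrix(0, p1, p1)                                # marginal variation, mode 1
for (i in seq_len(n)) {
  Xi <- X[, , i]                                       # mode-1 matricization of X_i
  N1 <- N1 + Xi %*% Sig2inv %*% t(Xi)
  for (k in 1:K) {
    Eik <- Xi - mu[[k]]                                 # residual epsilon_ik
    M1  <- M1 + eta[i, k] * Eik %*% Sig2inv %*% t(Eik)
  }
}
M1 <- M1 / (n * p2)
N1 <- N1 / (n * p2)

For matrix observations the mode-1 matricization of \(\mathbf{X}_i\) is \(\mathbf{X}_i\) itself and the mode-2 matricization is its transpose, so the mode-2 quantities follow by swapping the roles of the two dimensions (and dividing by \(np_1\) instead of \(np_2\)).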
Deng, K. and Zhang, X. (2021). Tensor Envelope Mixture Model for Simultaneous Clustering and Multiway Dimension Reduction. Biometrics.
# Simulate a 2 x 2 x 10 tensor whose first five and last five observations
# have different means, then fit TEMM with K = 2 clusters.
A <- array(c(rep(1, 20), rep(2, 20)) + rnorm(40), dim = c(2, 2, 10))
myfit <- TEMM(A, u = c(2, 2), K = 2)