Doubly-enhanced EM algorithm for tensor clustering
Usage
DEEM(X, nclass, niter = 100, lambda = NULL, dfmax = n, pmax = nvars, pf = rep(1, nvars),
  eps = 1e-04, maxit = 1e+05, sml = 1e-06, verbose = FALSE, ceps = 0.1,
  initial = TRUE, vec_x = NULL)
Arguments
X: Input tensor (or matrix) list of length \(n\), where \(n\) is the number of observations. Each element of the list is a tensor or matrix. The order of the tensors can be any integer no less than 2.
nclass: Number of clusters.
niter: Maximum number of EM iterations. Default value is 100.
lambda: A user-specified lambda value. lambda is the weight of the L1 penalty; a smaller lambda allows more variables to be nonzero.
dfmax: The maximum number of selected variables in the model. Default is the number of observations n.
pmax: The maximum number of potentially selected variables during iteration. In an intermediate step, the algorithm can select at most pmax variables and then shrink some of them so that the number of finally selected variables is less than dfmax.
pf: Weights for the lasso penalty. Default is a vector of ones of length \(p\), representing an equal L1 penalty weight for each of the \(p\) variables. Can be modified to use an adaptive lasso penalty.
eps: Convergence threshold for coordinate descent, based on the difference between iterations. Default value is 1e-04.
maxit: Maximum number of coordinate descent iterations over all lambda values. Default value is 1e+05.
sml: Threshold on the ratio of the change in the loss function after each iteration to the old loss function value. Default value is 1e-06.
verbose: Whether to print out lambda during iteration. Default value is FALSE.
ceps: Convergence threshold for the cluster mean difference between iterations. Default value is 0.1.
initial: Whether to initialize the algorithm with K-means clustering. Default value is TRUE.
vec_x: Vectorized tensor data. Default value is NULL. A sketch of one way to construct it is shown below.
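A minimal sketch of one way to construct vec_x, assuming, as in the example at the bottom of this page, that each column holds one vectorized observation; the call below is illustrative and not part of the package interface.
# Illustrative sketch: stack the vectorized observations as columns,
# so that column i equals c(X[[i]]) (assumed layout, matching the example below).
vec_x <- sapply(X, as.vector)   # an nvars by n matrix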
Value
A vector of estimated prior probabilities for clusters.
A list of estimated cluster means.
A list of estimated covariance matrices.
An n by nclass matrix of estimated membership weights.
A vector of estimated labels.
Number of iterations until convergence.
Average number of zero elements in beta over iterations.
A matrix of vectorized B_k.
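The estimated labels typically agree with the largest membership weight in each row of the weight matrix. A minimal sketch, where W is a placeholder for the returned n by nclass membership-weight matrix (not an actual component name):
# Illustrative sketch: hard cluster assignments from membership weights.
# W is a placeholder for the n by nclass weight matrix returned by DEEM.
hard_labels <- apply(W, 1, which.max)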
Details
The DEEM function implements the Doubly-Enhanced EM algorithm (DEEM) for tensor clustering. The observations \(\mathbf{X}_i\) are assumed to follow the tensor normal mixture model (TNMM) with common covariances across different clusters:
$$
\mathbf{X}_i\sim\sum_{k=1}^K\pi_k \mathrm{TN}(\bm{\mu}_k;\bm{\Sigma}_1,\ldots,\bm{\Sigma}_M),\quad i=1,\dots,n,
$$
where \(0<\pi_k<1\) is the prior probability for \(\mathbf{X}\) to be in the \(k\)-th cluster such that \(\sum_{k=1}^{K}\pi_k=1\), \(\bm{\mu}_k\) is the cluster mean of the \(k\)-th cluster, and \(\bm{\Sigma}_1,\ldots,\bm{\Sigma}_M\) are the common covariances across different clusters. Under the TNMM framework, the optimal clustering rule can be shown to be
$$
\widehat{Y}^{opt}=\arg\max_k\{\log\pi_k+\langle\mathbf{X}-(\bm{\mu}_1+\bm{\mu}_k)/2,\mathbf{B}_k\rangle\},
$$
where \(\mathbf{B}_k=[\![\bm{\mu}_k-\bm{\mu}_1;\bm{\Sigma}_1^{-1},\ldots,\bm{\Sigma}_M^{-1}]\!]\). In the enhanced E-step, DEEM imposes sparsity directly on the optimal clustering rule, as a flexible alternative to popular low-rank assumptions on the tensor coefficients \(\mathbf{B}_k\), by solving
$$
\min_{\mathbf{B}_2,\dots,\mathbf{B}_K}\bigg[\sum_{k=2}^K(\langle\mathbf{B}_k,[\![\mathbf{B}_k;\widehat{\bm{\Sigma}}_1^{(t)},\ldots,\widehat{\bm{\Sigma}}_M^{(t)}]\!]\rangle-2\langle\mathbf{B}_k,\widehat{\bm{\mu}}_k^{(t)}-\widehat{\bm{\mu}}_1^{(t)}\rangle) +\lambda^{(t+1)}\sum_{\mathcal{J}}\sqrt{\sum_{k=2}^Kb_{k,\mathcal{J}}^2}\bigg],
$$
where \(\lambda^{(t+1)}\) is a tuning parameter. In the enhanced M-step, DEEM
employs a new estimator for the tensor correlation structure, which facilitates both the computation and the theoretical studies.
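For intuition only, the following self-contained sketch evaluates the population quantities appearing in the optimal rule for a third-order tensor: the coefficient tensor \(\mathbf{B}_2=[\![\bm{\mu}_2-\bm{\mu}_1;\bm{\Sigma}_1^{-1},\bm{\Sigma}_2^{-1},\bm{\Sigma}_3^{-1}]\!]\) via mode-wise products with the inverse covariances, and the corresponding discriminant score. The helper ttm and all objects below are illustrative assumptions; this is not the sparse estimator computed inside DEEM.
# Illustrative sketch (not the DEEM estimator): population B_2 and the
# discriminant score for a 3-way tensor with identity mode covariances.
dims <- c(5, 5, 5)
mu1  <- array(0, dim = dims)
mu2  <- array(0, dim = dims); mu2[1:3, 1, 1] <- 2
Sig  <- list(diag(5), diag(5), diag(5))        # Sigma_1, Sigma_2, Sigma_3

# Mode-m product of an array A with a matrix S, via mode-m unfolding
ttm <- function(A, S, m) {
  d <- dim(A)
  perm <- c(m, seq_along(d)[-m])               # bring mode m to the front
  Am   <- matrix(aperm(A, perm), nrow = d[m])  # mode-m unfolding
  out  <- array(S %*% Am, dim = c(nrow(S), d[-m]))
  aperm(out, order(perm))                      # undo the permutation
}

# B_2 = [[mu_2 - mu_1; Sigma_1^{-1}, Sigma_2^{-1}, Sigma_3^{-1}]]
B2pop <- mu2 - mu1
for (m in 1:3) B2pop <- ttm(B2pop, solve(Sig[[m]]), m)

# Discriminant score of cluster 2 for one observation, with equal priors
Xobs  <- mu2 + array(rnorm(prod(dims)), dim = dims)
score <- log(1/2) + sum((Xobs - (mu1 + mu2)/2) * B2pop)
With identity covariances, \(\mathbf{B}_2\) reduces to the mean difference itself; DEEM instead estimates a sparse version of these coefficients within the EM iterations, as described above.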
References
Mai, Q., Zhang, X., Pan, Y. and Deng, K. (2021). A Doubly-Enhanced EM Algorithm for Model-Based Tensor Clustering. Journal of the American Statistical Association.
Examples
# NOT RUN {
# Simulate a balanced two-cluster data set of 5 x 5 x 5 tensors
dimen = c(5, 5, 5)
nvars = prod(dimen)
K = 2
n = 100

# Mode-wise covariance matrices of the tensor normal model (identity here)
sigma = array(list(), 3)
sigma[[1]] = sigma[[2]] = sigma[[3]] = diag(5)

# Mean difference between the two clusters: only three nonzero entries
B2 = array(0, dim = dimen)
B2[1:3, 1, 1] = 2

# True labels and cluster means
y = c(rep(1, 50), rep(2, 50))
M = array(list(), K)
M[[1]] = array(0, dim = dimen)
M[[2]] = B2

# Generate standard normal noise and shift each observation by its cluster mean
vec_x = matrix(rnorm(n * prod(dimen)), ncol = n)
X = array(list(), n)
for (i in 1:n) {
  X[[i]] = array(vec_x[, i], dim = dimen)
  X[[i]] = M[[y[i]]] + X[[i]]
}

# Fit DEEM with two clusters and a fixed tuning parameter
myfit = DEEM(X, nclass = 2, lambda = 0.05)
# }