tune_K: Select the number of clusters `K` in DEEM

Description

Selects the number of clusters K together with the tuning parameter lambda by BIC in DEEM.

Usage

tune_K(X, seqK, seqlamb, initial = TRUE, vec_x = NULL)

Arguments

X

Input tensor (or matrix) list of length $n$, where $n$ is the number of observations. Each element of the list is a tensor or matrix. The order of each tensor can be any integer not less than 2.

seqK

A sequence of user-specified candidate numbers of clusters.

seqlamb

A sequence of user-specified lambda values. lambda is the weight of the L1 penalty; a smaller lambda allows more variables to be nonzero.

initial

Whether to initialize the algorithm with K-means clustering. Default value is TRUE.

vec_x

Vectorized tensor data. Default value is NULL.

Value

opt_K

Selected number of clusters that leads to optimal BIC.

opt_lamb

Tuned lambda that leads to optimal BIC.

Krank

A selection summary.

Details

The tune_K function runs the tune_lamb function length(seqK) times to choose the tuning parameter $\lambda$ and the number of clusters $K$ simultaneously. Let $\widehat{\bm{\theta}}^{\{\lambda,K\}}$ be the output of DEEM with the tuning parameter and the number of clusters fixed at $\lambda$ and $K$, respectively. tune_K looks for the values of $\lambda$ and $K$ that minimize $$\mathrm{BIC}(\lambda,K)=-2\sum_{i=1}^n\log\left(\sum_{k=1}^K\widehat{\pi}^{\{\lambda,K\}}_kf_k(\mathbf{X}_i;\widehat{\bm{\theta}}_k^{\{\lambda,K\}})\right)+\log(n)\cdot |\widehat{\mathcal{D}}^{\{\lambda,K\}}|,$$ where $\widehat{\mathcal{D}}^{\{\lambda,K\}}=\{(k, {\mathcal{J}}): \widehat b_{k,{\mathcal{J}}}^{\lambda} \neq 0 \}$ is the set of nonzero elements in $\widehat{\bm{B}}_2^{\{\lambda,K\}},\ldots,\widehat{\bm{B}}_K^{\{\lambda,K\}}$. The tune_K function intrinsically selects the initial point and returns the optimal estimated labels.
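
The selection rule can be illustrated with a minimal base R sketch. It is not the package's implementation: `bic_value` is a hypothetical helper, and the log-likelihoods and nonzero counts are made-up numbers standing in for the quantities DEEM would produce at each (K, lambda) pair.

```r
# Hypothetical helper (not part of the package): BIC from the mixture
# log-likelihood and the number of nonzero elements in B_2, ..., B_K.
bic_value <- function(loglik, n_nonzero, n) {
  -2 * loglik + log(n) * n_nonzero
}

# Candidate grid; expand.grid varies K fastest, then lambda.
grid <- expand.grid(K = 2:3, lambda = c(0.01, 0.05))

# Made-up fit summaries, for illustration only (not DEEM output).
grid$loglik    <- c(-520, -515, -530, -528)
grid$n_nonzero <- c(12, 30, 3, 9)

n <- 100  # number of observations
grid$bic <- bic_value(grid$loglik, grid$n_nonzero, n)

# The selected pair is the one with the smallest BIC.
best <- grid[which.min(grid$bic), ]
best
```

With these illustrative numbers, the larger lambda gives a slightly lower log-likelihood but far fewer nonzero elements, so the penalty term $\log(n)\cdot|\widehat{\mathcal{D}}|$ tips the choice toward the sparser fit.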

References

Mai, Q., Zhang, X., Pan, Y. and Deng, K. (2021). A Doubly-Enhanced EM Algorithm for Model-Based Tensor Clustering. Journal of the American Statistical Association.

Examples

# NOT RUN {
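# Simulation settings: n third-order tensors of dimension 5 x 5 x 5, K clusters
dimen = c(5,5,5)
nvars = prod(dimen)
K = 2
n = 100

# Identity covariance for each mode (kept for reference; not used below)
sigma = array(list(),3)
sigma[[1]] = sigma[[2]] = sigma[[3]] = diag(5)

# Sparse mean difference between the two clusters
B2=array(0,dim=dimen)
B2[1:3,1,1]=2

# True labels and cluster means
y = c(rep(1,50),rep(2,50))
M = array(list(),K)
M[[1]] = array(0,dim=dimen)
M[[2]] = B2

# Standard normal noise, one column per observation
vec_x=matrix(rnorm(n*prod(dimen)),ncol=n)
X=array(list(),n)
for (i in 1:n){
  X[[i]] = array(vec_x[,i],dim=dimen)
  X[[i]] = M[[y[i]]] + X[[i]]
}

# Select K and lambda jointly by BIC
mytune = tune_K(X, seqK=2:4, seqlamb=seq(0.01,0.1,by=0.01))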
# }
