initEmmix: Initialize Emmix Parameters

Description

Obtains an initial parameter set for use in the EM algorithm. The data are grouped by one of three possible clustering methods: k-means, random start, or hierarchical clustering.

Usage

initEmmix(dat, g, clust, distr, ncov, maxloop=20)
init.mix(dat, g, distr, ncov, nkmeans, nrandom, nhclust, maxloop=20)

Arguments

dat

The dataset, an n by p numeric matrix, where n is number of observations and p the dimension of data.

g

The number of components of the mixture model.

distr

A three letter string indicating the type of distribution to be fit. See Details.

ncov

A small integer indicating the type of covariance structure. See Details.

clust

An initial partition of the data

nkmeans

An integer specifying the number of k-means partitions to be used to find the best initial values.

nrandom

An integer to specify the number of random partitions to be used to find the best initial values

nhclust

A logical value specifying whether to use hierarchical clustering. If TRUE, the complete linkage method is used.

maxloop

An integer specifying the maximum number of iterations to try when finding the initial values. The default given in Usage is 20.

Value

pro

A vector of mixing proportions, see Details.

mu

A p by g numeric matrix with each column corresponding to the mean of a component, see Details.

sigma

An array of dimension (p,p,g) whose first two dimensions hold the covariance matrix of each component, see Details.

dof

A vector of degrees of freedom for each component, see Details.

delta

A p by g matrix with each column corresponding to a skew parameter vector, see Details.

Details

The distribution type is determined by the distr parameter, which may take any one of the following values: "mvn" for a multivariate normal distribution, "mvt" for a multivariate t-distribution, "msn" for a multivariate skew normal distribution and "mst" for a multivariate skew t-distribution.

The covariance matrix type, represented by the ncov parameter, may be any one of the following: ncov=1 for a common variance, ncov=2 for a common diagonal variance, ncov=3 for a general variance, ncov=4 for a diagonal variance, ncov=5 for sigma(h)*I(p) (a diagonal covariance with identical diagonal elements within each component).
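
The five structures can be illustrated with a minimal base R sketch that builds each one from a given partition. This is not the package's internal code: the helper name sigma_for, the simulated data, and the choice of maximum likelihood divisors and size-weighted pooling are all assumptions made for illustration.

```r
# Illustrative only: base-R estimates of the five ncov structures from a
# given partition. This is not the package's internal implementation.
set.seed(1)
p <- 2
g <- 3
dat   <- matrix(rnorm(300 * p), ncol = p)   # simulated data (assumption)
clust <- rep(1:g, each = 100)               # a fixed partition (assumption)

# Per-component covariance matrices with divisor n_h, as a p x p x g array.
n_h <- tabulate(clust, nbins = g)
S <- array(0, c(p, p, g))
for (h in 1:g) {
  x  <- dat[clust == h, , drop = FALSE]
  xc <- sweep(x, 2, colMeans(x))            # centre each column
  S[, , h] <- crossprod(xc) / n_h[h]
}

# Pooled within-component covariance, weighted by component size.
pooled <- apply(S, c(1, 2), function(s) sum(s * n_h) / sum(n_h))

# Hypothetical helper: return a p x p x g array for a given ncov code.
sigma_for <- function(ncov) {
  out <- array(0, c(p, p, g))
  for (h in 1:g) {
    out[, , h] <- switch(ncov,
      pooled,                          # 1: common
      diag(diag(pooled), p),           # 2: common diagonal
      S[, , h],                        # 3: general
      diag(diag(S[, , h]), p),         # 4: diagonal
      diag(mean(diag(S[, , h])), p)    # 5: sigma(h) * I(p)
    )
  }
  out
}
```

Calling sigma_for(1) or sigma_for(2) gives the same matrix in every slice, whereas codes 3 to 5 give a separate matrix per component.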

The returned value includes the following components: pro, a numeric vector of the mixing proportion of each component; mu, a p by g matrix with each column holding the mean of the corresponding component; sigma, a three dimensional p by p by g array whose jth slice (p,p,j) is the covariance matrix of the jth component of the mixture model; dof, a vector of degrees of freedom for each component; delta, a p by g matrix whose columns are the skew parameter vectors.
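
To make these shapes concrete, the following base R sketch builds a list with the same component names and dimensions from a partition. The function init_from_partition is hypothetical and is not how the package computes its values: it uses the sample covariance from cov(), sets delta to zero as a placeholder, and assumes every group has at least two observations.

```r
# Illustrative only: a list with the component names and shapes described
# above. init_from_partition is a hypothetical helper, not a package function.
init_from_partition <- function(dat, g, clust, dof0 = 4) {
  p <- ncol(dat)
  # Mixing proportions: share of observations in each group.
  pro <- tabulate(clust, nbins = g) / nrow(dat)
  # Component means as the columns of a p x g matrix.
  mu <- sapply(1:g, function(h) colMeans(dat[clust == h, , drop = FALSE]))
  mu <- matrix(mu, nrow = p, ncol = g)
  # One sample covariance matrix per component, as a p x p x g array.
  sigma <- array(0, c(p, p, g))
  for (h in 1:g) sigma[, , h] <- cov(dat[clust == h, , drop = FALSE])
  list(pro = pro, mu = mu, sigma = sigma,
       dof = rep(dof0, g),            # fixed degrees of freedom
       delta = matrix(0, p, g))       # placeholder skew vectors (assumption)
}

set.seed(2)
dat <- rbind(matrix(rnorm(100, -3), ncol = 2),
             matrix(rnorm(100,  3), ncol = 2))
init <- init_from_partition(dat, g = 2, clust = rep(1:2, each = 50))
str(init)
```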

When the dataset is large, trying every initial partition with a large maxloop becomes time-consuming. The default given in Usage is 20. While searching for the best initial clustering and initial values, the degrees of freedom dof are not estimated for the t-distribution and skew t-distribution; instead they are fixed at 4 for each component.
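
The three grouping methods named in the Description can be sketched with functions from base R's stats package. This only shows how candidate partitions of each kind might be produced; how init.mix generates and compares them internally is not documented here, and the simulated data and variable names are assumptions.

```r
# Illustrative only: one candidate partition of each kind, using base R.
set.seed(3)
dat <- rbind(matrix(rnorm(200, -2), ncol = 2),
             matrix(rnorm(200,  2), ncol = 2))   # simulated data (assumption)
g <- 2

# k-means partition
part_kmeans <- kmeans(dat, centers = g)$cluster

# Random partition: shuffle labels so that group sizes are balanced
part_random <- sample(rep_len(1:g, nrow(dat)))

# Hierarchical clustering with complete linkage, cut into g groups
part_hclust <- cutree(hclust(dist(dat), method = "complete"), k = g)
```

Each result is an integer vector of length nrow(dat) with labels in 1:g, which is the form expected for the clust argument of initEmmix.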

References

McLachlan G.J. and Krishnan T. (2008). The EM Algorithm and Extensions (2nd ed.). New Jersey: Wiley.

McLachlan G.J. and Peel D. (2000). Finite Mixture Models. New York: Wiley.

Examples

# NOT RUN {
sigma<-array(0,c(2,2,3))
for(h in 2:3) sigma[,,h]<-diag(2)
sigma[,,1]<-cbind( c(1,0.2),c(0.2,1))
mu  <- cbind(c(4,-4),c(3.5,4),c( 0, 0))
delta <- cbind(c(3,3),c(1,5),c(-3,1))
dof    <- c(3,5,5)
pro   <- c(0.3,0.3,0.4)
n1=300;n2=300;n3=400;
nn<-c(n1,n2,n3)
n=1000
p=2
ng=3
distr="mvn"
ncov=3
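# First, generate a data set
set.seed(111)  # fix the random seed for reproducibility
dat <- rdemmix(nn, p, ng, distr, mu, sigma, dof, delta)
# Initial values from a supplied partition with group sizes nn
clust <- rep(1:ng, nn)
initobj1 <- initEmmix(dat, ng, clust, distr, ncov)
# Initial values chosen from 10 k-means partitions
initobj2 <- init.mix(dat, ng, distr, ncov, nkmeans=10, nrandom=0, nhclust=FALSE)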
# }
