mixture (version 2.2.0)

stpcm: Skew-t Parsimonious Clustering Models

Description

Carries out model-based clustering or classification using some or all of the 14 parsimonious Skew-t clustering models (STPCM).

Usage

stpcm(data=NULL, G=1:3, mnames=NULL,
		start=2, label=NULL, 
		veo=FALSE, da=c(1.0),
		nmax=1000, atol=1e-8, mtol=1e-8, mmax=10, burn=5,
		pprogress=FALSE, pwarning=TRUE, 
		stochastic = FALSE, latent_method="standard", seed=123)

Arguments

Value

An object of class stpcm is a list with components:

map

A vector of integers indicating the maximum a posteriori classifications for the best model.

model_objs

A list of all estimated models with parameters returned from the C++ call.

best_model

An object of class stpcm_best containing the number of groups for the best model, the covariance structure, and the Bayesian Information Criterion (BIC) value.

loglik

The log-likelihood values from fitting the best model.

z

A matrix giving the raw values upon which map is based.

BIC

A G by mnames by 3 dimensional array with values pertaining to BIC calculations. (legacy)

gpar

A list object for each cluster pertaining to parameters. (legacy)

startobject

The type of object supplied to start.

row_tags

If there were NAs in the original dataset, a vector of indices referencing the row of the imputed vectors is given.
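
As a rough illustration of how map relates to z, the sketch below takes the row-wise position of the largest value in a small posterior matrix. The matrix values and the names z_demo and map_demo are made up for this example; stpcm computes map internally, so this only shows the relationship between the two components.

```r
# Hypothetical posterior membership matrix: 4 observations, 2 groups.
# Each row sums to 1; the values are invented for illustration only.
z_demo <- matrix(c(0.90, 0.10,
                   0.20, 0.80,
                   0.55, 0.45,
                   0.05, 0.95),
                 ncol = 2, byrow = TRUE)

# Maximum a posteriori classification: for each row, the index of the
# column (group) with the largest posterior probability.
map_demo <- apply(z_demo, 1, which.max)
map_demo
```

With a fitted object, the same comparison could be made between the z and map components described above.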

Best Model

An object of class stpcm_best is a list with components:

model_type

A string containing summarized information about the type of model estimated (covariance structure and number of groups).

model_obj

An internal list containing all parameters returned from the C++ call.

BIC

Bayesian Information Criterion (positive scale, bigger is better).

loglik

Log-likelihood of the estimated model.

nparam

Number of parameters in the model.

startobject

The type of object supplied to start.

G

An integer representing the number of groups.

cov_type

A string representing the type of covariance matrix (see 14 models).

status

Convergence status of the EM algorithm according to Aitken's acceleration.

map

A vector of integers indicating the maximum a posteriori classifications for the best model.

row_tags

If there were NAs in the original dataset, a vector of indices referencing the row of the imputed vectors is given.
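
The status component refers to a stopping rule based on Aitken's acceleration. This page does not give the exact rule used by the package, so the following is only a minimal sketch of one common form: estimate the asymptotic log-likelihood from the last three values and stop when it is within a tolerance of the current value. The function name aitken_converged and the argument tol are hypothetical (tol plays a role similar to atol in Usage).

```r
# Sketch of an Aitken-accelerated stopping rule (not the package's code).
# loglik: log-likelihood values up to the current iteration.
# tol:    hypothetical tolerance, comparable in spirit to `atol`.
aitken_converged <- function(loglik, tol = 1e-8) {
  k <- length(loglik)
  if (k < 3) return(FALSE)                 # three consecutive values needed
  d_new <- loglik[k] - loglik[k - 1]       # most recent increase
  d_old <- loglik[k - 1] - loglik[k - 2]   # previous increase
  if (d_new == 0) return(TRUE)             # log-likelihood no longer changes
  a <- d_new / d_old                       # Aitken acceleration factor
  l_inf <- loglik[k - 1] + d_new / (1 - a) # asymptotic log-likelihood estimate
  abs(l_inf - loglik[k]) < tol
}

# Invented log-likelihood sequence converging geometrically to -100.
ll_demo <- -100 - 10 * 0.5^(1:20)
aitken_converged(ll_demo[1:5], tol = 1e-3)  # early in the run
aitken_converged(ll_demo, tol = 1e-3)       # after 20 iterations
```

Other variants compare the asymptotic estimate with the previous log-likelihood value rather than the current one; the choice here is an assumption made for illustration.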

Internal Objects

All classes contain an internal list called model_obj or model_objs with the following components:

zigs

A posteriori matrix.

G

An integer representing the number of groups.

sigs

A vector of covariance matrices for each group.

mus

A vector of location vectors for each group.

alphas

A vector containing skewness vectors for each group.

gammas

A vector containing estimated gamma parameters for each group.

Details

The data are either clustered or classified using Skew-t mixture models with some or all of the 14 parsimonious covariance structures described in Celeux & Govaert (1995). The algorithms given by Celeux & Govaert (1995) are used for 12 of the 14 models; the "EVE" and "VVE" models use the algorithms given in Browne & McNicholas (2014). Starting values are very important to the successful operation of these algorithms, so care must be taken in the interpretation of results.
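
The 14 covariance structures of Celeux & Govaert (1995) come from the eigen-decomposition Sigma = lambda * D A D', where lambda controls volume, the diagonal matrix A (with determinant 1) controls shape, and the orthogonal matrix D controls orientation. The three letters of a model name such as "VVV" or "EVE" indicate whether each of these is equal across groups (E), variable (V), or the identity (I). The sketch below decomposes a single made-up matrix and rebuilds it; the names decompose_cov and Sigma_demo are hypothetical and are not part of the package.

```r
# Decompose a symmetric positive-definite matrix into volume, shape and
# orientation: Sigma = lambda * D %*% A %*% t(D), with det(A) = 1.
decompose_cov <- function(Sigma) {
  p <- nrow(Sigma)
  e <- eigen(Sigma, symmetric = TRUE)
  lambda <- prod(e$values)^(1 / p)        # volume: det(Sigma)^(1/p)
  A <- diag(e$values / lambda, nrow = p)  # shape: diagonal, det(A) = 1
  D <- e$vectors                          # orientation: orthogonal matrix
  list(lambda = lambda, A = A, D = D)
}

# Invented 2 x 2 covariance matrix for illustration.
Sigma_demo <- matrix(c(4.0, 1.5,
                       1.5, 1.0),
                     ncol = 2, byrow = TRUE)

parts <- decompose_cov(Sigma_demo)

# Rebuild the matrix from its three components and compare.
Sigma_back <- parts$lambda * parts$D %*% parts$A %*% t(parts$D)
all.equal(Sigma_back, Sigma_demo)
```

Under an "EVE" model, for example, the groups would share lambda and D while each group has its own A.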

References

McNicholas, P.D. (2016). Mixture Model-Based Classification. Boca Raton: Chapman & Hall/CRC Press.

Browne, R.P. and McNicholas, P.D. (2014). Estimating common principal components in high dimensions. Advances in Data Analysis and Classification 8(2), 217-226.

Wei, Y., Tang, Y. and McNicholas, P.D. (2019). Mixtures of generalized hyperbolic distributions and mixtures of skew-t distributions for model-based clustering with incomplete data. Computational Statistics and Data Analysis 130, 18-41.

Celeux, G. and Govaert, G. (1995). Gaussian parsimonious clustering models. Pattern Recognition 28(5), 781-793.

Examples

data("sx3")

if (FALSE) {

### estimate "VVV" "EVE"
ax = stpcm(sx3, G=1:3, mnames=c("VVV","EVE"), start=0)
summary(ax)
ax


### estimate all 14 covariance structures 
ax = stpcm(sx3, G=1:3, mnames=NULL, start=0)
summary(ax)
ax

### model based classification
sx3.label = c(rep(1,1000),rep(2,1000))
plot(sx3, col=sx3.label)
axl = stpcm(sx3, G=2, mnames=c("VVV", "EVE"), label=sx3.label)
summary(axl)

}
