ppgmmga (version 1.3)

ppgmmga: Projection pursuit based on Gaussian mixtures and evolutionary algorithms for data visualisation

Description

A Projection Pursuit (PP) method for dimension reduction seeking "interesting" data structures in low-dimensional projections. A negentropy index is computed from the density estimated using Gaussian Mixture Models (GMMs). Then, the PP index is maximised by Genetic Algorithms (GAs) to find the optimal projection basis.

Usage

ppgmmga(data, 
        d, 
        approx = c("UT", "VAR", "SOTE", "none"), 
        center = TRUE, 
        scale = TRUE, 
        GMM = NULL, 
        gatype = c("ga", "gaisl"), 
        options = ppgmmga.options(),
        seed = NULL, 
        verbose = interactive(), ...)

Value

Returns an object of class 'ppgmmga'. See ppgmmga-class for a description of the object.

Arguments

data

An \(n \times p\) matrix containing the data, with rows corresponding to observations and columns corresponding to variables.

d

An integer specifying the dimension of the subspace onto which the data are projected and visualised.
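
To make the roles of `data` and `d` concrete, the following minimal sketch projects an \(n \times p\) matrix onto a `d`-dimensional subspace using base R only. The random orthonormal basis `B` is an assumption made for illustration; it stands in for the basis that the genetic algorithm would search for and is not produced by ppgmmga.

```r
# Sketch only: a random orthonormal basis stands in for the one found by the GA.
set.seed(123)
X <- scale(as.matrix(iris[, -5]))   # n x p data, centred and scaled
p <- ncol(X)
d <- 2                              # dimension of the projection subspace

# The QR decomposition of a random p x d matrix yields d orthonormal columns.
B <- qr.Q(qr(matrix(rnorm(p * d), nrow = p, ncol = d)))

Z <- X %*% B                        # n x d projected data
dim(Z)                              # n rows, d columns
round(crossprod(B), 10)             # identity matrix, since B is orthonormal
```

The columns of `Z` are the coordinates that would be plotted for a `d`-dimensional projection.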

approx

A string specifying the type of computation to perform to obtain the negentropy for GMMs. Possible values are:

"UT" = Unscented Transformation approximation (default);
"VAR" = VARiational approximation;
"SOTE" = Second Order Taylor Expansion approximation;
"none" = exact calculation (no approximation, experimental).

center

A logical value indicating whether or not the data are centred. The default is TRUE.

scale

A logical value indicating whether or not the data are scaled. The default is TRUE.
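
As an illustration of what centring and scaling usually mean, the sketch below standardises the iris measurements with the base R function `scale`. That ppgmmga applies exactly this transformation internally is an assumption, not something stated on this page.

```r
# Sketch only: column-wise standardisation with base R.
X  <- as.matrix(iris[, -5])
Xs <- scale(X, center = TRUE, scale = TRUE)

round(colMeans(Xs), 10)   # every column mean is zero after centring
apply(Xs, 2, sd)          # every column standard deviation is one after scaling
```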

GMM

An object of class 'densityMclust' specifying a Gaussian mixture density estimate as returned by densityMclust.

gatype

A string specifying the type of genetic algorithm to be used to maximise the negentropy. Possible values are:

"ga" = simple genetic algorithm (ga);
"gaisl" = island genetic algorithm (gaisl).

options

A list of options containing the main arguments passed to the densityMclust function of the mclust package and to the ga function of the GA package. See ppgmmga.options for the available options. Note that setting the options argument does not change the global options provided by ppgmmga.options; it only changes the options for a single call to ppgmmga.

seed

An integer value used to set the random number generator state. It may be used to replicate the results of the ppgmmga algorithm.

verbose

A logical value controlling whether the evolution of the GA search is shown. The default is interactive(), that is, TRUE in an interactive session. When shown, the output reports the iteration number and the average and best fitness values.

...

Further arguments passed to or from other methods.

Author

Serafini A. srf.alessio@gmail.com
Scrucca L. luca.scrucca@unipg.it

Details

Projection pursuit (PP) is a feature extraction method for analysing high-dimensional data through low-dimensional projections, obtained by maximising a projection index to find the best orthogonal projections. A general PP procedure can be summarised in a few steps: the data may be transformed, the PP index is chosen, and the subspace dimension is fixed. Then, the PP index is optimised.
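
The general steps above can be sketched in a few lines of base R. This is a toy illustration under stated assumptions, not the method implemented by ppgmmga: the index `pp_index` is a hypothetical helper based on the classical moment-based negentropy approximation (squared skewness over 12 plus squared excess kurtosis over 48), and plain random search replaces the genetic algorithm.

```r
set.seed(1)

# Step 1: transform the data (centre and scale).
X <- scale(as.matrix(iris[, -5]))
p <- ncol(X)

# Step 2: choose a PP index. Moment-based negentropy approximation,
# used here as a stand-in for the GMM-based index of ppgmmga.
pp_index <- function(z) {
  z <- (z - mean(z)) / sd(z)
  skew   <- mean(z^3)
  exkurt <- mean(z^4) - 3
  skew^2 / 12 + exkurt^2 / 48
}

# Step 3: fix the subspace dimension (d = 1 in this sketch).
# Step 4: optimise the index. Random search stands in for the GA.
best <- list(value = -Inf, basis = NULL)
for (i in 1:2000) {
  b <- rnorm(p)
  b <- b / sqrt(sum(b^2))            # unit-length projection direction
  v <- pp_index(drop(X %*% b))
  if (v > best$value) best <- list(value = v, basis = b)
}

best$value                           # largest index value found
z_best <- drop(X %*% best$basis)     # data projected onto the best direction
```

A histogram of `z_best` gives a rough impression of the structure that a one-dimensional projection can reveal.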

For cluster visualisation the negentropy index is considered. Since this index requires an estimate of the underlying data density, Gaussian mixture models (GMMs) are used to approximate that density. Genetic Algorithms are then employed to maximise the negentropy with respect to the basis of the projection subspace.
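
To give some intuition for the negentropy index, the sketch below evaluates it for a univariate Gaussian mixture by numerical integration in base R. Negentropy is taken here as the entropy of a Gaussian with the same mean and variance minus the entropy of the mixture. The helpers `dmix` and `negentropy_mix` are hypothetical names introduced for this illustration; the package itself uses the approximations listed under `approx` rather than this brute-force integral.

```r
# Density of a univariate Gaussian mixture with weights w, means mu and
# standard deviations s.
dmix <- function(x, w, mu, s) {
  dens <- numeric(length(x))
  for (k in seq_along(w)) dens <- dens + w[k] * dnorm(x, mu[k], s[k])
  dens
}

# Negentropy = entropy of the moment-matched Gaussian - entropy of the mixture.
negentropy_mix <- function(w, mu, s) {
  m <- sum(w * mu)                        # mixture mean
  v <- sum(w * (s^2 + mu^2)) - m^2        # mixture variance
  integrand <- function(x) {
    f <- dmix(x, w, mu, s)
    ifelse(f > 0, -f * log(f), 0)         # guard against log(0)
  }
  lo <- min(mu - 10 * s)
  hi <- max(mu + 10 * s)
  h_mix   <- integrate(integrand, lo, hi, subdivisions = 1000L)$value
  h_gauss <- 0.5 * log(2 * pi * exp(1) * v)
  h_gauss - h_mix
}

J1 <- negentropy_mix(w = 1, mu = 0, s = 1)                        # single Gaussian: about 0
J2 <- negentropy_mix(w = c(0.5, 0.5), mu = c(-2, 2), s = c(1, 1)) # two separated clusters: positive
c(J1, J2)
```

A projection that separates clusters yields a clearly non-Gaussian density and therefore a larger negentropy, which is why maximising this index tends to favour views with visible group structure.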

References

Scrucca, L. and Serafini, A. (2019) Projection pursuit based on Gaussian mixtures and evolutionary algorithms. Journal of Computational and Graphical Statistics, 28:4, 847–860. DOI: 10.1080/10618600.2019.1598871

See Also

summary.ppgmmga, plot.ppgmmga, ppgmmga-class

Examples

if (FALSE) {
data(iris)
X <- iris[,-5]
Class <- iris$Species

# 1-dimensional PPGMMGA

PP1D <- ppgmmga(data = X, d = 1)
summary(PP1D)
plot(PP1D, bins = 11)
plot(PP1D, bins = 11, Class)

# 2-dimensional PPGMMGA

PP2D <- ppgmmga(data = X, d = 2)
summary(PP2D)
plot(PP2D)
plot(PP2D, Class)

## Unscented Transformation approximation

PP2D_1 <- ppgmmga(data = X, d = 2, approx = "UT")
summary(PP2D_1)
plot(PP2D_1, Class)

## VARiational approximation

PP2D_2 <- ppgmmga(data = X, d = 2, approx = "VAR")
summary(PP2D_2)
plot(PP2D_2, Class)

## Second Order Taylor Expansion approximation

PP2D_3 <- ppgmmga(data = X, d = 2, approx = "SOTE")
summary(PP2D_3)
plot(PP2D_3, Class)

# 3-dimensional PPGMMGA

PP3D <- ppgmmga(data = X, d = 3)
summary(PP3D)
plot(PP3D, Class)

# A rotating 3D plot can be obtained using:
# if(!require("msir")) install.packages("msir")
# msir::spinplot(PP3D$Z, markby = Class,
#                col.points = ppgmmga.options("classPlotColors")[1:3])
}