predict.hdc: Prediction method for ‘hdc’ class objects.

Description

This function predicts the classes of the observations in a dataset from a model fitted with hdda (model-based supervised classification) or hddc (model-based clustering, i.e. unsupervised classification).
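The prediction step can be summarized by the usual Gaussian mixture maximum a posteriori (MAP) rule on which hdda and hddc are built (see References). The sketch below is a summary under that assumption, not the package's exact implementation. The symbols $\pi_k$ (class proportions), $\mu_k$, $\Sigma_k$ (class means and covariances), $\phi$ (Gaussian density), $K$ and $n$ are introduced here for illustration. The package's high-dimensional parameterization of $\Sigma_k$ is not shown.

```latex
t_{ik} = \frac{\pi_k \, \phi(x_i;\mu_k,\Sigma_k)}{\sum_{l=1}^{K} \pi_l \, \phi(x_i;\mu_l,\Sigma_l)},
\qquad
\hat{c}_i = \arg\max_{k \in \{1,\dots,K\}} t_{ik},
\qquad
\ell = \sum_{i=1}^{n} \log \sum_{k=1}^{K} \pi_k \, \phi(x_i;\mu_k,\Sigma_k)
```

Under this assumption, $t_{ik}$ would correspond to the entry of ‘prob’ for observation $i$ and class $k$, $\hat{c}_i$ to ‘class’, and $\ell$ to ‘loglik’.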

Usage

# S3 method for hdc
predict(object, data, cls = NULL, ...)

Value

class: The vector of predicted classes.
prob: The matrix of class-membership probabilities, with one row per observation and one column per class.
loglik: The log-likelihood of the classification on the new data.
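The following toy example illustrates how ‘prob’ and ‘class’ relate. It is a minimal sketch that assumes each observation is assigned to its most probable class (the MAP rule). The matrix is made-up illustrative data, not output of the package.

```r
# Toy posterior-probability matrix standing in for the 'prob' element:
# 3 observations (rows) by 3 classes (columns); each row sums to 1.
prob <- matrix(c(0.70, 0.20, 0.10,
                 0.10, 0.30, 0.60,
                 0.25, 0.50, 0.25),
               nrow = 3, byrow = TRUE)

# Assumed MAP rule: each observation goes to the column (class)
# with the largest probability in its row.
map_class <- max.col(prob, ties.method = "first")
print(map_class)  # 1 3 2
```

`ties.method = "first"` makes the result deterministic. The default, `"random"`, breaks ties at random.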

If the true class vector is given to the argument ‘cls’, the adjusted Rand index (ARI) is also computed, and the following element is additionally returned:

ARI: The confusion matrix of the classification.
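The adjusted Rand index compares two partitions through their confusion (contingency) matrix. It equals 1 for identical partitions, even when the labels are permuted, which matters for hddc because cluster numbers are arbitrary. Its expected value is 0 for random labelings. The function below is a hypothetical, self-contained sketch of the standard Hubert and Arabie formula. It is not HDclassif's internal implementation.

```r
# Hypothetical helper (not part of HDclassif): adjusted Rand index
# computed from the contingency table of two labelings.
ari_sketch <- function(truth, pred) {
  tab <- unclass(table(truth, pred))           # confusion (contingency) matrix
  n <- sum(tab)
  pairs_cells <- sum(choose(tab, 2))           # pairs grouped together by both labelings
  pairs_truth <- sum(choose(rowSums(tab), 2))  # pairs grouped together in 'truth'
  pairs_pred  <- sum(choose(colSums(tab), 2))  # pairs grouped together in 'pred'
  expected  <- pairs_truth * pairs_pred / choose(n, 2)
  max_index <- (pairs_truth + pairs_pred) / 2
  (pairs_cells - expected) / (max_index - expected)
}

truth <- c(1, 1, 2, 2)
table(truth, c(2, 2, 1, 1))       # confusion matrix of a relabelled partition
ari_sketch(truth, c(2, 2, 1, 1))  # same partition, labels swapped: 1
ari_sketch(truth, c(1, 2, 1, 2))  # worse than chance: -0.5
```

If both labelings put all observations into a single group, the denominator is zero and the sketch returns `NaN`.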

Arguments

object: An ‘hdc’ class object obtained from the hdda or hddc function.
data: A matrix or a data frame of observations, with the observations in rows and the variables in columns. The data must be in exactly the same format as the data used to train the model. Note that NAs are not allowed.
cls: A vector of the true classes of the observations. It is optional; if given, it is compared with the predicted classes. Default is NULL.
...: Not currently used.
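The requirements on ‘data’ (same format as the training data, no NAs) can be checked before calling predict. The helper below is hypothetical and not part of the package. It assumes the caller supplies the number of training variables explicitly, for instance `ncol(X)` for the model fitted in Example 1.

```r
# Hypothetical helper (not part of HDclassif): validates new observations
# before they are passed to predict() on an 'hdc' object.
# 'n_vars' is the number of variables the model was trained on; passing it
# explicitly is an assumption of this sketch.
check_newdata <- function(newdata, n_vars) {
  x <- as.matrix(newdata)
  if (anyNA(x)) {
    stop("NAs are not allowed in 'data'")
  }
  if (ncol(x) != n_vars) {
    stop(sprintf("'data' has %d columns but the model expects %d",
                 ncol(x), n_vars))
  }
  invisible(x)
}

# A complete 3 x 2 data frame passes the check and is returned as a matrix.
ok <- check_newdata(data.frame(a = c(1, 2, 3), b = c(4, 5, 6)), n_vars = 2)
```

In Example 1, this would be used as `check_newdata(Y, ncol(X))` before `predict(prms1, Y)`.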

Author

Laurent Berge, Charles Bouveyron and Stephane Girard

References

Bouveyron, C., Girard, S. and Schmid, C. (2007) “High Dimensional Discriminant Analysis”, Communications in Statistics: Theory and Methods, vol. 36 (14), pp. 2607-2623

Bouveyron, C., Girard, S. and Schmid, C. (2007) “High-Dimensional Data Clustering”, Computational Statistics and Data Analysis, vol. 52 (1), pp. 502-519

Berge, L., Bouveyron, C. and Girard, S. (2012) “HDclassif: An R Package for Model-Based Clustering and Discriminant Analysis of High-Dimensional Data”, Journal of Statistical Software, vol. 46 (6), pp. 1-29, url: http://www.jstatsoft.org/v46/i06/

Examples

library(HDclassif)

# Example 1:
sim <- simuldata(1000, 1000, 50)
X <- sim$X
clx <- sim$clx
Y <- sim$Y
cly <- sim$cly

# Clustering of the Gaussian dataset:
prms1 <- hddc(X, K = 3, algo = "CEM", init = "param")

# Class vector obtained by the clustering:
prms1$class

# Only to display the correct classification rate and
# the adjusted Rand index:
res1 <- predict(prms1, X, clx)
res2 <- predict(prms1, Y)

# Classes predicted on the test dataset using the hddc parameters:
res2$class


# Example 2:
data(Crabs)
# Clustering of the Crabs dataset (column 1 holds the true classes):
prms3 <- hddc(Crabs[, -1], K = 4, algo = "EM", init = "kmeans")
res3 <- predict(prms3, Crabs[, -1], Crabs[, 1])
