classDist: Compute and predict the distances to class centroids

Description

This function computes the class centroids and covariance matrices for a training set. These are used to determine the Mahalanobis distance of each sample to each class centroid.

Usage

classDist(x, ...)
## S3 method for class 'default':
classDist(x, y, groups = 5, pca = FALSE, keep = NULL, ...)
## S3 method for class 'classDist':
predict(object, newdata, trans = log, ...)

Arguments

Value

For classDist, an object of class classDist with elements:
values: a list with elements for each class. Each element contains a mean vector for the class centroid and the inverse of the class covariance matrix
classes: a character vector of class labels
pca: the results of prcomp when pca = TRUE
call: the function call
p: the number of variables
n: a vector of sample sizes per class
For predict.classDist, a matrix with one column per class. The column names are the class names with the prefix dist. (including the trailing period). For a numeric y, the class labels are the percentiles. For example, if groups = 9, the variable names would be dist.11.11, dist.22.22, etc.
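
The naming pattern for numeric outcomes can be illustrated with a few lines of base R. This is only a sketch of the arithmetic behind the example above, not the package's own code; the variable names groups and labels are chosen here for illustration.

```r
# Sketch: percentile-based column names for a numeric outcome with 9 bins.
groups <- 9

# Upper percentile of each bin, rounded to two decimals (11.11, 22.22, ...).
percentiles <- round(seq_len(groups) / groups * 100, 2)

# Prefix each label with "dist." as described for predict.classDist.
labels <- paste0("dist.", percentiles)
print(labels[1:2])
```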

Details

For factor outcomes, the data are split into groups for each class and the mean and covariance matrix are calculated for each group. These are then used to compute Mahalanobis distances to the class centers (using predict.classDist). The function checks that each covariance matrix is non-singular.
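
The per-class calculation described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the names class_stats, dists and log_dists are hypothetical, the distances are the squared Mahalanobis distances returned by stats::mahalanobis, and the final log step simply mirrors the default trans = log shown in the Usage section.

```r
# Base-R sketch: per-class centroid and inverse covariance, then distances.
x <- iris[, 1:4]
y <- iris$Species

# One list element per class: the mean vector and the inverse covariance matrix.
# solve() raises an error if a class covariance matrix is singular.
class_stats <- lapply(split(x, y), function(u) {
  list(means = colMeans(u), A = solve(cov(u)))
})

# Squared Mahalanobis distance of every sample to each class centroid.
dists <- sapply(class_stats, function(s) {
  mahalanobis(as.matrix(x), center = s$means, cov = s$A, inverted = TRUE)
})
colnames(dists) <- paste0("dist.", names(class_stats))

# Log transform, mirroring the default trans = log (an assumption of this sketch).
log_dists <- log(dists)
print(head(log_dists))
```

Storing the inverse covariance matrix once per class, as the values element does, avoids re-inverting it every time new samples are scored.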

For numeric outcomes, the data are split into roughly equal-sized bins, with the number of bins given by groups. Percentiles of the outcome are used as the cut points.
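
Percentile-based binning of this kind can be sketched with quantile and cut. The helper bin_by_percentile below is hypothetical and written only for illustration; the package may choose its cut points differently.

```r
# Hypothetical helper: bin a numeric outcome into roughly equal-sized groups.
bin_by_percentile <- function(y, groups = 5) {
  # Cut points at evenly spaced percentiles of y; unique() guards against ties.
  probs <- seq(0, 1, length.out = groups + 1)
  breaks <- unique(quantile(y, probs = probs, names = FALSE))
  # include.lowest = TRUE keeps the minimum value in the first bin.
  cut(y, breaks = breaks, include.lowest = TRUE)
}

set.seed(1)
y_num <- rnorm(100)
bins <- bin_by_percentile(y_num, groups = 5)
print(table(bins))
```

Each resulting bin is then treated like a class, so a centroid and covariance matrix are computed per bin.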

References

Forina et al. CAIMAN brothers: A family of powerful classification and class modeling techniques. Chemometrics and Intelligent Laboratory Systems (2009) vol. 96 (2) pp. 239-245

Examples

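library(caret)    # provides classDist()
library(lattice)  # provides splom()

# Randomly pick 100 of the 150 iris rows for training
trainSet <- sample(1:150, 100)

# Class centroids and inverse covariance matrices from the training rows
distData <- classDist(iris[trainSet, 1:4], 
                      iris$Species[trainSet])

# Distances of the held-out rows to each class centroid (log scale by default)
newDist <- predict(distData,
                   iris[-trainSet, 1:4])

# Scatterplot matrix of the distances, grouped by the true species
splom(newDist, groups = iris$Species[-trainSet])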