This function computes the class centroids and covariance matrices for a training set. These are used to determine the Mahalanobis distance of samples to each class centroid.
classDist(x, ...)
# S3 method for default
classDist(x, y, groups = 5, pca = FALSE, keep = NULL, ...)
# S3 method for classDist
predict(object, newdata, trans = log, ...)
For classDist, an object of class classDist with elements:
values: a list with elements for each class. Each element contains a mean vector for the class centroid and the inverse of the class covariance matrix
classes: a character vector of class labels
pca: the results of prcomp when pca = TRUE
call: the function call
p: the number of variables
n: a vector of sample sizes per class
For predict.classDist, a matrix with columns for each class. The column names are the names of the classes with the prefix dist. In the case of numeric y, the class labels are the percentiles. For example, if groups = 9, the variable names would be dist.11.11, dist.22.22, etc.
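The percentile-based names in this example can be reproduced with a short base R sketch. It assumes the labels are the upper percentile of each bin, rounded to two decimals; this convention is inferred from the example above rather than taken from the package source, and the variable names are illustrative.

```r
# Illustrative sketch only: the rounding convention is an assumption.
groups <- 9

# Upper percentile of each bin: 11.11, 22.22, ..., 100
pct <- round(100 * seq_len(groups) / groups, 2)

# Prefix each label the way the predicted columns are described
col_names <- paste0("dist.", pct)
col_names[1:2]
```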
x: a matrix or data frame of predictor variables
...: optional arguments to pass (not currently used)
y: a numeric or factor vector of class labels
groups: an integer for the number of bins for splitting a numeric outcome
pca: a logical: should principal components analysis be applied to the dataset prior to splitting the data by class?
keep: an integer for the number of PCA components that should be used to predict new samples (NULL uses all within a tolerance of sqrt(.Machine$double.eps))
object: an object of class classDist
newdata: a matrix or data frame. If vars was previously specified, these columns should be in newdata
trans: an optional function that can be applied to each class distance. trans = NULL will not apply a function
Max Kuhn
For factor outcomes, the data are split into groups for each class and the mean and covariance matrix are calculated for each group. These are then used to compute Mahalanobis distances to the class centers (using predict.classDist). The function will check for non-singular matrices.
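The per-class computation described here can be sketched with base R alone. This is a minimal illustration, not the package's internal code: it uses stats::mahalanobis (which returns squared distances) on the full iris data, and the names stats_by_class, d2 and log_dist are hypothetical.

```r
# Base R sketch of the idea; not caret's implementation.
X <- iris[, 1:4]
y <- iris$Species

# Centroid (mean vector) and covariance matrix for each class
stats_by_class <- lapply(split(X, y), function(d) {
  list(center = colMeans(d), cov = cov(d))
})

# Squared Mahalanobis distance of every sample to each class centroid
d2 <- sapply(stats_by_class, function(s) {
  mahalanobis(X, center = s$center, cov = s$cov)
})

# Apply a log transform, mirroring the default trans = log
log_dist <- log(d2)
colnames(log_dist) <- paste0("dist.", colnames(log_dist))
head(log_dist)
```

Each row holds one sample and each column one class, so the smallest value in a row points to the nearest class centroid.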
For numeric outcomes, the data are split into roughly equal-sized bins based on groups. Percentiles are used to split the data.
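A percentile-based split of this kind can be sketched with quantile and cut from base R. The exact binning rules used by the package may differ; the data (mtcars$mpg) and the variable names are chosen only for illustration.

```r
# Illustrative sketch; the package's exact binning may differ.
y <- mtcars$mpg
groups <- 5

# Percentile cut points at 0%, 20%, ..., 100%
breaks <- quantile(y, probs = seq(0, 1, length.out = groups + 1))

# Assign each sample to a bin; include.lowest keeps the minimum value,
# and unique() guards against duplicated cut points
bins <- cut(y, breaks = unique(breaks), include.lowest = TRUE)

table(bins)
```

The resulting factor can then play the role of the class labels, so the same per-class centroid and covariance calculation applies.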
Forina et al. CAIMAN brothers: A family of powerful classification and class modeling techniques. Chemometrics and Intelligent Laboratory Systems (2009) vol. 96 (2) pp. 239-245
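library(caret)
library(lattice)  # provides splom()
set.seed(1)       # make the random split reproducible
# Hold out 50 of the 150 iris samples for prediction
trainSet <- sample(1:150, 100)
distData <- classDist(iris[trainSet, 1:4],
                      iris$Species[trainSet])
newDist <- predict(distData,
                   iris[-trainSet, 1:4])
# Scatterplot matrix of the class distances, colored by true species
splom(newDist, groups = iris$Species[-trainSet])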