classDist
From caret v4.39
by Max Kuhn
Compute and predict the distances to class centroids
This function computes the class centroids and covariance matrix for a training set for determining Mahalanobis distances of samples to each class centroid.
- Keywords
- manip
Usage
classDist(x, ...)
## S3 method for class 'default':
classDist(x, y, groups = 5, pca = FALSE, keep = NULL, ...)
## S3 method for class 'classDist':
predict(object, newdata, trans = log, ...)
Arguments
- x
- a matrix or data frame of predictor variables
- y
- a numeric or factor vector of class labels
- groups
- an integer for the number of bins for splitting a numeric outcome
- pca
- a logical: should principal components analysis be applied to the dataset prior to splitting the data by class?
- keep
- an integer for the number of PCA components that should be used to predict new samples (NULL uses all within a tolerance of sqrt(.Machine$double.eps))
- object
- an object of class classDist
- newdata
- a matrix or data frame. If vars was previously specified, these columns should be in newdata
- trans
- an optional function that can be applied to each class distance. trans = NULL will not apply a function
- ...
- optional arguments to pass (not currently used)
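The pca and keep arguments can be pictured with base R's prcomp. The following is only a sketch under stated assumptions, not caret's internal code: centering and scaling the predictors, passing the tolerance through prcomp's tol argument, and the value keep = 2 are all illustrative choices.

```r
# Sketch only: illustrates PCA before computing class distances.
x <- iris[, 1:4]
tol <- sqrt(.Machine$double.eps)

# prcomp() omits components whose standard deviation is <= tol times
# the standard deviation of the first component.
# Centering and scaling here are assumptions, not documented behavior.
pc <- prcomp(x, center = TRUE, scale. = TRUE, tol = tol)

keep <- 2  # hypothetical choice, analogous to the keep argument
scores <- pc$x[, seq_len(keep), drop = FALSE]

# New samples are projected with the same centering, scaling and rotation.
newScores <- predict(pc, newdata = x[1:5, ])[, seq_len(keep), drop = FALSE]
```

Class centroids and covariance matrices would then be computed on scores rather than on the original predictors.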
Details
For factor outcomes, the data are split into groups for each class
and the mean and covariance matrix are calculated for each group.
These are then used to compute Mahalanobis distances to the class
centers (using predict.classDist). The function will check for
non-singular matrices.
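The per-class calculation can be sketched in base R. This is a minimal illustration, not the package's implementation: it uses stats::mahalanobis, which returns squared distances, and the page does not say whether caret reports squared distances. The dist. prefix on the column names follows the Value section.

```r
# Sketch only: class centroids, covariances and Mahalanobis distances.
x <- as.matrix(iris[, 1:4])
y <- iris$Species

# One centroid (mean vector) and covariance matrix per class
stats <- lapply(split(as.data.frame(x), y), function(d) {
  list(center = colMeans(d), cov = cov(d))
})

# Squared Mahalanobis distance of every sample to every class centroid
d2 <- sapply(stats, function(s) mahalanobis(x, s$center, s$cov))
colnames(d2) <- paste0("dist.", names(stats))

# Optional transformation, analogous to trans = log
logD2 <- log(d2)

# Label each sample with the class whose centroid is nearest
nearest <- levels(y)[max.col(-d2)]
mean(nearest == y)  # proportion of samples closest to their own class
```

A singular covariance matrix cannot be inverted, so mahalanobis would fail for that class; this is why a non-singularity check matters before distances are computed.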
For numeric outcomes, the data are split into roughly equal sized
bins based on groups. Percentiles are used to split the data.
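Percentile-based binning of a numeric outcome might look like the sketch below. The simulated outcome, the use of quantile and cut, and the handling of duplicate cut points with unique are assumptions for illustration; the function's own binning code may differ.

```r
# Sketch only: split a numeric outcome into roughly equal sized bins.
set.seed(1)
y <- rnorm(200)  # hypothetical numeric outcome
groups <- 5

# groups + 1 percentile boundaries running from 0% to 100%
probs <- seq(0, 1, length.out = groups + 1)
breaks <- unique(quantile(y, probs = probs))

# Assign every sample to a bin; include.lowest keeps the minimum value
bins <- cut(y, breaks = breaks, include.lowest = TRUE)
table(bins)
```

Each bin can then be treated like a class: a centroid and covariance matrix are computed from the predictors of the samples that fall into it.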
Value
- For classDist, an object of class classDist with elements:
  values: a list with elements for each class. Each element contains a mean vector for the class centroid and the inverse of the class covariance matrix
  classes: a character vector of class labels
  pca: the results of prcomp when pca = TRUE
  call: the function call
  p: the number of variables
  n: a vector of sample sizes per class
- For predict.classDist, a matrix with columns for each class. The column names are the class names with the prefix dist. In the case of numeric y, the class labels are the percentiles. For example, if groups = 9, the variable names would be dist.11.11, dist.22.22, etc.
See Also
Examples
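library(caret)
library(lattice)   # provides splom()
# train on 100 randomly chosen iris samples, hold out the other 50
trainSet <- sample(1:150, 100)
distData <- classDist(iris[trainSet, 1:4],
                      iris$Species[trainSet])
# distances of the held-out samples to each class centroid
newDist <- predict(distData,
                   iris[-trainSet, 1:4])
splom(newDist, groups = iris$Species[-trainSet])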