Friedman's classification benchmark data
Generates 3-class classification benchmark data as introduced by J.H. Friedman (1989).
friedman.data(setting = 1, p = 6, samplesize = 40, asmatrix = FALSE)
- setting: the problem setting (integer 1, 2, ..., 6).
- p: number of variables (6, 10, 20 or 40).
- samplesize: sample size (number of observations, >= 6).
- asmatrix: if TRUE, results are returned as a matrix, otherwise as a data frame (default).
When J.H. Friedman introduced the Regularized Discriminant Analysis
(rda) in 1989, he used artificially generated data
to test the procedure and to examine its performance in comparison to
Linear and Quadratic Discriminant Analysis.
Six different settings were considered to demonstrate potential strengths
and weaknesses of the new method:
1. equal spherical covariance matrices,
2. unequal spherical covariance matrices,
3. equal, highly ellipsoidal covariance matrices with mean differences in low-variance subspace,
4. equal, highly ellipsoidal covariance matrices with mean differences in high-variance subspace,
5. unequal, highly ellipsoidal covariance matrices with zero mean differences and
6. unequal, highly ellipsoidal covariance matrices with nonzero mean differences.
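To make the first setting (equal spherical covariance matrices) concrete, the following base R sketch draws three Gaussian classes that share an identity covariance matrix and differ only in their means. It is an illustration, not the code behind friedman.data: the helper name sim_equal_spherical, the mean shift of 3 units along the first two axes, and the uniform sampling of class labels are all assumptions made for this example.

```r
# Hypothetical helper (not part of any package): three Gaussian classes
# with a common identity covariance matrix and different mean vectors.
sim_equal_spherical <- function(n = 40, p = 6, shift = 3) {
  # Class labels drawn uniformly from 1..3 (an assumption of this sketch).
  cls <- sample(1:3, n, replace = TRUE)

  # One mean vector per class: class 1 at the origin, class 2 shifted
  # along the first axis, class 3 shifted along the second axis.
  means <- matrix(0, nrow = 3, ncol = p)
  means[2, 1] <- shift
  means[3, 2] <- shift

  # Standard normal noise plus the mean of each observation's class.
  x <- matrix(rnorm(n * p), nrow = n, ncol = p) + means[cls, , drop = FALSE]
  colnames(x) <- paste0("x", seq_len(p))

  # Same layout as described for the return value: labels first, then variables.
  data.frame(class = factor(cls, levels = 1:3), x)
}

set.seed(1)
d <- sim_equal_spherical()
dim(d)
```

Because all classes share one covariance matrix here, a linear rule is the natural competitor, which is why a setting of this kind is useful for checking that a regularized method does not lose much when the simpler model is already adequate.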
- Depending on asmatrix, either a data frame or a matrix with p+1 columns, the first column containing the class labels, the remaining columns being the variables.
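The two return types can be compared directly. The sketch below assumes that friedman.data is the function shipped in the klaR package and that the package is installed; the guard skips the check otherwise. The expected dimensions follow from the description above (samplesize rows, p+1 columns).

```r
# Assumption: friedman.data is provided by the klaR package.
if (requireNamespace("klaR", quietly = TRUE)) {
  df <- klaR::friedman.data(setting = 1, p = 6, samplesize = 40)
  m  <- klaR::friedman.data(setting = 1, p = 6, samplesize = 40,
                            asmatrix = TRUE)

  # Default is a data frame; asmatrix = TRUE yields a matrix.
  stopifnot(is.data.frame(df), is.matrix(m))

  # 40 observations, 1 label column plus 6 variables.
  stopifnot(nrow(df) == 40, ncol(df) == 7)
  stopifnot(nrow(m) == 40, ncol(m) == 7)

  # Class frequencies, taken from the first column.
  print(table(df[, 1]))
}
```

The data frame form is convenient for formula interfaces such as class ~ ., while the matrix form suits code that expects purely numeric input.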
- Regularized Discriminant Analysis
Friedman, J.H. (1989): Regularized Discriminant Analysis. In: Journal of the American Statistical Association 84, 165-175.
# Reproduce the 1st setting with 6 variables.
# Error rate should be somewhat near 9 percent.
library(klaR)  # assumed source of friedman.data, rda and errormatrix
training <- friedman.data(1, 6, 40)
x <- rda(class ~ ., data = training, gamma = 0.74, lambda = 0.77)
test <- friedman.data(1, 6, 100)
y <- predict(x, test[,-1])
errormatrix(test[,1], y$class)
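The error rate quoted in the example is a misclassification rate: the share of test observations whose predicted class differs from the true one. As a minimal base R sketch of that calculation, the following uses small made-up label vectors (purely illustrative, not output of friedman.data or rda) and derives the rate from a confusion table.

```r
# Toy labels, invented for illustration only.
truth <- factor(c(1, 1, 2, 2, 3, 3, 3, 1), levels = 1:3)
pred  <- factor(c(1, 2, 2, 2, 3, 1, 3, 1), levels = 1:3)

# Confusion table: rows are true classes, columns are predictions.
conf <- table(true = truth, predicted = pred)

# Correct predictions sit on the diagonal; everything else is an error.
error_rate <- 1 - sum(diag(conf)) / sum(conf)
error_rate  # 0.25 (2 of 8 toy labels are misclassified)
```

The same quantity can be obtained as mean(pred != truth); the table form additionally shows which classes are confused with each other.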