
mt (version 2.0-1.20)

maccest: Estimation of Multiple Classification Accuracy

Description

Estimates classification accuracy for multiple classifiers using a resampling procedure, and compares the classifiers.

Usage

maccest(dat, ...)
# S3 method for default
maccest(dat, cl, method="svm", pars = valipars(), 
        tr.idx = NULL, comp="anova",...) 
# S3 method for formula
maccest(formula, data = NULL, ..., subset, na.action = na.omit)

Value

An object of class maccest, including the components:

method

Classification method used.

acc

Accuracy rate.

acc.iter

Accuracy rate of each iteration.

acc.std

Standard deviation of the accuracy rate.

mar

Prediction margin.

mar.iter

Prediction margin of each iteration.

auc

The area under the receiver operating characteristic curve (AUC).

auc.iter

AUC of each iteration.

comp

Multiple comparison method used.

h.test

Hypothesis test results of multiple comparison.

gl.pval

Global or overall p-value.

mc.pval

Pairwise comparison p-values.

sampling

Sampling scheme used.

niter

Number of iterations.

nreps

Number of replications in each iteration.

conf.mat

Overall confusion matrix.

acc.boot

A list of bootstrap errors, such as .632 and .632+, if the validation method is bootstrap.
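To make the acc.boot component concrete, the sketch below shows how the .632 bootstrap error estimate combines two error rates. This is an illustration of the general .632 formula, not mt's internal code; the error values are made up for the example.

```r
## Illustrative sketch of the .632 bootstrap error (not mt's internals).
## The apparent (resubstitution) error is optimistic; the out-of-bag
## error is pessimistic.  The .632 estimate weights them by the expected
## fraction of distinct observations in a bootstrap sample (~0.632).
err.apparent <- 0.05   # hypothetical error on the bootstrap sample itself
err.oob      <- 0.20   # hypothetical error on out-of-bag observations
err.632      <- 0.368 * err.apparent + 0.632 * err.oob
err.632
```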

Arguments

formula

A formula of the form groups ~ x1 + x2 + ... That is, the response is the grouping factor and the right hand side specifies the (non-factor) discriminators.

data

Data frame from which variables specified in formula are preferentially to be taken.

dat

A matrix or data frame containing the explanatory variables if no formula is given as the principal argument.

cl

A factor specifying the class for each observation if no formula principal argument is given.

method

A vector of multiple classification methods to be used. Classifiers such as randomForest, svm, knn and lda can be used. For details, see the note below.

pars

A list of resampling scheme such as Leave-one-out cross-validation, Cross-validation, Randomised validation (holdout) and Bootstrap, and control parameters for the calculation of accuracy. See valipars for details.

tr.idx

User defined index of training samples. Can be generated by trainind.

comp

Comparison method for the multiple classifiers. If comp is anova, the overall comparison is performed by ANOVA and the pairwise comparisons by Tukey's HSD. If comp is fried, the overall comparison is performed by the Friedman test and the pairwise comparisons by the Wilcoxon test.

...

Additional parameters to method.

subset

Optional vector, specifying a subset of observations to be used.

na.action

Function which indicates what should happen when the data contains NA's, defaults to na.omit.
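To illustrate what the comp = "fried" option computes, the sketch below runs a Friedman test followed by pairwise Wilcoxon tests on a per-iteration accuracy matrix (iterations in rows, classifiers in columns), using base R's stats functions. This is an assumed approximation of the comparison, not mt's exact internals, and the accuracy values are simulated.

```r
## Sketch of a Friedman + pairwise Wilcoxon comparison of classifiers
## (assumed to approximate comp = "fried"; simulated accuracies).
set.seed(1)
acc.iter <- cbind(svm = runif(10, 0.80, 0.95),
                  knn = runif(10, 0.70, 0.85),
                  lda = runif(10, 0.75, 0.90))
## Global test: columns are the classifiers, rows are the iterations.
gl <- friedman.test(acc.iter)
## Pairwise tests, paired because accuracies share the same iterations.
mc <- pairwise.wilcox.test(as.vector(acc.iter),
                           rep(colnames(acc.iter), each = nrow(acc.iter)),
                           paired = TRUE, p.adjust.method = "holm")
gl$p.value   # analogue of gl.pval
mc$p.value   # analogue of mc.pval
```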

Author

Wanchang Lin

Details

The accuracy rates for classification are obtained using techniques such as Random Forest, Support Vector Machine, k-Nearest Neighbour Classification and Linear Discriminant Analysis, based on sampling methods including Leave-one-out cross-validation, Cross-validation, Randomised validation (holdout) and Bootstrap.
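As a minimal sketch of one of these sampling schemes, the snippet below estimates accuracy for a single classifier (lda from MASS) under randomised holdout validation; maccest repeats this kind of loop across iterations for each method. This is an illustration of the resampling idea, not mt's implementation.

```r
## Minimal randomised-holdout accuracy estimate for one classifier
## (illustrative only; maccest handles this internally for each method).
library(MASS)    # for lda
data(iris)
set.seed(1)
acc <- replicate(5, {
  tr  <- sample(nrow(iris), round(0.8 * nrow(iris)))  # 80% training split
  fit <- lda(Species ~ ., data = iris[tr, ])
  prd <- predict(fit, iris[-tr, ])$class
  mean(prd == iris$Species[-tr])                      # holdout accuracy
})
mean(acc)   # analogue of acc
sd(acc)     # analogue of acc.std
```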

See Also

accest, aam.mcl, valipars, plot.maccest, trainind, boxplot.maccest, classifier

Examples

# Iris data
data(iris)
x      <- subset(iris, select = -Species)
y      <- iris$Species

method <- c("randomForest","svm","pcalda","knn")
pars   <- valipars(sampling="boot", niter = 3, nreps=5, strat=TRUE)
res    <- maccest(Species~., data = iris, method=method, pars = pars, 
                  comp="anova")
## or 
res    <- maccest(x, y, method=method, pars=pars, comp="anova") 

res
summary(res)
plot(res)
boxplot(res)
oldpar <- par(mar = c(5,10,4,2) + 0.1)
plot(res$h.test$tukey,las=1)   ## plot the tukey results
par(oldpar)
