accest: Estimate Classification Accuracy By Resampling Method

Description

Estimate the classification accuracy rate using a resampling method.

Usage

accest(dat, ...)
# S3 method for default
accest(dat, cl, method, pred.func=predict,pars = valipars(), 
       tr.idx = NULL, ...) 
# S3 method for formula
accest(formula, data = NULL, ..., subset, na.action = na.omit)
aam.cl(x,y,method, pars = valipars(),...)
aam.mcl(x,y,method, pars = valipars(),...)

Value

accest returns an object including the components:

method: Classification method used.
acc: Overall accuracy rate.
acc.iter: Average accuracy rate for each iteration.
acc.all: Accuracy rate for each iteration and replication.
auc: Overall area under the receiver operating characteristic (ROC) curve (AUC).
auc.iter: Average AUC for each iteration.
auc.all: AUC for each iteration and replication.
mar: Overall prediction margin.
mar.iter: Average prediction margin for each iteration.
mar.all: Prediction margin for each iteration and replication.
err: Overall error rate.
err.iter: Average error rate for each iteration.
err.all: Error rate for each iteration and replication.
sampling: Sampling scheme used.
niter: Number of iterations.
nreps: Number of replications in each iteration, unless the resampling method is loocv.
conf: Overall confusion matrix.
res.all: All results which can be further processed.
acc.boot: A list of bootstrap accuracy estimates, such as .632 and .632+, if the resampling method is bootstrap.

aam.cl returns a vector with acc (accuracy), auc (area under ROC curve) and mar (class margin).

aam.mcl returns a matrix with columns of acc (accuracy), auc (area under ROC curve) and mar (class margin).
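
For orientation, the .632 and .632+ estimators named under acc.boot are usually defined in terms of error rates. The sketch below follows the standard textbook definitions and is not necessarily the exact computation performed by accest. The helper names boot632 and boot632plus and all input values are hypothetical.

```r
# Standard .632 bootstrap estimate (hypothetical helper, not part of accest).
# err.resub: error rate on the training data (resubstitution error)
# err.oob:   error rate on observations left out of the bootstrap samples
boot632 <- function(err.resub, err.oob) {
  0.368 * err.resub + 0.632 * err.oob
}

# Standard .632+ estimate; gamma is the no-information error rate.
boot632plus <- function(err.resub, err.oob, gamma) {
  err.oob <- min(err.oob, gamma)
  # relative overfitting rate, set to 0 when it is not well defined
  r <- if (err.oob > err.resub && gamma > err.resub) {
    (err.oob - err.resub) / (gamma - err.resub)
  } else {
    0
  }
  w <- 0.632 / (1 - 0.368 * r)
  (1 - w) * err.resub + w * err.oob
}

# Made-up error rates for illustration only
e632  <- boot632(err.resub = 0.02, err.oob = 0.06)
e632p <- boot632plus(err.resub = 0.02, err.oob = 0.06, gamma = 0.667)

# Corresponding accuracy estimates
c(acc.632 = 1 - e632, acc.632plus = 1 - e632p)
```

Both estimators weight the optimistic resubstitution error against the pessimistic out-of-bag error. The .632+ variant shifts more weight to the out-of-bag error as the relative overfitting rate grows.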

Arguments

formula

A formula of the form groups ~ x1 + x2 + ... That is, the response is the grouping factor and the right hand side specifies the (non-factor) discriminators.

data

Data frame from which variables specified in formula are preferentially to be taken.

dat,x

A matrix or data frame containing the explanatory variables if no formula is given as the principal argument.

cl,y

A factor specifying the class for each observation if no formula principal argument is given.

method

Classification method whose accuracy rate is to be estimated, such as randomForest, svm, knn and lda. For details, see note below. Either a function or a character string naming the function to be called.

pred.func

Predict method (default is predict). Either a function or a character string naming the function to be called.

pars

A list of parameters used by the resampling method, such as leave-one-out cross-validation, cross-validation, bootstrap and randomised validation (holdout). See valipars for details.

tr.idx

User-defined index of training samples. Can be generated by trainind.

...

Additional parameters to method.

subset

Optional vector, specifying a subset of observations to be used.

na.action

Function that indicates what should happen when the data contain NAs. Defaults to na.omit.
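
The method and pred.func arguments accept either a function or a character string naming one. The sketch below shows the same convention with base R's do.call, using lda from MASS as a stand-in classifier. It is an illustration only: how accest resolves these arguments internally is an assumption, not something stated on this page.

```r
# Sketch only: a function and its name are interchangeable in do.call.
library(MASS)

x <- iris[, 1:4]
y <- iris$Species

# Fit the classifier, named once as a string and once as a function
fit1 <- do.call("lda", list(x, grouping = y))
fit2 <- do.call(lda,   list(x, grouping = y))

# Predict in the same two ways; predict.lda returns a list with $class
pred1 <- do.call("predict", list(fit1, x))$class
pred2 <- do.call(predict,   list(fit2, x))$class

# Both routes give the same predicted labels
identical(pred1, pred2)
```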

Author

Wanchang Lin

Details

Classification accuracy rates for classifiers such as Random Forest, Support Vector Machine, k-Nearest Neighbour and Linear Discriminant Analysis are estimated using resampling methods, including leave-one-out cross-validation, cross-validation, bootstrap and randomised validation (holdout).
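
To make the idea concrete, the following sketch runs repeated k-fold cross-validation of LDA on the iris data using only base R and MASS. It is not the implementation used by accest. The values of niter and nfold are arbitrary, the folds are not stratified, and the simple averaging used for acc.iter and acc is an assumption that mirrors the component names listed under Value.

```r
# Sketch only: repeated k-fold cross-validation written by hand.
library(MASS)

set.seed(1)
niter <- 3   # number of repeated partitions (arbitrary choice)
nfold <- 5   # number of folds per partition (arbitrary choice)

x <- iris[, 1:4]
y <- iris$Species

# acc.all[i, j] holds the accuracy of fold j in iteration i
acc.all <- matrix(NA_real_, nrow = niter, ncol = nfold)
for (i in seq_len(niter)) {
  # randomly assign each observation to one of nfold folds
  fold <- sample(rep(seq_len(nfold), length.out = nrow(x)))
  for (j in seq_len(nfold)) {
    test <- fold == j
    fit  <- lda(x[!test, ], grouping = y[!test])   # train on the other folds
    pred <- predict(fit, x[test, ])$class          # predict the held-out fold
    acc.all[i, j] <- mean(pred == y[test])
  }
}

acc.iter <- rowMeans(acc.all)  # average accuracy per iteration
acc      <- mean(acc.iter)     # overall accuracy
acc
```

Repeating the partition several times reduces the dependence of the estimate on any single random split, which is the role niter plays in the examples below.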

Examples

# Iris data
data(iris)
# Use KNN classifier and bootstrap for resampling
acc <- accest(Species~., data = iris, method = "knn",
              pars = valipars(sampling = "boot",niter = 2, nreps=5))
acc
summary(acc)
acc$acc.boot

# alternatively the traditional interface:
x <- subset(iris, select = -Species)
y <- iris$Species

## -----------------------------------------------------------------------
# Random Forest with 5-fold stratified cv 
pars   <- valipars(sampling = "cv",niter = 4, nreps=5, strat=TRUE)
tr.idx <- trainind(y,pars=pars)
acc1   <- accest(x, y, method = "randomForest", pars = pars, tr.idx=tr.idx)
acc1
summary(acc1)
# plot the accuracy in each iteration
plot(acc1)

## -----------------------------------------------------------------------
# Forensic Glass data in chap.12 of MASS
data(fgl, package = "MASS")    # in MASS package
# Randomised validation (holdout) of SVM for fgl data
acc2 <- accest(type~., data = fgl, method = "svm", cost = 100, gamma = 1, 
              pars = valipars(sampling = "rand",niter = 10, nreps=4,div = 2/3) )
              
acc2
summary(acc2)
# plot the accuracy in each iteration
plot(acc2)

## -----------------------------------------------------------------------
## Examples of aam.cl and aam.mcl
aam.1 <- aam.cl(x,y,method="svm",pars=pars)
aam.2 <- aam.mcl(x,y,method=c("svm","randomForest"),pars=pars)

## With only two classes, AUC is also calculated
idx <- (y == "setosa")
aam.3 <- aam.cl(x[!idx,],factor(y[!idx]),method="svm",pars=pars)
aam.4 <- aam.mcl(x[!idx,],factor(y[!idx]),method=c("svm","randomForest"),pars=pars)
