
mt (version 2.0-1.20)

accest: Estimate Classification Accuracy By Resampling Method

Description

Estimate the classification accuracy rate using a resampling method.

Usage

accest(dat, ...)

# S3 method for default
accest(dat, cl, method, pred.func = predict, pars = valipars(), tr.idx = NULL, ...)

# S3 method for formula
accest(formula, data = NULL, ..., subset, na.action = na.omit)

aam.cl(x, y, method, pars = valipars(), ...)

aam.mcl(x, y, method, pars = valipars(), ...)

Value

accest returns an object including the components:

method

Classification method used.

acc

Overall accuracy rate.

acc.iter

Average accuracy rate for each iteration.

acc.all

Accuracy rate for each iteration and replication.

auc

Overall area under the receiver operating characteristic (ROC) curve (AUC).

auc.iter

Average AUC for each iteration.

auc.all

AUC for each iteration and replication.

mar

Overall prediction margin.

mar.iter

Average prediction margin for each iteration.

mar.all

Prediction margin for each iteration and replication.

err

Overall error rate.

err.iter

Average error rate for each iteration.

err.all

Error rate for each iteration and replication.

sampling

Sampling scheme used.

niter

Number of iterations.

nreps

Number of replications in each iteration if resampling is not loocv.

conf

Overall confusion matrix.

res.all

All results which can be further processed.

acc.boot

A list of bootstrap accuracy estimates, such as .632 and .632+, if the resampling method is bootstrap.
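
The .632 estimate named above is conventionally a weighted blend of the resubstitution (training-set) accuracy and the out-of-bag accuracy. The sketch below writes out that commonly cited formula in base R. The helper name `boot632` and the input values are hypothetical, and the code is not taken from the mt source, so the package's own computation may differ in detail.

```r
# Commonly cited .632 bootstrap estimate, written for accuracy rather than
# error. `boot632` is a hypothetical helper, not a function from mt.
boot632 <- function(acc_resub, acc_oob) {
  # acc_resub: accuracy when predicting the training data itself
  # acc_oob:   mean accuracy on samples left out of each bootstrap draw
  0.368 * acc_resub + 0.632 * acc_oob
}

# Illustrative (made-up) inputs
est <- boot632(acc_resub = 1.00, acc_oob = 0.90)
print(est)
```

The weighting pulls the optimistic resubstitution figure towards the more pessimistic out-of-bag figure; the .632+ variant adjusts the weights further and is not shown here.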

aam.cl returns a vector with acc (accuracy), auc (area under ROC curve) and mar (class margin).

aam.mcl returns a matrix with columns of acc (accuracy), auc (area under ROC curve) and mar (class margin).
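
To make the three measures concrete, here is a minimal base R sketch that computes accuracy, a two-class AUC and a prediction margin from a matrix of class probabilities. The data are made up, and the definitions used (rank-based AUC, and margin as the probability of the true class minus the largest probability among the other classes) are assumptions about the usual conventions rather than a copy of what mt does internally.

```r
# Toy two-class problem: true labels and made-up posterior probabilities.
# All names and values are illustrative, not mt output.
truth <- factor(c("a", "a", "a", "b", "b", "b"))
prob  <- cbind(a = c(0.9, 0.8, 0.4, 0.3, 0.6, 0.1),
               b = c(0.1, 0.2, 0.6, 0.7, 0.4, 0.9))

# Accuracy: share of samples whose most probable class is the true class
pred <- factor(colnames(prob)[max.col(prob, ties.method = "first")],
               levels = levels(truth))
acc <- mean(pred == truth)

# AUC via the rank (Mann-Whitney) formula, treating "a" as the positive class
score <- prob[, "a"]
pos   <- truth == "a"
n_pos <- sum(pos)
n_neg <- sum(!pos)
auc <- (sum(rank(score)[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Margin: probability of the true class minus the best competing probability,
# averaged over samples (assumed definition)
idx     <- cbind(seq_along(truth), as.integer(truth))
p_true  <- prob[idx]
p_other <- prob
p_other[idx] <- -Inf
mar <- mean(p_true - apply(p_other, 1, max))

print(c(acc = acc, auc = auc, mar = mar))
```

A negative per-sample margin corresponds to a misclassified sample, so the mean margin carries information about confidence that the accuracy alone does not.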

Arguments

formula

A formula of the form groups ~ x1 + x2 + ... That is, the response is the grouping factor and the right hand side specifies the (non-factor) discriminators.

data

Data frame from which variables specified in formula are preferentially to be taken.

dat,x

A matrix or data frame containing the explanatory variables if no formula is given as the principal argument.

cl,y

A factor specifying the class for each observation if no formula principal argument is given.

method

Classification method whose accuracy rate is to be estimated, such as randomForest, svm, knn and lda. For details, see note below. Either a function or a character string naming the function to be called.

pred.func

Predict method (default is predict). Either a function or a character string naming the function to be called.

pars

A list of parameters used by the resampling method, such as Leave-one-out cross-validation, Cross-validation, Bootstrap and Randomised validation (holdout). See valipars for details.

tr.idx

User-defined index of training samples. Can be generated by trainind.

...

Additional parameters to method.

subset

Optional vector, specifying a subset of observations to be used.

na.action

Function which indicates what should happen when the data contain NAs. Defaults to na.omit.
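
The method and pred.func arguments accept either a function or a character string naming one. In base R this pattern is usually handled with match.fun, as the sketch below shows. The helper `resolve_method` is hypothetical, and whether mt resolves its arguments in exactly this way is an assumption.

```r
# `resolve_method` is a hypothetical helper showing how a classifier can be
# supplied either as a function object or as the name of a function.
resolve_method <- function(method) {
  match.fun(method)   # accepts a function or a character string
}

f1 <- resolve_method("mean")   # looked up by name
f2 <- resolve_method(mean)     # passed through unchanged
print(identical(f1, f2))
```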

Author

Wanchang Lin

Details

The classification accuracy rates of techniques such as Random Forest, Support Vector Machine, k-Nearest Neighbour Classification and Linear Discriminant Analysis are estimated using resampling methods, including Leave-one-out cross-validation, Cross-validation, Bootstrap and Randomised validation (holdout).
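
As a rough illustration of what a resampling-based accuracy estimate involves, the following base R sketch runs one round of stratified 5-fold cross-validation on iris. It uses a hypothetical nearest-centroid classifier as a stand-in so that no extra packages are needed. It does not reproduce mt's internals, and the fold assignment shown is only one reasonable way to stratify.

```r
set.seed(1)
data(iris)
x <- as.matrix(iris[, 1:4])
y <- iris$Species

# Assign fold labels 1..k within each class so that folds are stratified
k <- 5
fold <- integer(length(y))
for (cls in levels(y)) {
  i <- which(y == cls)
  fold[i] <- sample(rep_len(seq_len(k), length(i)))
}

# Hypothetical stand-in classifier: predict the class of the nearest centroid
nearest_centroid <- function(x_train, y_train, x_test) {
  # one row of feature means per class
  centroids <- apply(x_train, 2, function(col) tapply(col, y_train, mean))
  # squared Euclidean distance from every test row to every centroid
  d <- sapply(seq_len(nrow(centroids)), function(j) {
    rowSums(sweep(x_test, 2, centroids[j, ])^2)
  })
  factor(rownames(centroids)[max.col(-d, ties.method = "first")],
         levels = levels(y_train))
}

# Train on k - 1 folds, test on the held-out fold, and record the accuracy
acc_fold <- sapply(seq_len(k), function(f) {
  test <- fold == f
  pred <- nearest_centroid(x[!test, , drop = FALSE], y[!test],
                           x[test, , drop = FALSE])
  mean(pred == y[test])
})
acc_cv <- mean(acc_fold)

print(acc_fold)
print(acc_cv)
```

Repeating the whole procedure with fresh fold assignments and averaging the results corresponds to the idea of several iterations, each containing several replications.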

See Also

binest, maccest, valipars, trainind, classifier

Examples

library(mt)
# Iris data
data(iris)
# Use KNN classifier and bootstrap for resampling
acc <- accest(Species~., data = iris, method = "knn",
              pars = valipars(sampling = "boot",niter = 2, nreps=5))
acc
summary(acc)
acc$acc.boot

# alternatively the traditional interface:
x <- subset(iris, select = -Species)
y <- iris$Species

## -----------------------------------------------------------------------
# Random Forest with 5-fold stratified cv 
pars   <- valipars(sampling = "cv",niter = 4, nreps=5, strat=TRUE)
tr.idx <- trainind(y,pars=pars)
acc1   <- accest(x, y, method = "randomForest", pars = pars, tr.idx=tr.idx)
acc1
summary(acc1)
# plot the accuracy in each iteration
plot(acc1)

## -----------------------------------------------------------------------
# Forensic Glass data in chap.12 of MASS
data(fgl, package = "MASS")    # in MASS package
# Randomised validation (holdout) of SVM for fgl data
acc2 <- accest(type~., data = fgl, method = "svm", cost = 100, gamma = 1, 
              pars = valipars(sampling = "rand",niter = 10, nreps=4,div = 2/3) )
              
acc2
summary(acc2)
# plot the accuracy in each iteration
plot(acc2)

## -----------------------------------------------------------------------
## Examples of aam.cl and aam.mcl
aam.1 <- aam.cl(x,y,method="svm",pars=pars)
aam.2 <- aam.mcl(x,y,method=c("svm","randomForest"),pars=pars)

## If there are only two classes, AUC will be calculated
idx <- (y == "setosa")
aam.3 <- aam.cl(x[!idx,],factor(y[!idx]),method="svm",pars=pars)
aam.4 <- aam.mcl(x[!idx,],factor(y[!idx]),method=c("svm","randomForest"),pars=pars)
