Estimates misclassification errors (generalisation errors), sensitivity and specificity using cross-validation, bootstrap and 632plus bias-corrected bootstrap methods based on Random Forest, Support Vector Machines, Linear Discriminant Analysis and k-Nearest Neighbour methods.
# S3 method for data.frame
classificationError(
formula,
data,
method=c("RF","SVM","LDA","KNN"),
errorType = c("cv", "boot", "six32plus"),
senSpec=TRUE,
negLevLowest=TRUE,
na.action=na.omit,
control=control.errorest(k=NROW(na.action(data)),nboot=100),
...)
formula: A formula of the form lhs ~ rhs relating the response (class) variable to the explanatory variables. See lm for more detail.
data: A data frame containing the response (class membership) variable and the explanatory variables in the formula.
method: A character vector of length 1 to 4 representing the classification methods to be used. Can be one or more of "RF" (Random Forest), "SVM" (Support Vector Machines), "LDA" (Linear Discriminant Analysis) and "KNN" (k-Nearest Neighbour). Defaults to all four methods.
errorType: A character vector of length 1 to 3 representing the type of estimators to be used for computing misclassification errors. Can be one or more of "cv" (cross-validation), "boot" (bootstrap) and "six32plus" (632plus bias-corrected bootstrap). Defaults to all three estimators.
senSpec: Logical. Should sensitivity and specificity (for the cross-validation estimator only) be computed? Defaults to TRUE.
negLevLowest: Logical. Does the lowest of the ordered levels of the class variable represent the negative control? Defaults to TRUE.
na.action: Function which indicates what should happen when the data contain NAs. Defaults to na.omit.
control: Control parameters of the function errorest.
...: Additional parameters to method.
Returns an object of class classificationError with the following components:
The call of the classificationError function.
A length(errorType) by length(method) matrix of classification errors.
A 2 by length(method) matrix of sensitivities (first row) and specificities (second row).
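As a rough illustration of how sensitivity and specificity depend on which class level is treated as the negative control, the following base R sketch computes both from a set of true and predicted labels. The helper sensSpec and the example vectors are hypothetical and not part of the package; the package's internal computation may differ.

```r
# Hypothetical helper (not part of the package): sensitivity and specificity
# for a two-class problem, given which factor level is the negative control.
sensSpec <- function(truth, pred, negLevLowest = TRUE) {
  lev <- levels(truth)
  stopifnot(length(lev) == 2)
  neg <- if (negLevLowest) lev[1] else lev[2]   # negative control level
  pos <- setdiff(lev, neg)                      # the remaining level is positive
  tab <- table(truth = truth, pred = pred)      # rows: truth, columns: prediction
  c(sensitivity = tab[pos, pos] / sum(tab[pos, ]),
    specificity = tab[neg, neg] / sum(tab[neg, ]))
}

# Made-up labels; "0" is the lowest level of the class variable
truth <- factor(c(0, 0, 0, 0, 1, 1, 1, 1, 1, 1))
pred  <- factor(c(0, 0, 0, 1, 1, 1, 1, 1, 0, 0), levels = levels(truth))

resLowestNeg  <- sensSpec(truth, pred, negLevLowest = TRUE)
resHighestNeg <- sensSpec(truth, pred, negLevLowest = FALSE)
print(resLowestNeg)
print(resHighestNeg)
```

Switching negLevLowest swaps the roles of the two classes, so the two quantities trade places.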
In the current version of the package, estimation of sensitivity and specificity is limited to the cross-validation estimator only. For LDA, the sample size must be greater than the number of explanatory variables to avoid singularity. The function classificationError does not check whether this is satisfied, but the underlying function lda produces warnings if this condition is violated.
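Because the sample size condition for LDA is not checked by the function, a caller may want to verify it beforehand. The following base R sketch is one way to do so; the helper name enoughSamplesForLDA and the simulated data frames are hypothetical, and counting the columns of the model matrix is an assumption about what "number of explanatory variables" means when factors are expanded into dummy variables.

```r
# Hypothetical pre-check (not part of the package): TRUE when the number of
# complete observations exceeds the number of explanatory variables.
enoughSamplesForLDA <- function(formula, data, na.action = na.omit) {
  mf <- model.frame(formula, data = data, na.action = na.action)
  X  <- model.matrix(attr(mf, "terms"), data = mf)
  nPred <- sum(colnames(X) != "(Intercept)")   # predictors, excluding intercept
  nrow(X) > nPred
}

set.seed(1)
# 4 samples, 5 predictors: the condition is violated
small <- data.frame(class = factor(c("a", "b", "a", "b")),
                    matrix(rnorm(4 * 5), nrow = 4))
# 20 samples, 2 predictors: the condition holds
big <- data.frame(class = factor(rep(c("a", "b"), 10)),
                  matrix(rnorm(20 * 2), nrow = 20))

okSmall <- enoughSamplesForLDA(class ~ ., small)
okBig   <- enoughSamplesForLDA(class ~ ., big)
print(c(small = okSmall, big = okBig))
```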
Khondoker, M. R., Bachmann, T. T., Mewissen, M., Dickinson, P. et al. (2010). Multi-factorial analysis of class prediction error: estimating optimal number of biomarkers for various classification rules. Journal of Bioinformatics and Computational Biology, 8, 945--965.
Breiman, L. (2001). Random Forests. Machine Learning 45(1), 5--32.
Chang, C.-C. and Lin, C.-J. LIBSVM: a library for Support Vector Machines. https://www.csie.ntu.edu.tw/~cjlin/libsvm/.
Ripley, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press.
Efron, B. and Tibshirani, R. (1997). Improvements on Cross-Validation: The .632+ Bootstrap Estimator. Journal of the American Statistical Association 92(438), 548--560.
# NOT RUN {
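# Simulate a data set with nTrain = 30 and nBiom = 3
mydata <- simData(nTrain = 30, nBiom = 3)$data
# Estimate errors using the default methods and estimators
classificationError(formula = class ~ ., data = mydata)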
# }
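A further sketch, restricting the call to a subset of methods and estimators using the arguments listed in the usage above. It assumes that the optBiomarker package is installed (the guard skips the example otherwise) and that simData behaves as in the example above; str is used to inspect the returned object rather than assuming component names.

```r
# Assumption: optBiomarker is installed; otherwise the example is skipped.
if (requireNamespace("optBiomarker", quietly = TRUE)) {
  library(optBiomarker)
  set.seed(1)
  mydata <- simData(nTrain = 30, nBiom = 3)$data

  # Only Random Forest and LDA, with cross-validation and bootstrap estimators
  res <- classificationError(
    formula   = class ~ .,
    data      = mydata,
    method    = c("RF", "LDA"),
    errorType = c("cv", "boot"),
    senSpec   = TRUE
  )

  # Inspect the structure of the returned object
  str(res)
}
```

Restricting method and errorType in this way should shorten the run time, since fewer classifier and estimator combinations are evaluated.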