Daim: Diagnostic accuracy of classification models.

Description

Estimation of misclassification rate, sensitivity, specificity and AUC based on cross-validation (CV) or various bootstrap techniques.
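As an illustration of the bootstrap estimates named above, the following base-R sketch computes an apparent error, a leave-one-out bootstrap (loob) error and the 0.632 combination of the two for a toy logistic model. The weights 0.368 and 0.632 follow Efron and Tibshirani; the toy data, the helper classify() and the fixed cutoff of 0.5 are assumptions made for this example and are not taken from the Daim sources, whose internal computations may differ in detail.

```r
# Sketch only: apparent, loob and 0.632 error estimates on simulated data.
set.seed(1)
n <- 200
d <- data.frame(x = rnorm(n))
d$y <- factor(ifelse(d$x + rnorm(n) > 0, "pos", "neg"), levels = c("neg", "pos"))

# Hypothetical helper: fit on 'train', classify 'test' at a cutoff of 0.5.
classify <- function(train, test) {
  fit <- glm(y ~ x, data = train, family = binomial)
  p <- predict(fit, newdata = test, type = "response")
  ifelse(p >= 0.5, "pos", "neg")
}

# Apparent error: train and test on the same data (optimistic).
err_app <- mean(classify(d, d) != d$y)

# Bootstrap: record errors only for observations left out of each resample.
B <- 50
wrong <- matrix(NA, nrow = n, ncol = B)
for (b in seq_len(B)) {
  idx <- sample.int(n, replace = TRUE)
  oob <- setdiff(seq_len(n), idx)
  wrong[oob, b] <- classify(d[idx, ], d[oob, ]) != d$y[oob]
}

# loob error: average each observation's out-of-bag error, then average over observations.
err_loob <- mean(rowMeans(wrong, na.rm = TRUE), na.rm = TRUE)

# 0.632 estimate: weighted combination of apparent and loob error.
err_632 <- 0.368 * err_app + 0.632 * err_loob
c(apparent = err_app, loob = err_loob, e632 = err_632)
```

The 0.632 estimate always lies between the apparent and the loob error, which is why it is used to offset the optimism of the former and the pessimism of the latter.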

Usage

Daim(formula, model = NULL, data = NULL, control = Daim.control(),
     thres = seq(0, 1, by = 0.01), cutoff = 0.5, labpos = "1",
     returnSample = FALSE, cluster = NULL, seed.cluster = NULL,
     multicore = FALSE, ...)

Arguments

formula

a formula of the form y ~ x1 + x2 + ..., where y must be a factor and x1, x2, ... are numeric or factor variables.

model

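a function implementing the modelling technique whose error rate is to be estimated. As in the examples, it is called with the arguments formula, train and test, and must return the predicted probability of the "pos" class for each observation in test.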
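A minimal sketch of such a model function, using logistic regression from base R. The name myglm and the toy data are hypothetical; the sketch assumes that the response reaching the function is already labelled "pos"/"neg" (see labpos), and it fixes the level order explicitly so that glm() models the probability of "pos".

```r
# Hypothetical model function following the (formula, train, test) interface.
myglm <- function(formula, train, test) {
  # Assumption: the response is labelled "neg"/"pos"; make "neg" the
  # reference level so that the fitted probability refers to "pos".
  resp <- all.vars(formula)[1]
  train[[resp]] <- factor(train[[resp]], levels = c("neg", "pos"))
  fit <- glm(formula, data = train, family = binomial)
  # One predicted probability per row of 'test'.
  unname(predict(fit, newdata = test, type = "response"))
}

# Stand-alone check on simulated data (no Daim call involved).
set.seed(1)
n <- 100
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- factor(ifelse(toy$x1 + rnorm(n) > 0, "pos", "neg"))
train <- toy[1:70, ]
test <- toy[71:100, ]
p <- myglm(y ~ ., train, test)
range(p)
```

A function of this shape could then be passed as the model argument in the same way as mylda in the Examples section.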

data

an optional data frame containing the variables in the model (training data).

control

See Daim.control.

thres

a numeric vector with the cutoff values.

cutoff

the cutoff value for error estimation. This can be a numeric value or one of the following character strings:
"cv" - the optimal cut-point corresponding to the cross-validation (cv) estimates of the sensitivity and the specificity.
"loob" - the optimal cut-point corresponding to the leave-one-out bootstrap (loob) estimates of the sensitivity and the specificity.
"0.632" - the optimal cut-point corresponding to the 0.632 estimates of the sensitivity and the specificity.
"0.632+" - the optimal cut-point corresponding to the 0.632+ estimates of the sensitivity and the specificity.
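To make the relationship between thres, cutoff, sensitivity and specificity concrete, the sketch below evaluates both rates on a grid of thresholds for simulated probabilities and then picks one cut-point. Two choices here are assumptions for illustration only and are not confirmed by this page: an observation is classified as positive when its probability is at least the threshold, and "optimal" is taken to mean maximizing the Youden index (sensitivity + specificity - 1).

```r
# Sketch only: sensitivity and specificity over a threshold grid.
set.seed(42)
n <- 300
truth <- factor(sample(c("neg", "pos"), n, replace = TRUE), levels = c("neg", "pos"))
# Simulated predicted probabilities: higher on average for "pos" cases.
prob <- ifelse(truth == "pos", rbeta(n, 4, 2), rbeta(n, 2, 4))

thres <- seq(0, 1, by = 0.01)

# Assumption: classify as "pos" when prob >= threshold.
sens <- sapply(thres, function(t) mean(prob[truth == "pos"] >= t))
spec <- sapply(thres, function(t) mean(prob[truth == "neg"] < t))

# Assumption: the "optimal" cut-point maximizes the Youden index.
youden <- sens + spec - 1
best <- thres[which.max(youden)]
best

# Misclassification rate at a fixed numeric cutoff, e.g. the default 0.5.
pred <- ifelse(prob >= 0.5, "pos", "neg")
mean(pred != truth)
```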

labpos

a character string giving the level of the response variable that defines a "positive" event. The labels of the "positive" events will be set to "pos" and all others to "neg".
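The relabelling described for labpos can be pictured with the following base-R sketch. The level names mirror the GlaucomaM example; the resulting level order ("neg" before "pos") is an assumption of this sketch and not something this page specifies.

```r
# Sketch only: recode a response so that the level named by 'labpos'
# becomes "pos" and every other level becomes "neg".
y <- factor(c("normal", "glaucoma", "glaucoma", "normal"))
labpos <- "glaucoma"

y_bin <- factor(ifelse(y == labpos, "pos", "neg"), levels = c("neg", "pos"))
table(original = y, recoded = y_bin)
```

This is why the model functions in the Examples section select the column named "pos" from the predicted class probabilities.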

returnSample

a logical value for saving the data from each sample.

cluster

a cluster object (for example, one created with makeCluster from the parallel package), if parallel computing on a cluster is used.

seed.cluster

an integer value used as seed for the RNG.

multicore

a logical indicating whether multiple cores (if available) should be used for the computations.

...

additional parameters passed to clusterApplyLB or mclapply.

Value

Daim-class.

References

Werner Adler and Berthold Lausen (2009). Bootstrap Estimated True and False Positive Rates and ROC Curve. Computational Statistics & Data Analysis, 53, (3), 718--729.

Tom Fawcett (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27, (8), 861--874.

Bradley Efron and Robert Tibshirani (1997). Improvements on cross-validation: The .632+ bootstrap method. Journal of the American Statistical Association, 92, (438), 548--560.

Examples


#############################
##      Evaluation of      ##
##           LDA           ##
#############################

library(TH.data)
library(MASS)
data(GlaucomaM)
head(GlaucomaM)

mylda <- function(formula, train, test){
  model <- lda(formula, train)
  predict(model, test)$posterior[, "pos"]
}
  
set.seed(1102013)
ACC <- Daim(Class~., model=mylda, data=GlaucomaM, labpos="glaucoma", 
            control=Daim.control(method="boot", number=50))
ACC
summary(ACC)

  
## Not run:   
# ## just because of checking time on CRAN
#   
#   
#   ####
#   #### optimal cut point determination
#   ####
#   
#   
#   set.seed(1102013)
#   ACC <- Daim(Class~., model=mylda, data=GlaucomaM, labpos="glaucoma", 
#               control=Daim.control(method="boot", number=50), cutoff="0.632+")
#   ACC
#   summary(ACC)
#   
#   
#   
#   ####
#   #### for parallel execution on multicore CPUs and computer clusters
#   ####
#   
#   library(parallel)
#   ### 
#   ### create cluster with two slave nodes
# 
#   cl <- makeCluster(2)
# 
#   ###
#   ### Load used package on all slaves and execute Daim in parallel
#   ###
# 
#   clusterEvalQ(cl, library(MASS))
#   ACC <- Daim(Class~., model=mylda, data=GlaucomaM, labpos="glaucoma", cluster=cl)
#   ACC
# 
# 
#   ####
#   #### for parallel computing on multicore CPUs
#   ####
# 
#   ACC <- Daim(Class~., model=mylda, data=GlaucomaM, labpos="glaucoma", multicore=TRUE)
#   ACC
#   
#   
#   
#   
#   
#   #############################
#   ##      Evaluation of      ##
#   ##      randomForest       ##
#   #############################
#   
#   
#   library(randomForest)
# 
#   myRF <- function(formula, train, test){
#       model <- randomForest(formula, train)
#       predict(model, test, type="prob")[,"pos"]
#   }
# 
#   ACC2 <- Daim(Class~., model=myRF, data=GlaucomaM, labpos="glaucoma",
#                control=Daim.control(number=50))
#   ACC2
#   summary(ACC2)
#   
#   
#   ####
#   #### optimal cut point determination
#   ####
#   
#   
#   set.seed(1102013)
#   ACC2 <- Daim(Class~., model=myRF, data=GlaucomaM, labpos="glaucoma", 
#               control=Daim.control(method="boot", number=50), cutoff="0.632+")
#   summary(ACC2)
#   
#   
#   
#   ####
#   #### for parallel execution on multicore CPUs and computer clusters
#   ####
#   
#   
#   library(parallel)
#   ### 
#   ### create cluster with two slave nodes
# 
#   cl <- makeCluster(2)
# 
#   ###
#   ### Load used package on all slaves and execute Daim in parallel
#   ###
# 
#   clusterEvalQ(cl, library(randomForest))
#   ACC2 <- Daim(Class~., model=myRF, data=GlaucomaM, labpos="glaucoma", cluster=cl)
#   ACC2
# 
#   ####
#   #### for parallel computing on multicore CPUs
#   ####
# 
#   ACC2 <- Daim(Class~., model=myRF, data=GlaucomaM, labpos="glaucoma", multicore=TRUE)
#   ACC2
#   ## End(Not run)