"Roc" (object, formula, data, splitMethod='noSplitMethod',
noinf.method=c('simulate'), simulate='reeval', B, M, breaks, cbRatio=1,
RocAverageMethod='vertical',
RocAverageGrid=switch(RocAverageMethod, 'vertical'=seq(0,1,.01),
'horizontal'=seq(1,0,-.01)), model.args=NULL, model.parms=NULL,
keepModels=FALSE, keepSampleIndex=FALSE, keepCrossValRes=FALSE,
keepNoInfSimu, slaveseed, cores=1, na.accept=0, verbose=FALSE, ...)
Arguments

object: A list of R objects, each representing a marker or a prediction model for the binary outcome. When crossvalidation is requested (e.g., when splitMethod is 'bootcv'), all the R objects in this list must include a call which can be evaluated in a learning subset of the data.

formula: A formula whose left hand side identifies the binary outcome variable in data. If missing, the formula of the (first) model in object is used.

data: A data set in which the models are validated. It needs to have the same structure, variable names, factor levels, etc., as the data in which the models were trained. If the subjects in data were not used to train the models given in object, this leads to an external validation situation. However, note that if one of the elements in object is a formula, then it is evaluated in this data set.

splitMethod: Method for estimating the prediction performance:

none: Assess the models in the data given by data. If this data set coincides with the training data where the models were fitted, this yields an apparent (or re-substitution) estimate of performance. Otherwise, this leads to an external validation situation.

bootCV: Internal bootstrap cross validation. The prediction models are trained on B bootstrap samples of the data. Bootstrap samples are either drawn with replacement from data (same size), or without replacement of size M, where M is a number smaller than NROW(data). The model performance parameters (Roc, Brier, AUC) are estimated with the observations that are NOT in the current bootstrap sample.

boot632: Linear combination of the apparent performance and the bootCV performance using the constant weight .632 (see Efron & Tibshirani, 1997, and the sketch after this list).

boot632plus: Linear combination of the apparent performance and the bootCV performance using weights that depend on how the models perform in permuted data (see Efron & Tibshirani, 1997).

noinf: Assess the models trained in permutations of data.
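For intuition, the .632 estimate is simply a fixed-weight average of the apparent and the bootstrap-crossvalidated performance. A minimal sketch with hypothetical numbers (not the package's internal code):

## Hypothetical error rates; a sketch of the .632 weighting only.
appErr    <- 0.10   # apparent (re-substitution) error
bootcvErr <- 0.25   # bootstrap cross-validation error
err632    <- 0.368 * appErr + 0.632 * bootcvErr
err632    # 0.1948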
noinf.method: Method for computing the no-information Roc curve; 'simulate' is the available choice (see Usage).

simulate: If 'reeval', the models are re-built in the current permuted data for computing the no-information Roc curve.

B: Number of crossvalidation repetitions; the meaning depends on splitMethod. When splitMethod is in c('Bootcv','Boot632','Boot632plus'), it is the number of bootstrap samples, default is 100. Otherwise it is ignored.

M: The size of the bootstrap samples that are drawn without replacement; a number smaller than NROW(data). See splitMethod.

breaks: Break points for the Roc curve. Defaults to seq(0,1,.01) when crossvalidation is applied, i.e., when splitMethod is in c('Bootcv','Boot632','Boot632plus'). Otherwise all unique values of the predictive marker are used.

RocAverageMethod: If 'horizontal', average the crossvalidated specificities at fixed sensitivity values, specified in RocAverageGrid; if 'vertical', average the crossvalidated sensitivities at fixed specificity values. See Fawcett, T. (2006) for details.

RocAverageGrid: The grid of sensitivity or specificity values at which the crossvalidated Roc curves are averaged; a small averaging sketch follows.
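Vertical averaging interpolates each crossvalidated curve onto a common specificity grid and averages the sensitivities. A minimal sketch with two hypothetical curves (this is not the package's internal implementation):

## Vertical averaging sketch: interpolate two hypothetical ROC curves
## onto a common specificity grid, then average the sensitivities.
spec.grid <- seq(0, 1, 0.01)
roc1 <- list(spec=c(0, .2, .5, 1), sens=c(1, .9, .7, 0))
roc2 <- list(spec=c(0, .3, .6, 1), sens=c(1, .8, .6, 0))
sens1 <- approx(roc1$spec, roc1$sens, xout=spec.grid)$y
sens2 <- approx(roc2$spec, roc2$sens, xout=spec.grid)$y
avg.sens <- (sens1 + sens2)/2
plot(1-spec.grid, avg.sens, type="l",
     xlab="1-Specificity", ylab="Averaged sensitivity")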
model.args: List of extra arguments that are passed to the predictStatusProb methods. The list must have an entry for each entry in object.

model.parms: List with exactly one entry for each entry in object. Each entry names parts of the value of the fitted models that should be extracted and added to the output (see value).

keepModels: If FALSE, keep only the names of the elements of object. If 'Call', keep the call of the elements of object. Else, add the object as it is to the output.

keepSampleIndex: If FALSE, remove the cross-validation index (which tells who was in the learning and who in the validation set) from the output list, which otherwise is included in the method part of the output list.

keepCrossValRes: If TRUE, add all B crossvalidation results to the output (see value). Defaults to TRUE.

keepNoInfSimu: If TRUE, add the B results in permuted data (for the no-information performance) to the output (see value). Defaults to FALSE.

slaveseed: Vector of seeds, as long as B, given to the slaves in parallel computing to control the models built in the crossvalidation loop.

cores: Number of cores for parallel computing, passed as the value of mc.cores when calling mclapply.

na.accept: Number of failures of the model-building step that are accepted in the B crossvalidation steps. Defaults to 0.

verbose: If TRUE, the procedure reports details of its progress, e.g., it prints the current step in cross-validation procedures.
Value

An object of class Roc or class Brier. Depending on the splitMethod, the object includes the following components:

Roc: The Roc curve(s) of the model(s) in object, estimated according to splitMethod.

weight: The weight used for the linear combination of the AppRoc and the BootcvRoc. Only available if splitMethod is one of 'Boot632' or 'Boot632plus'.

overfit: Estimated overfit of the model(s). Only available if splitMethod is one of 'Boot632' or 'Boot632plus'.
Details

Bootstrap-crossvalidation techniques are implemented to estimate the generalization performance of the model(s), i.e., the performance which can be expected in new subjects.

By default, when crossvalidation is involved, the ROC curve is approximated on a grid of either sensitivities or specificities, rather than computed at all unique changepoints of the crossvalidated ROC curves; see Fawcett, T. (2006). The (density of the) grid can be controlled with the argument RocAverageGrid.

Missing data in the response or in the marker/predicted risk cause a failure; a common workaround is sketched below.
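A minimal sketch of that workaround, assuming a data frame dat with outcome Y and markers X1 and X2 as in the Examples:

## Drop rows with missing values in the outcome or the markers
## before calling Roc.
dat.cc <- dat[complete.cases(dat[, c("Y","X1","X2")]), ]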
For each R object which can potentially predict a probability for an event, there should be a corresponding predictStatusProb method: for example, to assess a prediction model which evaluates to a myclass object, one defines a function called predictStatusProb.myclass with arguments object, newdata, .... For instance, the function predictStatusProb.lrm looks like this:

predictStatusProb.lrm <- function(object,newdata,...){
  p <- as.numeric(predict(object,newdata=newdata,type='fitted'))
  class(p) <- 'predictStatusProb'
  p
}
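As a further illustration, here is a hedged sketch of a complete plug-in for a hypothetical model class: the name meanModel and everything about it are assumptions made for this example, not part of the package.

## Hypothetical plug-in model that predicts the training prevalence for
## every subject. The class 'meanModel' is illustrative only; the stored
## call is what makes the object usable with crossvalidation splitMethods.
meanModel <- function(formula, data){
  y <- model.response(model.frame(formula, data))
  fit <- list(prevalence = mean(y == levels(factor(y))[2]),
              call = match.call())
  class(fit) <- "meanModel"
  fit
}
predictStatusProb.meanModel <- function(object, newdata, ...){
  p <- rep(object$prevalence, NROW(newdata))
  class(p) <- "predictStatusProb"
  p
}

With these two functions defined, an object returned by meanModel can be passed to Roc or Brier like any of the built-in classes below.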
Currently implemented are predictStatusProb methods for the following R-functions:

numeric (marker values are passed on)
formula (single predictor: extracted from newdata and passed on; multiple predictors: projected to a score by logistic regression)
glm (from library(stats))
lrm (from library(Design))
rpart (from library(rpart))
BinaryTree (from library(party))
ElasticNet (a wrapper for glmnet from library(glmnet))
randomForest (from library(randomForest))
rfsrc (from library(randomForestSRC))
References

Gerds, T. A., Cai, T. & Schumacher, M. (2008). The performance of risk prediction models. Biometrical Journal, 50(4), 457-479.

Efron, B. & Tibshirani, R. (1997). Improvements on cross-validation: The .632+ bootstrap method. Journal of the American Statistical Association, 92, 548-560.

Wehberg, S. & Schumacher, M. (2004). A comparison of nonparametric error rate estimation methods in classification problems. Biometrical Journal, 46, 35-47.

Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861-874.
Examples

## Generate some data with binary response Y
## depending on X1, X2, X3 and the interaction X2*X3
set.seed(40)
N <- 40
X1 <- rnorm(N)
X2 <- abs(rnorm(N,4))
X3 <- rbinom(N,1,.4)
expit <- function(x) exp(x)/(1+exp(x))
p <- expit(-2 + X1 + X2 + X3 - X3*X2)  # event probability
Y <- factor(rbinom(N,1,p))
dat <- data.frame(Y=Y,X1=X1,X2=X2)
# single markers, one by one
r1 <- Roc(Y~X1,data=dat)
plot(r1,col=1)
r2 <- Roc(Y~X2,data=dat)
lines(r2,col=2)
# or, directly, multiple markers in one call
r12 <- Roc(list(Y~X1,Y~X2),data=dat)
plot(r12)
## compare logistic regression
lm1 <- glm(Y~X1,data=dat,family="binomial")
lm2 <- glm(Y~X1+X2,data=dat,family="binomial")
r1 <- Roc(list(LR.X1=lm1,LR.X1.X2=lm2))
summary(r1)
Brier(list(lm1,lm2))
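To illustrate the external validation situation described under the data argument, the fitted models can be evaluated in data that were not used for training; here the validation set is simply re-generated from the same hypothetical mechanism as above:

## External validation sketch: evaluate the fitted models in new data
## drawn from the same mechanism (hypothetical validation set).
X1v <- rnorm(N)
X2v <- abs(rnorm(N,4))
X3v <- rbinom(N,1,.4)
pv <- expit(-2 + X1v + X2v + X3v - X3v*X2v)
valdat <- data.frame(Y=factor(rbinom(N,1,pv)), X1=X1v, X2=X2v)
rext <- Roc(list(LR.X1=lm1, LR.X1.X2=lm2), data=valdat)
plot(rext)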
# machine learning
library(randomForest)
dat$Y <- factor(dat$Y)
rf <- randomForest(Y~X2,data=dat)
rocCV <- Roc(list(RandomForest=rf,LogisticRegression=lm2),
             data=dat,
             splitMethod="bootcv",
             B=3,
             cbRatio=1)
plot(rocCV)
# compute .632+ estimate of Brier score
bs <- Brier(list(LR.X1=lm1,LR.X2=lm2),
data=dat,
splitMethod="boot632+",
B=3)
bs