CORElearn (version 1.54.2)

calibrate: Calibration of probabilities according to the given prior.

Description

Given probability scores predictedProb, as provided for example by a call to predict.CoreModel, and using one of the available methods given by method, the function calibrates the predicted probabilities so that they match the actual probabilities of a binary class 1 provided by correctClass. The computed calibration can then be applied to further scores returned by the same model.
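
A minimal sketch of the intended two-step workflow, where trueClass, valScores, and testScores are hypothetical placeholder vectors (see the Examples section for a complete run):

cal <- calibrate(trueClass, valScores, class1=1,
                 method="isoReg", assumeProbabilities=TRUE)
testCalibrated <- applyCalibration(testScores, cal)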

Usage

calibrate(correctClass, predictedProb, class1=1, 
          method = c("isoReg","binIsoReg","binning","mdlMerge"), 
          weight=NULL, noBins=10, assumeProbabilities=FALSE)
          
applyCalibration(predictedProb, calibration)

Arguments

correctClass

A vector of correct class labels for a binary classification problem.

predictedProb

A vector of predicted scores (probabilities) for class 1. For calibrate it should be of the same length as correctClass.

class1

A class value (factor) or an index of the class value to be taken as the class to calibrate.

method

One of isoReg, binIsoReg, binning, or mdlMerge. See details below.

weight

If specified, it should be of the same length as correctClass and gives the weights of the instances; otherwise a default weight of 1 is assumed for each instance.

noBins

The meaning of this parameter depends on the parameter method; it specifies the desired or initial number of bins. See details below.

assumeProbabilities

If assumeProbabilities=TRUE, the values in predictedProb are expected to be in the [0,1] range, i.e., probability estimates; with assumeProbabilities=FALSE the algorithm can be used as ordinary isotonic regression.
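
For example, raw classifier scores outside the [0,1] range can still be mapped to calibrated probabilities; the following is a minimal sketch on synthetic data, with all variable names illustrative:

# sketch: calibrate raw scores that are not probabilities
set.seed(1)
margins <- rnorm(200)                                  # hypothetical raw scores
cls <- factor(ifelse(margins + rnorm(200) > 0, 1, 2))  # hypothetical labels
cal <- calibrate(cls, margins, class1=1, method="isoReg",
                 assumeProbabilities=FALSE)
probs <- applyCalibration(margins, cal)                # mapped into [0,1]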

calibration

The list resulting from a call to calibrate, subsequently applied to probability scores returned by the same model.

Value

The function calibrate returns a list with two vector components of the same length:

interval

The boundaries of the intervals. The lower boundary 0 is not explicitly included but should be taken into account.

calProb

The calibrated probabilities for each corresponding interval.
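
Together the two vectors define a stepwise-constant mapping: a score falling into the i-th interval is mapped to calProb[i]. A hand-rolled equivalent of applyCalibration could look like the sketch below; the exact boundary handling of the implementation is an assumption here, not verified:

# sketch: map scores to calibrated probabilities by hand,
# assuming bins of the form (interval[i-1], interval[i]] starting at 0
manualApply <- function(score, cal) {
  i <- findInterval(score, cal$interval, left.open=TRUE) + 1
  i <- pmin(i, length(cal$calProb))  # clamp scores above the last boundary
  cal$calProb[i]
}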

Details

Depending on the specified method one of the following calibration methods is executed.

  • "isoReg" isotonic regression calibration based on pair-adjacent violators (PAV) algorithm.

  • "binning" calibration into a pre-specified number of bands given by noBins parameter, trying to make bins of equal weight.

  • "binIsoReg" first binning method is executed, following by a isotonic regression calibration.

  • "mdlMerge" first intervals are merged by a MDL gain criterion into a prespecified number of intervals, following by the isotonic regression calibration.

If model="binning" the parameter noBins specifies the desired number of bins i.e., calibration bands; if model="binIsoReg" the parameter noBins specifies the number of initial bins that are formed by binning before isotonic regression is applied; if model="mdlMerge" the parameter noBins specifies the number of bins formed after first applying isotonic regression. The most similar bins are merged using MDL criterion.

References

I. Kononenko, M. Kukar: Machine Learning and Data Mining: Introduction to Principles and Algorithms. Horwood, 2007

A. Niculescu-Mizil, R. Caruana: Predicting Good Probabilities With Supervised Learning. Proceedings of the 22nd International Conference on Machine Learning (ICML'05), 2005

See Also

reliabilityPlot, CORElearn, predict.CoreModel.

Examples

# NOT RUN {
# load the package (needed when running outside example())
library(CORElearn)

# generate data set separately for training the model, 
#   calibration of probabilities and testing
train <- classDataGen(noInst=200)
cal <- classDataGen(noInst=200)
test <- classDataGen(noInst=200)

# build random forests model with default parameters
modelRF <- CoreModel(class~., train, model="rf", maxThreads=1)

# prediction 
predCal <- predict(modelRF, cal, rfPredictClass=FALSE)
predTest <- predict(modelRF, test, rfPredictClass=FALSE)
destroyModels(modelRF) # clean up, model not needed anymore

# calibrate for a chosen class1 and method
class1 <- 1
calibration <- calibrate(cal$class, predCal$prob[,class1], class1=class1, 
                         method="isoReg", assumeProbabilities=TRUE)

# apply the calibration to the testing set
calibratedProbs <- applyCalibration(predTest$prob[,class1], calibration)
# the calibration of probabilities can be visualized with 
# reliabilityPlot function
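# for instance (a sketch; the reliabilityPlot arguments below are assumed
# from its help page, see ?reliabilityPlot for the full interface):
reliabilityPlot(calibratedProbs, as.integer(test$class == class1),
                titleText="calibrated random forest probabilities")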

# }
