
rfUtilities (version 2.1-4)

probability.calibration: Isotonic probability calibration

Description

Performs isotonic regression calibration of posterior probabilities to minimize log loss.
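As an illustration of the general idea, isotonic calibration can be sketched with base R's stats::isoreg, which fits a monotone non-decreasing step function of the observed 0/1 response on the predicted probability. The helper name iso_calibrate and the simulated data are hypothetical and are not taken from rfUtilities; the package's internal implementation may differ.

```r
# Hypothetical helper (not part of rfUtilities): isotonic calibration sketch.
# Assumes y is coded 0/1 and p holds the uncalibrated probabilities.
iso_calibrate <- function(y, p) {
  stopifnot(length(y) == length(p), all(y %in% c(0, 1)))
  fit  <- stats::isoreg(x = p, y = y)  # monotone fit of y on p
  step <- stats::as.stepfun(fit)       # fitted step function
  step(p)                              # calibrated value for each p
}

# Simulated, deliberately miscalibrated probabilities
set.seed(1)
p <- runif(200)
y <- rbinom(200, 1, p^2)
p.cal <- iso_calibrate(y, p)
summary(p.cal)
```

The calibrated values are non-decreasing in p and stay within [0, 1], because each one is the mean of 0/1 responses over a block of neighbouring predictions.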

Usage

probability.calibration(y, p, regularization = FALSE)

Arguments

y

Binomial response variable used to fit the model

p

Estimated probabilities from the fitted model

regularization

(FALSE/TRUE) Should regularization be performed on the probabilities? (see notes)

Value

A vector of calibrated probabilities

References

Platt, J. (1999) Probabilistic outputs for support vector machines and comparison to regularized likelihood methods. Advances in Large Margin Classifiers (pp 61-74).

Niculescu-Mizil, A., & R. Caruana (2005) Obtaining calibrated probabilities from boosting. Proc. 21st Conference on Uncertainty in Artificial Intelligence (UAI 2005). AUAI Press.

Examples

# NOT RUN {
library(randomForest)
library(rfUtilities)  # provides probability.calibration

data(iris)
iris$Species <- ifelse(iris$Species == "versicolor", 1, 0)

# Add some noise by flipping two labels in each class
set.seed(4364)  # seed here as well so the label flips are reproducible
idx1 <- which(iris$Species %in% 1)
idx0 <- which(iris$Species %in% 0)
iris$Species[sample(idx1, 2)] <- 0
iris$Species[sample(idx0, 2)] <- 1

# Specify model
y <- iris[, "Species"]
x <- iris[, 1:4]
set.seed(4364)
( rf.mdl <- randomForest(x = x, y = factor(y)) )

# Predicted probability of class 1 (second column of the probability matrix)
y.hat <- predict(rf.mdl, iris[, 1:4], type = "prob")[, 2]

# Calibrate probabilities
calibrated.y.hat <- probability.calibration(y, y.hat, regularization = TRUE)

# Plot calibrated against original probability estimate
plot(density(y.hat), col = "red", xlim = c(0, 1), ylab = "Density",
     xlab = "probabilities", main = "Calibrated probabilities")
lines(density(calibrated.y.hat), col = "blue")
legend("topright", legend = c("original", "calibrated"),
       lty = c(1, 1), col = c("red", "blue"))
# }
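
Because the stated goal of the calibration is to reduce log loss, it can be useful to compute that quantity directly. The following is a minimal sketch: log_loss is a hypothetical helper, not an rfUtilities function, and the clipping constant eps is an assumption made here to avoid taking log(0).

```r
# Hypothetical helper (not part of rfUtilities): binary log loss.
# y is a 0/1 response, p the predicted probability of class 1.
log_loss <- function(y, p, eps = 1e-15) {
  stopifnot(length(y) == length(p))
  p <- pmin(pmax(p, eps), 1 - eps)  # clip away from exact 0 and 1
  -mean(y * log(p) + (1 - y) * log(1 - p))
}

# Small worked call on made-up values
log_loss(c(1, 0, 1), c(0.9, 0.1, 0.8))
```

With the objects from the example above, log_loss(y, y.hat) and log_loss(y, calibrated.y.hat) could be compared. Both are computed on the training data, so the comparison is likely to be optimistic.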
