sortinghat (version 0.1)

errorest_632plus: Calculates the .632+ Error Rate for a specified classifier given a data set.


For a given data matrix and its corresponding vector of labels, we calculate the .632+ error rate from Efron and Tibshirani (1997) for a given classifier.


errorest_632plus(x, y, train, classify,
    num_bootstraps = 50, apparent = NULL, loo_boot = NULL,
    ...)


x: a matrix of n observations (rows) and p features (columns)
y: a vector of n class labels
train: a function that builds the classifier. See Details.
classify: a function that classifies observations using the classifier constructed by train. See Details.
num_bootstraps: the number of bootstrap replications
apparent: the apparent error rate for the given classifier. If NULL, this argument is ignored. See Details.
loo_boot: the leave-one-out bootstrap error rate for the given classifier. If NULL, this argument is ignored. See Details.
...: additional arguments passed to the function specified in train.


  • the .632+ error rate estimate


To calculate the .632+ error rate, we first compute the leave-one-out (LOO) bootstrap error rate and the apparent error rate. Next, we compute the 'no-information error rate'. Then, we compute the 'relative overfitting rate' from these three values. Finally, we compute the .632+ error rate estimator from these values.
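The final combination step can be sketched in a few lines of R. The sketch follows the basic definitions in Efron and Tibshirani (1997): with apparent error rate $\overline{err}$, LOO bootstrap error rate $\widehat{Err}^{(1)}$, and no-information error rate $\hat{\gamma}$, the relative overfitting rate is $\hat{R} = (\widehat{Err}^{(1)} - \overline{err}) / (\hat{\gamma} - \overline{err})$, the weight is $\hat{w} = 0.632 / (1 - 0.368 \hat{R})$, and the estimate is $(1 - \hat{w}) \overline{err} + \hat{w} \widehat{Err}^{(1)}$. The function name combine_632plus and the simple guard that keeps $\hat{R}$ in $[0, 1]$ are assumptions made for illustration; the package's internal code may differ.

```r
# Hypothetical helper (not part of sortinghat): combines the three error
# rates into a .632+ style estimate.
combine_632plus <- function(apparent, loo_boot, gamma_hat) {
  denom <- gamma_hat - apparent

  # Relative overfitting rate. Assumption: set it to 0 when it is not
  # well defined or would be negative, and cap it at 1.
  if (denom > 0 && loo_boot > apparent) {
    R_hat <- min((loo_boot - apparent) / denom, 1)
  } else {
    R_hat <- 0
  }

  # The weight moves from 0.632 (no overfitting) to 1 (maximal overfitting).
  w_hat <- 0.632 / (1 - 0.368 * R_hat)

  (1 - w_hat) * apparent + w_hat * loo_boot
}

# No overfitting: the estimate equals the common value of both error rates.
combine_632plus(apparent = 0.02, loo_boot = 0.02, gamma_hat = 2 / 3)

# Maximal overfitting: all weight goes to the LOO bootstrap error rate.
combine_632plus(apparent = 0.00, loo_boot = 0.50, gamma_hat = 0.50)
```

When $\hat{R} = 0$ the result reduces to the ordinary .632 estimator, $0.368 \, \overline{err} + 0.632 \, \widehat{Err}^{(1)}$.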

The 'no-information error rate', $\gamma$, is the error rate the classifier would have if the feature vectors and the class labels were independent. For $K$ classes, we can estimate $\gamma$ by $$\hat{\gamma} = \sum_{k=1}^K p_k (1 - q_k),$$ where $p_k$ is the observed proportion of responses for class $k$ and $q_k$ is the proportion of observations classified as class $k$.
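As a small illustration of this formula, the following sketch computes $\hat{\gamma}$ from a vector of true labels and a vector of predicted labels. The function name no_info_rate is hypothetical and is not part of the package.

```r
# Hypothetical helper: estimates the no-information error rate from the
# observed labels y and the predicted labels.
no_info_rate <- function(y, predicted) {
  # Use a common set of classes so that p_k and q_k line up by class.
  classes <- union(as.character(y), as.character(predicted))

  p_k <- table(factor(y, levels = classes)) / length(y)
  q_k <- table(factor(predicted, levels = classes)) / length(predicted)

  sum(p_k * (1 - q_k))
}

y         <- c("a", "a", "b", "b")
predicted <- c("a", "a", "a", "b")

# p = (0.5, 0.5) and q = (0.75, 0.25), so the estimate is
# 0.5 * 0.25 + 0.5 * 0.75 = 0.5
no_info_rate(y, predicted)
```

When the classes are perfectly balanced, so that $p_k = 1/K$ for every class, the estimate equals $1 - 1/K$ regardless of the predictions.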

To calculate the apparent error rate, we use the errorest_apparent function. Similarly, to calculate the LOO bootstrap (LOO-Boot) error rate, we use the errorest_loo_boot function. In some cases (e.g., in a simulation study), one or both of these error rate estimates might already have been computed. Hence, we allow the user to provide these values; by default, both arguments are NULL to indicate that the values are unavailable and must be computed.
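To make the two ingredients concrete, here is a rough base R sketch of what each estimator measures. It is not the code of errorest_apparent or errorest_loo_boot; the function names ending in _sketch and the 1-nearest-neighbor classifier (nn_train, nn_classify) are assumptions introduced only for this illustration. The apparent error rate is the error on the training data itself, while the LOO bootstrap error rate classifies each observation only with classifiers trained on bootstrap samples that do not contain it.

```r
# Hypothetical 1-nearest-neighbor classifier used only for illustration.
nn_train <- function(x, y, ...) list(x = x, y = y)
nn_classify <- function(object, newdata) {
  nearest <- apply(newdata, 1, function(row) {
    # Squared Euclidean distance from this row to every training row.
    which.min(colSums((t(object$x) - row)^2))
  })
  object$y[nearest]
}

# Apparent error rate: train and test on the same observations.
apparent_sketch <- function(x, y, train, classify, ...) {
  fit <- train(x, y, ...)
  mean(as.character(classify(fit, x)) != as.character(y))
}

# LOO bootstrap error rate: each observation is classified only by
# classifiers whose bootstrap sample left it out.
loo_boot_sketch <- function(x, y, train, classify, num_bootstraps = 50, ...) {
  n <- nrow(x)
  err_sum <- numeric(n)    # number of misclassifications per observation
  times_out <- numeric(n)  # number of samples that excluded the observation
  for (b in seq_len(num_bootstraps)) {
    idx <- sample.int(n, replace = TRUE)
    out <- setdiff(seq_len(n), idx)
    if (length(out) == 0) next
    fit <- train(x[idx, , drop = FALSE], y[idx], ...)
    pred <- classify(fit, x[out, , drop = FALSE])
    err_sum[out] <- err_sum[out] + (as.character(pred) != as.character(y[out]))
    times_out[out] <- times_out[out] + 1
  }
  observed <- times_out > 0
  mean(err_sum[observed] / times_out[observed])
}

iris_x <- data.matrix(iris[, -5])
iris_y <- iris[, 5]

set.seed(1)
apparent_sketch(iris_x, iris_y, nn_train, nn_classify)
loo_boot_sketch(iris_x, iris_y, nn_train, nn_classify, num_bootstraps = 20)
```

A 1-nearest-neighbor classifier is a useful extreme case: every training point is its own nearest neighbor, so the apparent error rate is optimistic, which is the bias the .632+ estimator is designed to correct.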

We expect that the first two arguments of the classifier function given in train are x and y, corresponding to the data matrix and the vector of their labels. Additional arguments can be passed to the train function. The returned object should be a classifier that will be passed to the function given in the classify argument.

We stay with the usual R convention for the classify function. We expect that this function takes two arguments: 1. an object argument which contains the trained classifier returned from the function specified in train; and 2. a newdata argument which contains a matrix of observations to be classified -- the matrix should have rows corresponding to the individual observations and columns corresponding to the features (covariates).
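As an illustration of these two conventions, the sketch below defines a matching train and classify pair for a simple nearest-centroid classifier in base R. The names centroid_train and centroid_classify are hypothetical; any pair of functions with the same argument structure should fit the interface described above.

```r
# train: the first two arguments are x (data matrix) and y (labels).
centroid_train <- function(x, y, ...) {
  x <- as.matrix(x)
  y <- droplevels(factor(y))
  # One row of feature means per class.
  centroids <- do.call(rbind, lapply(levels(y), function(k) {
    colMeans(x[y == k, , drop = FALSE])
  }))
  rownames(centroids) <- levels(y)
  list(centroids = centroids)
}

# classify: takes the trained object and a matrix of new observations,
# and returns one predicted label per row of newdata.
centroid_classify <- function(object, newdata) {
  newdata <- as.matrix(newdata)
  classes <- rownames(object$centroids)
  # Squared Euclidean distance from every row to every centroid (n x K).
  d2 <- sapply(seq_along(classes), function(k) {
    rowSums(sweep(newdata, 2, object$centroids[k, ])^2)
  })
  d2 <- matrix(d2, nrow = nrow(newdata))
  factor(classes[max.col(-d2, ties.method = "first")], levels = classes)
}

iris_x <- data.matrix(iris[, -5])
iris_y <- iris[, 5]

fit <- centroid_train(iris_x, iris_y)
head(centroid_classify(fit, iris_x))

# With sortinghat installed, the pair could then be passed along, e.g.:
# errorest_632plus(x = iris_x, y = iris_y, train = centroid_train,
#                  classify = centroid_classify)
```

Because centroid_classify already returns a plain vector of labels, no wrapper is needed here, unlike predict functions that return a list.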


Efron, Bradley and Tibshirani, Robert (1997), "Improvements on Cross-Validation: The .632+ Bootstrap Method," Journal of the American Statistical Association, 92, 438, 548-560.


library(sortinghat)

iris_x <- data.matrix(iris[, -5])
iris_y <- iris[, 5]

# Because predict(), used here as the classify function, returns multiple
# objects in a list, we provide a wrapper that returns only the class labels.
lda_wrapper <- function(object, newdata) { predict(object, newdata)$class }

# We compute the apparent and LOO-Boot error rates up front to demonstrate
# that they can be computed before the errorest_632plus function is called.

apparent <- errorest_apparent(x = iris_x, y = iris_y, train = MASS::lda,
                              classify = lda_wrapper)
loo_boot <- errorest_loo_boot(x = iris_x, y = iris_y, train = MASS::lda,
                              classify = lda_wrapper)

# Each of the following 3 calls should result in the same error rate.
# Note: the LOO-Boot error rate relies on random bootstrap samples, so the
# results agree only if the same random seed is used whenever it is computed.
# 1. The apparent error rate is provided, while the LOO-Boot must be computed.
errorest_632plus(x = iris_x, y = iris_y, train = MASS::lda,
                 classify = lda_wrapper, apparent = apparent)
# 2. The LOO-Boot error rate is provided, while the apparent must be computed.
errorest_632plus(x = iris_x, y = iris_y, train = MASS::lda,
                 classify = lda_wrapper, loo_boot = loo_boot)
# 3. Both error rates are provided, so the calculation is quick.
errorest_632plus(x = iris_x, y = iris_y, train = MASS::lda,
                 classify = lda_wrapper, apparent = apparent,
                 loo_boot = loo_boot)

# In each case the output is: 0.02194472
