compareTreecalcs: Error rate comparisons for tree-based classification

Description

Compare error rates, between different functions and different selection rules, for an approximately equal random division of the data into a training set and a test set.

Usage

compareTreecalcs(x = yesno ~ ., data = spam7, cp = 0.00025,
                 fun = c("rpart", "randomForest"))

Arguments

x

model formula

data

a data frame in which to interpret the variables named in the formula

cp

setting for the cost complexity parameter cp, used by rpart()

fun

one or both of "rpart" and "randomForest"

Value

If rpart is specified in fun, the following:

rpSEcvI

the estimated cross-validation error rate when rpart() is run on the training data (subset I), and the one-standard-error rule is used

rpcvI

the estimated cross-validation error rate when rpart() is run on subset I, and the model is used that gives the minimum cross-validated error rate

rpSEtest

the error rate when the model that leads to rpSEcvI is used to make predictions for subset II

rptest

the error rate when the model that leads to rpcvI is used to make predictions for subset II

nSErule

number of splits required by the one-standard-error rule

nREmin

number of splits that gives the minimum error

If randomForest is specified in fun, the following:

rfcvI

the out-of-bag (OOB) error rate when randomForest() is run on subset I

rftest

the error rate when the model that leads to rfcvI is used to make predictions for subset II

Details

Data are randomly divided into two subsets, I and II. The function(s) are used in the standard way for calculations on subset I, and the error rates returned for subset I are those that come from the calculations carried out by the function(s) themselves (cross-validation for rpart(), out-of-bag for randomForest()). Predictions are then made for subset II, allowing the calculation of a completely independent set of error rates.
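
The following is a minimal sketch of the kind of calculation described above, for the rpart case only. It is not the source code of compareTreecalcs(). It assumes the iris data as a stand-in data set, an arbitrary seed, and illustrative variable names (subI, subII, testError and so on) that do not come from this function.

```r
library(rpart)

set.seed(29)                          # arbitrary seed, for reproducibility only
dat <- iris                           # stand-in data; not the spam7 default
n <- nrow(dat)

# Random, approximately equal division into subsets I (training) and II (test)
inI <- sample(n, size = round(n / 2))
subI <- dat[inI, ]
subII <- dat[-inI, ]

# Grow a large tree on subset I; rpart() runs its own cross-validation
fit <- rpart(Species ~ ., data = subI, method = "class", cp = 0.00025)
cptab <- fit$cptable

# Row of the cp table with the minimum cross-validated relative error
imin <- which.min(cptab[, "xerror"])

# One-standard-error rule: the smallest tree whose cross-validated error
# is within one standard error of the minimum
thresh <- cptab[imin, "xerror"] + cptab[imin, "xstd"]
ise <- min(which(cptab[, "xerror"] <= thresh))

# xerror is relative to the root node error, so rescale to an error rate
rootErr <- fit$frame$dev[1] / fit$frame$n[1]
cvErrMin <- cptab[imin, "xerror"] * rootErr   # analogous to rpcvI
cvErrSE <- cptab[ise, "xerror"] * rootErr     # analogous to rpSEcvI

# Independent error rate: prune to the chosen row, then predict for subset II
testError <- function(i) {
  pruned <- prune(fit, cp = cptab[i, "CP"])
  pred <- predict(pruned, newdata = subII, type = "class")
  mean(pred != subII$Species)
}
testErrMin <- testError(imin)                 # analogous to rptest
testErrSE <- testError(ise)                   # analogous to rpSEtest

# Numbers of splits for the two rules (analogous to nREmin and nSErule)
nSplitsMin <- cptab[imin, "nsplit"]
nSplitsSE <- cptab[ise, "nsplit"]
```

Comparing cvErrMin with testErrMin, and cvErrSE with testErrSE, shows how far the error estimates obtained within subset I agree with those obtained independently from subset II.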