classify: Classification model

Description

Build a classification model that predicts the algorithm to use based on the features of the problem.

Usage

classify(classifier = NULL, data = NULL,
    pre = function(x, y=NULL) { list(features=x) })

Arguments

classifier

the classifier function to use. Must accept a formula of the values to predict and a data frame with features. Return value should be a structure that can be given to predict along with new data. See examples.

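The argument can also be a list of such classifier functions, in which case ensemble classification is performed (see Details).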

data

the data to use with training and test sets. The structure returned by trainTest or cvFolds.

pre

a function to preprocess the data. Currently the only preprocessing function provided is normalize. Optional; by default the data are passed through unchanged.
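As an illustration of the shape such a function takes, here is a minimal sketch of a custom preprocessing function. It assumes only what the Usage section shows: the function receives a data frame of features as x, accepts an optional second argument y, and returns a list with a features member. The name rescale01 is hypothetical, the role of y is not described here and is ignored, and all feature columns are assumed to be numeric.

```r
# Hypothetical preprocessing function following the signature in Usage:
# function(x, y = NULL) returning list(features = ...).
# Assumes every column of x is numeric; y is accepted but not used.
rescale01 <- function(x, y = NULL) {
  scaled <- as.data.frame(lapply(x, function(col) {
    rng <- range(col, na.rm = TRUE)
    # leave constant columns unchanged to avoid dividing by zero
    if (rng[1] == rng[2]) col else (col - rng[1]) / (rng[2] - rng[1])
  }))
  list(features = scaled)
}

# toy feature data frame to try the function on
toy <- data.frame(a = c(1, 2, 3), b = c(5, 5, 5))
out <- rescale01(toy)
```

Under these assumptions, such a function could be supplied as the pre argument in place of the default.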

Value

predictions

a list of lists of data frames with the predictions for each test set. Each data frame has columns algorithm and score and is sorted according to preference, with the most preferred algorithm first. The score corresponds to the number of classifiers that predicted the respective algorithm. If stacking is used, each data frame contains simply the best algorithm with a score of 1.

predictor

a function that encapsulates the classifier learned on the entire data set. Can be called with data for the same features with the same feature names as the training data to obtain predictions.

models

the list of models trained on the entire data set. This is meant for debugging/inspection purposes and does not include any models used to combine predictions of individual models.

Details

classify takes the training and test sets in data and processes them using pre (if supplied). classifier is then called on each training set to induce a model, and the learned model is used to make predictions on the corresponding test set(s).
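The contract for classifier described under Arguments (accept a formula and a data frame, return something that predict understands) can be sketched with a deliberately trivial model. The names majorityClassifier and majorityModel are hypothetical, and the sketch has not been checked against classify itself; it only shows the calling convention.

```r
# Hypothetical classifier: always predicts the most frequent class.
# It accepts a formula and a data frame, as the classifier argument requires.
majorityClassifier <- function(formula, data) {
  response <- all.vars(formula)[1]   # name of the column to predict
  counts <- table(data[[response]])
  structure(list(label = names(counts)[which.max(counts)],
                 levels = names(counts)),
            class = "majorityModel")
}

# S3 predict method so the returned structure can be given to predict()
# along with new data.
predict.majorityModel <- function(object, newdata, ...) {
  factor(rep(object$label, nrow(newdata)), levels = object$levels)
}

# toy training data: the column "best" holds the class to predict
train <- data.frame(best = factor(c("a", "b", "a")), f1 = c(1, 2, 3))
model <- majorityClassifier(best ~ ., data = train)
preds <- predict(model, data.frame(f1 = c(4, 5)))
```

Classifiers such as J48 or svm follow the same pattern: a fitting function taking a formula and data, plus a predict method for the fitted object.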

The evaluation across the training and test sets will be parallelized automatically if a suitable backend for parallel computation is loaded.

If a list of classifiers is supplied in classifier, ensemble classification is performed. That is, the models are trained and used to make predictions independently. For each instance, the final prediction is determined by majority vote over the predictions of the individual models: the class that occurs most often is chosen. If the list given as classifier contains a member .combine that is a function, it is assumed to be a classifier with the same properties as the other ones and is used to combine the ensemble predictions instead of majority voting (stacking).
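The majority vote can be illustrated in a few lines of base R. This is a sketch of the idea, not the package's implementation: the votes are made-up data, rankVotes is a hypothetical helper, and ties are broken by whatever order sort produces, which may differ from what classify does.

```r
# Hypothetical predictions of three models (columns) for four instances (rows).
votes <- data.frame(
  m1 = c("A", "B", "A", "C"),
  m2 = c("A", "B", "C", "C"),
  m3 = c("B", "B", "C", "A"),
  stringsAsFactors = FALSE
)

# For one instance, count the votes per algorithm and sort by count,
# giving a data frame with columns algorithm and score.
rankVotes <- function(v) {
  counts <- sort(table(v), decreasing = TRUE)
  data.frame(algorithm = names(counts), score = as.vector(counts),
             stringsAsFactors = FALSE)
}

# one ranked data frame per instance
ranked <- lapply(seq_len(nrow(votes)),
                 function(i) rankVotes(unlist(votes[i, ])))
# the majority-vote prediction is the first row of each data frame
winners <- vapply(ranked, function(d) d$algorithm[1], character(1))
```

Each element of ranked has the same columns as the prediction data frames described under Value, with score equal to the number of models that voted for the algorithm.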

References

Kotthoff, L., Miguel, I., Nightingale, P. (2010) Ensemble Classification for Constraint Solver Configuration. 16th International Conference on Principles and Practice of Constraint Programming, 321-329.

Examples

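# load the example data and split it into cross-validation folds
data(satsolvers)
trainTest = cvFolds(satsolvers)

# single classifier: J48 decision tree from RWeka
library(RWeka)
res = classify(classifier=J48, data=trainTest)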
# the total number of successes
sum(successes(trainTest, res))
# predictions on the entire data set
res$predictor(subset(satsolvers$data, TRUE, satsolvers$features))

# single classifier: support vector machine from e1071
library(e1071)
res = classify(classifier=svm, data=trainTest)

# ensemble classification
rese = classify(classifier=list(J48, IBk, svm), data=trainTest)

# ensemble classification with a classifier to combine predictions
rese = classify(classifier=list(J48, IBk, svm, .combine=J48), data=trainTest)
