classify: Classification model

Description

Build a classification model that predicts the algorithm to use based on the features of the problem.

Usage

classify(classifier = NULL, data = NULL,
    pre = function(x, y=NULL) { list(features=x) },
    save.models = NA, use.weights = TRUE)

Arguments

classifier

the mlr classifier to use. See examples.

The argument can also be a list of such classifiers.

data

the data to use with training and test sets. The structure returned by one of the partitioning functions.

pre

a function to preprocess the data before training. Currently the only preprocessing function provided is normalize. Optional; by default the features are passed through unchanged.

save.models

Whether to serialize and save the models trained during evaluation of the model. If not NA, will be used as a prefix for the file name.

use.weights

Whether to use instance weights if supported. Default TRUE.

Value

predictions

a data frame with the predictions for each instance and test set. The columns of the data frame are the instance ID columns (as determined by input), the algorithm, the score of the algorithm, and the iteration (e.g. the number of the fold for cross-validation). More than one prediction may be made for each instance and iteration. The score corresponds to the number of classifiers that predicted the respective algorithm, or the sum of the predicted probabilities that this algorithm was the best. If stacking is used, the score corresponds to the output of the stacked classifier.

predictor

a function that encapsulates the classifier learned on the entire data set. Can be called with data for the same features with the same feature names as the training data to obtain predictions in the same format as the predictions member.

models

the list of models trained on the entire data set. This is meant for debugging/inspection purposes and does not include any models used to combine predictions of individual models.

Details

classify takes the training and test sets in data and processes them using pre (if supplied). classifier is called to induce a classifier on each training set. The learned model is used to make predictions on the corresponding test set(s).
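
As a rough illustration of the shape a custom pre function might take, the sketch below centers and scales the features in base R. The function name center_scale_pre is hypothetical. It also assumes, in the manner of normalize, that the optional second argument carries parameters computed on the training set so that the same transformation can be applied to the test set, and that these are returned in a meta member; only the features member is confirmed by the default shown in Usage.

```r
# Hypothetical preprocessing function with the same signature as the default pre.
# x: data frame of numeric features; y: optional parameters from a previous call
# (assumption: used to apply the training-set transformation to test data).
center_scale_pre <- function(x, y = NULL) {
  m <- as.matrix(x)
  if (is.null(y)) {
    # no parameters supplied: compute them from this data
    y <- list(center = colMeans(m), scale = apply(m, 2, sd))
  }
  scaled <- scale(m, center = y$center, scale = y$scale)
  list(features = as.data.frame(scaled), meta = y)
}

# made-up feature values for illustration
train_features <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
prepared <- center_scale_pre(train_features)

# reuse the training-set parameters on new data
test_features <- data.frame(a = 2, b = 20)
test_prepared <- center_scale_pre(test_features, prepared$meta)
```

Such a function could then be passed as the pre argument, provided the features are numeric and no column has zero variance.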

The evaluation across the training and test sets will be parallelized automatically if a suitable backend for parallel computation is loaded. The parallelMap level is "llama.fold".

If the given classifier supports case weights and use.weights is TRUE, the performance difference between the best and the worst algorithm is passed as a weight for each instance.
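
The weight described above can be sketched in base R as follows. This is only an illustration of the idea, not the package's internal code; the performance table perf and its solver names are made up.

```r
# Hypothetical performance table: one row per instance, one column per algorithm.
perf <- data.frame(
  solverA = c(1.0, 5.0, 3.0),
  solverB = c(1.2, 50.0, 3.0),
  solverC = c(0.9, 7.5, 3.0)
)

# Weight per instance: difference between the largest and the smallest
# performance value observed for that instance.
instance_weights <- apply(perf, 1, max) - apply(perf, 1, min)
```

Instances on which all algorithms perform the same receive a weight of zero, while instances on which the choice of algorithm matters a lot receive a large weight.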

If a list of classifiers is supplied in classifier, ensemble classification is performed. That is, the models are trained and used to make predictions independently. For each instance, the final prediction is determined by majority vote of the predictions of the individual models -- the class that occurs most often is chosen. If the list given as classifier contains a member .combine that is a function, it is assumed to be a classifier with the same properties as the other ones and will be used to combine the ensemble predictions instead of majority voting. This classifier is passed the original features and the predictions of the classifiers in the ensemble.
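
A minimal sketch of the majority vote in base R, assuming three ensemble members and made-up algorithm names. How the package breaks ties is not specified here; in this sketch a tie goes to the label that comes first in the sorted table.

```r
# Hypothetical predictions: one row per instance, one column per ensemble member.
votes <- data.frame(
  J48 = c("minisat", "glucose", "lingeling"),
  IBk = c("minisat", "lingeling", "lingeling"),
  svm = c("glucose", "lingeling", "minisat"),
  stringsAsFactors = FALSE
)

# Return the label predicted most often for one instance.
majority_vote <- function(row) {
  counts <- table(row)
  names(counts)[which.max(counts)]
}

final <- apply(votes, 1, majority_vote)

# Score of the winning label: the number of classifiers that predicted it.
scores <- apply(votes, 1, function(row) max(table(row)))
```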

If the prediction of a stacked learner is NA, the score of that prediction will be NA as well.

If save.models is not NA, the models trained during evaluation are serialized into files. Each file contains a list with members model (the mlr model), train.data (the mlr task with the training data), and test.data (the data frame with the test data used to make predictions). The file name starts with save.models, followed by the ID of the machine learning model, followed by "combined" if the model combines predictions of other models, followed by the number of the fold. Each model for each fold is saved in a different file.

References

Kotthoff, L., Miguel, I., Nightingale, P. (2010) Ensemble Classification for Constraint Solver Configuration. 16th International Conference on Principles and Practice of Constraint Programming, 321--329.

Examples

# NOT RUN {
if(Sys.getenv("RUN_EXPENSIVE") == "true") {
data(satsolvers)
folds = cvFolds(satsolvers)

res = classify(classifier=makeLearner("classif.J48"), data=folds)
# the total number of successes
sum(successes(folds, res))
# predictions on the entire data set
res$predictor(satsolvers$data[satsolvers$features])

res = classify(classifier=makeLearner("classif.svm"), data=folds)

# use probabilities instead of labels
res = classify(classifier=makeLearner("classif.randomForest", predict.type = "prob"), data=folds)

# ensemble classification
rese = classify(classifier=list(makeLearner("classif.J48"),
                                makeLearner("classif.IBk"),
                                makeLearner("classif.svm")),
                data=folds)

# ensemble classification with a classifier to combine predictions
rese = classify(classifier=list(makeLearner("classif.J48"),
                                makeLearner("classif.IBk"),
                                makeLearner("classif.svm"),
                                .combine=makeLearner("classif.J48")),
                data=folds)
}
# }
