llama (version 0.9.2)

regression: Regression model

Description

Build a regression model that predicts the algorithm to use based on the features of the problem.

Usage

regression(regressor = NULL, data = NULL,
    pre = function(x, y=NULL) { list(features=x) },
    combine = NULL, expand = identity, save.models = NA,
    use.weights = TRUE)

Arguments

regressor

the mlr regressor to use. See examples.

data

the data to use, with training and test sets; the structure returned by one of the partitioning functions (e.g. cvFolds).

pre

a function to preprocess the data; currently normalize is the only provided alternative. Optional. Does nothing by default.

combine

the function used to combine the predictions of the individual regression models for stacking. Default NULL. See details.

expand

a function that takes a matrix of performance predictions (columns are algorithms, rows are problem instances) and transforms it into a matrix with the same number of rows. Only meaningful if combine is not NULL. Default is the identity function, which leaves the matrix unchanged. See examples.

save.models

Whether to serialize and save the models trained during evaluation. If not NA, the value is used as a prefix for the file names.

use.weights

Whether to use instance weights if supported. Default TRUE.

Value

predictions

a data frame with the predictions for each instance and test set. The columns of the data frame are the instance ID columns (as determined by input), the algorithm, the score of the algorithm, and the iteration (e.g. the number of the fold for cross-validation). More than one prediction may be made for each instance and iteration. The score corresponds to the predicted performance value. If stacking is used, each prediction is simply the best algorithm with a score of 1.

predictor

a function that encapsulates the regression model learned on the entire data set. Can be called with data for the same features with the same feature names as the training data to obtain predictions in the same format as the predictions member.

models

the list of models trained on the entire data set. This is meant for debugging/inspection purposes and does not include any models used to combine predictions of individual models.

Details

regression takes data and processes it using pre (if supplied). regressor is called to induce a separate regression model for each algorithm to predict its performance. The best algorithm is then determined from the predicted performances, taking into account whether performance is to be minimized or maximized, as specified when the data structure was created through input.

The evaluation across the training and test sets will be parallelized automatically if a suitable backend for parallel computation is loaded. The parallelMap level is "llama.fold".
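A minimal sketch of enabling such a backend with the parallelMap package (the worker count of 2 is an arbitrary choice for illustration; the code is a no-op if parallelMap is not installed):

```r
# guard so the sketch degrades gracefully when parallelMap is absent
if (requireNamespace("parallelMap", quietly = TRUE)) {
  # start a socket backend restricted to the "llama.fold" level,
  # so only the per-fold evaluation is parallelized
  parallelMap::parallelStartSocket(2, level = "llama.fold")
  # ... calls to regression(...) here evaluate folds in parallel ...
  parallelMap::parallelStop()
}
```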

If combine is not NULL, it is assumed to be an mlr classifier and is used to learn a model that predicts the best algorithm given the original features and the performance predictions of the individual algorithms. If this classifier supports weights and use.weights is TRUE, the weights are passed as the difference between the performances of the best and the worst algorithm. Optionally, expand can be used to supply a function that modifies the predictions before they are given to the classifier, e.g. to augment the performance predictions with their pairwise differences (see examples).
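The pairwise-difference expansion used in the examples below can be illustrated in base R on a small made-up prediction matrix (the 2x3 matrix is hypothetical data, not from the package):

```r
# hypothetical performance predictions: 2 instances (rows) x 3 algorithms (columns)
x <- matrix(c(1, 4, 2, 3, 5, 1), nrow = 2)

# append the absolute pairwise differences between algorithm columns;
# the row count is preserved, as required for an expand function
expand <- function(x) {
  cbind(x, combn(1:ncol(x), 2, function(y) { abs(x[, y[1]] - x[, y[2]]) }))
}

expand(x)  # 2 x 6 matrix: the original 3 columns plus 3 difference columns
```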

If all predictions of an underlying machine learning model are NA, the prediction will be NA for the algorithm and -Inf for the score if the performance value is to be maximised, Inf otherwise.

If save.models is not NA, the models trained during evaluation are serialized into files. Each file contains a list with members model (the mlr model), train.data (the mlr task with the training data), and test.data (the data frame with the test data used to make predictions). The file name starts with save.models, followed by the ID of the machine learning model, followed by "combined" if the model combines predictions of other models, followed by the number of the fold. Each model for each fold is saved in a different file.

References

Kotthoff, L. (2012) Hybrid Regression-Classification Models for Algorithm Selection. 20th European Conference on Artificial Intelligence, 480--485.

See Also

classify, classifyPairs, cluster, regressionPairs

Examples

# NOT RUN {
if(Sys.getenv("RUN_EXPENSIVE") == "true") {
data(satsolvers)
folds = cvFolds(satsolvers)

res = regression(regressor=makeLearner("regr.lm"), data=folds)
# the total number of successes
sum(successes(folds, res))
# predictions on the entire data set
res$predictor(satsolvers$data[satsolvers$features])

res = regression(regressor=makeLearner("regr.ksvm"), data=folds)

# combine performance predictions using classifier
ress = regression(regressor=makeLearner("regr.ksvm"),
                  data=folds,
                  combine=makeLearner("classif.J48"))

# add pairwise differences to performance predictions before running classifier
ress = regression(regressor=makeLearner("regr.ksvm"),
                  data=folds,
                  combine=makeLearner("classif.J48"),
                  expand=function(x) { cbind(x, combn(c(1:ncol(x)), 2,
                         function(y) { abs(x[,y[1]] - x[,y[2]]) })) })
}
# }
