regression: Regression model

Description

Build a regression model that predicts the algorithm to use based on the features of the problem.

Usage

regression(regressor = NULL, data = NULL,
    pre = function(x, y=NULL) { list(features=x) },
    combine = NULL, expand = identity)

Arguments

regressor

the regression function to use. Must accept a formula of the values to predict and a data frame with features. Return value should be a structure that can be given to predict along with new data. See examples.
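
As a rough illustration of the interface described above, the following sketch defines a trivial regressor in base R. The names meanRegressor and predict.meanRegressor are hypothetical and not part of the package. It only shows the formula-plus-data-frame calling convention and a return value that works with predict. Whether regression accepts such a wrapper unchanged is an assumption.

```r
# Hypothetical regressor: ignores the features and always predicts
# the mean of the training response.
meanRegressor <- function(formula, data) {
    # extract the response column named on the left-hand side of the formula
    response <- model.response(model.frame(formula, data))
    structure(list(mean = mean(response)), class = "meanRegressor")
}

# S3 predict method so the fitted object can be given to predict()
# together with new data.
predict.meanRegressor <- function(object, newdata, ...) {
    rep(object$mean, nrow(newdata))
}

train <- data.frame(f1 = c(1, 2, 3, 4), runtime = c(10, 20, 30, 40))
model <- meanRegressor(runtime ~ f1, train)
preds <- predict(model, data.frame(f1 = c(5, 6)))
```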

data

the data to use with training and test sets. The structure returned by trainTest or cvFolds.

pre

a function to preprocess the data. Currently, normalize is the only preprocessing function provided. Optional. By default, the features are passed through unchanged.

combine

the function used to combine the predictions of the individual regression models for stacking. Default NULL. See details.

expand

a function that takes a matrix of performance predictions (columns are algorithms, rows are problem instances) and transforms it into a matrix with the same number of rows. Only meaningful if combine is not NULL. Default is the identity function.

Value

predictions

a list of lists of data frames with the predictions for each test set. Each data frame has columns algorithm and score and is sorted according to preference, with the most preferred algorithm first. The score value corresponds to the predicted performance value. If stacking is used, each data frame contains simply the best algorithm with a score of 1.

predictor

a function that encapsulates the regression model learned on the entire data set. Can be called with data for the same features with the same feature names as the training data to obtain predictions.

models

the list of models trained on the entire data set. This is meant for debugging/inspection purposes and does not include any models used to combine predictions of individual models.
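
To make the layout of a single predictions data frame concrete, here is a minimal base R sketch. The algorithm names and scores are made up, and the minimize flag is a stand-in for the setting stored in the data structure. This is not the package's internal code.

```r
# Hypothetical predicted performances for one problem instance.
predicted <- c(algA = 12.5, algB = 3.1, algC = 7.8)
minimize <- TRUE  # assumption: lower performance values are better

ranking <- data.frame(algorithm = names(predicted),
    score = unname(predicted), stringsAsFactors = FALSE)
# most preferred algorithm first
ranking <- ranking[order(ranking$score, decreasing = !minimize), ]
rownames(ranking) <- NULL
```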

Details

regression takes data and processes it using pre (if supplied). regressor is called to induce a separate regression model for each algorithm that predicts its performance. The best algorithm is determined from the predicted performances, taking into account whether performance is to be minimized or maximized, as specified when creating the data structure through input.
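
The per-algorithm approach described above can be sketched in base R without the package. The toy data, the column names and the minimize flag are all assumptions made for illustration. The package itself handles folds, preprocessing and the data structure.

```r
# Toy data: two features and the measured runtime of two hypothetical algorithms.
train <- data.frame(
    f1   = c(1, 2, 3, 4, 5, 6),
    f2   = c(2, 1, 4, 3, 6, 5),
    algA = c(1, 2, 3, 4, 5, 6),   # runtime grows with f1
    algB = c(6, 5, 4, 3, 2, 1)    # runtime shrinks with f1
)
features <- c("f1", "f2")
algorithms <- c("algA", "algB")
minimize <- TRUE  # assumption: lower runtime is better

# one regression model per algorithm, each predicting that algorithm's performance
models <- lapply(algorithms, function(a) {
    lm(reformulate(features, response = a), data = train)
})
names(models) <- algorithms

# matrix of predicted performances: rows are instances, columns are algorithms
test <- data.frame(f1 = c(1, 6), f2 = c(2, 5))
predicted <- sapply(models, predict, newdata = test)

# pick the algorithm with the best predicted performance for each instance
pick <- if (minimize) which.min else which.max
best <- algorithms[apply(predicted, 1, pick)]
```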

The evaluation across the training and test sets will be parallelized automatically if a suitable backend for parallel computation is loaded.

If combine is not null, it is assumed to be a classifier with the same properties as classifiers given to classify and will be used to learn a model to predict the best algorithm given the performance predictions for the individual algorithms. Optionally, expand can be used to supply a function that will modify the features given to the classifier, e.g. augment the performance predictions with the pairwise differences (see examples).
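
To show what an expand function of this kind produces, the sketch below applies the pairwise-difference idea to a small made-up matrix of performance predictions. The name expandDiffs and the numbers are hypothetical. Note that the combn call returns a matrix only when there is more than one row, so this sketch assumes at least two problem instances.

```r
# Hypothetical expand function: append the absolute pairwise differences
# between the predicted performances of all pairs of algorithms.
expandDiffs <- function(x) {
    cbind(x, combn(seq_len(ncol(x)), 2,
        function(y) { abs(x[, y[1]] - x[, y[2]]) }))
}

# rows are problem instances, columns are algorithms
preds <- matrix(c(1, 4, 2, 6, 5, 3), nrow = 2,
    dimnames = list(NULL, c("algA", "algB", "algC")))

# three original columns plus one column per pair of algorithms
expanded <- expandDiffs(preds)
```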

References

Kotthoff, L. (2012) Hybrid Regression-Classification Models for Algorithm Selection. 20th European Conference on Artificial Intelligence, 480-485.

Examples

data(satsolvers)
trainTest = cvFolds(satsolvers)

res = regression(regressor=lm, data=trainTest)
# the total number of successes
sum(successes(trainTest, res))
# predictions on the entire data set
res$predictor(subset(satsolvers$data, TRUE, satsolvers$features))

library(RWeka)
res = regression(regressor=LinearRegression, data=trainTest)

# combine performance predictions using classifier
ress = regression(regressor=LinearRegression, data=trainTest, combine=J48)

# add pairwise differences to performance predictions before running classifier
ress = regression(regressor=LinearRegression, data=trainTest, combine=J48,
    expand=function(x) { cbind(x, combn(c(1:ncol(x)), 2,
        function(y) { abs(x[,y[1]] - x[,y[2]]) })) })
