input: Read data

Description

Reads performance data that can be used to train and evaluate models.

Usage

input(features, performances, successes = NULL, costs = NULL, minimize = T)

Arguments

features

data frame that contains the feature values for each problem instance and a non-empty set of ID columns.

performances

data frame that contains the performance values for each problem instance and a non-empty set of ID columns.

successes

data frame that contains the success values (true/false) for each algorithm on each problem instance and a non-empty set of ID columns. The names of the columns in this data set should be the same as the names of the columns in performances.

costs

either a single number, a data frame or a list that specifies the cost of the features. If a number is specified, it is assumed to denote the cost for all problem instances (i.e. the cost is always the same). If a data frame is given, it is assumed

minimize

whether the minimum performance value is best. Default true.

Value

data

the combined data (features, performance, successes).

features

a list of names denoting problem features.

performance

a list of names denoting algorithm performances.

success

a list of names denoting algorithm successes.

minimize

true if the smaller performance values are better, else false.

cost

a list of names denoting feature costs.

costGroups

a list of lists of names denoting which features belong to which group. Only returned if cost groups are given as input.

Details

input takes a list of data frames and processes them as follows. The feature and performance data are joined by looking for common column names in the two data frames (usually an ID of the problem instance). For each problem, the best algorithm according to the given performance data is computed. If more than one algorithm has the best performance, all of them are returned.
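The join and the selection of the best algorithm described above can be sketched in base R. This is only an illustration of the idea on made-up data, not the package's actual implementation; the variable names idCols, combined, algorithms and best are assumptions chosen for this example.

```r
# Toy data; the column names mirror the Examples section.
features <- data.frame(ID = 0:2, width = c(1.2, 0.5, 2.0), height = c(3, 1, 4))
performances <- data.frame(ID = 0:2, alg1 = c(2, 7, 4), alg2 = c(5, 1, 4))

# Join on the columns the two data frames share (here only "ID").
idCols <- intersect(names(features), names(performances))
combined <- merge(features, performances, by = idCols)

# Every column of performances that is not an ID column is an algorithm.
algorithms <- setdiff(names(performances), idCols)
minimize <- TRUE

# For each row, collect every algorithm that attains the best value,
# so that ties yield more than one name.
best <- lapply(seq_len(nrow(combined)), function(i) {
    values <- unlist(combined[i, algorithms])
    target <- if (minimize) min(values) else max(values)
    names(values)[values == target]
})
```

In this sketch the third instance is a tie, so its entry in best holds both algorithm names.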

The data frame that describes whether an algorithm was successful on a problem is optional. It is required, however, if parscores or successes are to be used to evaluate the learned models; leaving it out in that case leads to error messages.
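Because the columns of successes are expected to carry the same names as those of performances, it may help to check this before calling input. The sketch below uses base R only; sameColumns is a hypothetical helper written for this example and is not part of the package.

```r
performances <- data.frame(ID = 0:1, alg1 = c(2, 7), alg2 = c(5, 1))
successes <- data.frame(ID = 0:1, alg1 = c(TRUE, FALSE), alg2 = c(FALSE, TRUE))

# Hypothetical helper: TRUE if both data frames have the same set of
# column names, regardless of column order.
sameColumns <- function(a, b) setequal(names(a), names(b))

# Stop early with an error if the names do not match.
stopifnot(sameColumns(performances, successes))
```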

Similarly, feature costs are optional.

Examples

# features.csv looks something like
# ID,width,height
# 0,1.2,3
# ...
# performance.csv:
# ID,alg1,alg2
# 0,2,5
# ...
# success.csv:
# ID,alg1,alg2
# 0,T,F
# ...
input(read.csv("features.csv"), read.csv("performance.csv"),
    read.csv("success.csv"), costs=10)

# costs.csv:
# ID,width,height
# 0,3,4.5
# ...
input(read.csv("features.csv"), read.csv("performance.csv"),
    read.csv("success.csv"), costs=read.csv("costs.csv"))

# costGroups.csv:
# ID,group1,group2
# 0,3,4.5
# ...
input(read.csv("features.csv"), read.csv("performance.csv"),
    read.csv("success.csv"),
    costs=list(groups=list(group1=c("height"), group2=c("width")),
               values=read.csv("costGroups.csv")))
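The calls above assume that the CSV files already exist. To try them on toy data, files of the described shape could be written to a temporary directory first. This is a sketch in base R; exampleDir and the values in the data frames are made up for illustration.

```r
# Write toy versions of the files into a temporary directory.
exampleDir <- tempfile("input-example")
dir.create(exampleDir)

write.csv(data.frame(ID = 0:1, width = c(1.2, 0.5), height = c(3, 1)),
    file.path(exampleDir, "features.csv"), row.names = FALSE)
write.csv(data.frame(ID = 0:1, alg1 = c(2, 7), alg2 = c(5, 1)),
    file.path(exampleDir, "performance.csv"), row.names = FALSE)
write.csv(data.frame(ID = 0:1, alg1 = c(TRUE, FALSE), alg2 = c(FALSE, TRUE)),
    file.path(exampleDir, "success.csv"), row.names = FALSE)

# Read them back the same way the calls above do.
features <- read.csv(file.path(exampleDir, "features.csv"))
performances <- read.csv(file.path(exampleDir, "performance.csv"))
successes <- read.csv(file.path(exampleDir, "success.csv"))
```

The resulting data frames can then be passed to input in place of the read.csv calls.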
