training_test_comparison: Compare model performance between training and test sets

Description

The function training_test_comparison calculates the performance of the provided models using the specified measure function. Model responses are calculated on the test data, which is extracted from the explainers, and on the training data, which is provided by the user. The output can be displayed with the print or plot function.

Usage

training_test_comparison(
  champion,
  challengers,
  training_data,
  training_y,
  measure_function = DALEX::loss_root_mean_square
)

Arguments

champion

- explainer of champion model.

challengers

- explainer of challenger model or list of explainers.

training_data

- data without the target column that will be passed to the predict function and then to the measure function. Keep in mind that it has to differ from the data passed to the explainer.

training_y

- target column for training_data.

measure_function

- measure function that calculates model performance from the true observations and the predictions. The order of parameters matters and should be (y, y_hat). The default is RMSE (DALEX::loss_root_mean_square).
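
A custom measure only needs to accept the observed values first and the predictions second. As a minimal sketch, a mean absolute error function could look like the one below. The name loss_mae is hypothetical and is not part of DALEX or DALEXtra.

```r
# Hypothetical custom measure: mean absolute error.
# The argument order (y, y_hat) follows the convention described above.
loss_mae <- function(y, y_hat) {
  mean(abs(y - y_hat))
}

# Quick check on toy vectors (illustrative values only)
y     <- c(1, 0, 1, 1)
y_hat <- c(0.8, 0.1, 0.6, 0.9)
loss_mae(y, y_hat)
```

Such a function would then presumably be supplied as measure_function = loss_mae in the call to training_test_comparison.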

Value

An object of the class training_test_comparison.

It is a named list containing:

- data: data.frame with the following columns
  - measure_test: performance on the test set
  - measure_train: performance on the training set
  - label: label of the explainer
  - type: flag that indicates whether the explainer was passed as champion or as challenger
- models_info: data.frame containing information about the models used in the analysis
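
To make the shape of the data element concrete, the following is a conceptual sketch in base R only. It is not the DALEXtra implementation: the lm models, the mtcars split, the labels, and the values used in the type column are all assumptions chosen for illustration.

```r
# Conceptual sketch in base R; this is NOT how DALEXtra is implemented.
# RMSE with the (y, y_hat) argument order.
rmse <- function(y, y_hat) sqrt(mean((y - y_hat)^2))

# Deterministic split of a built-in dataset into training and test parts
train_idx <- seq(1, nrow(mtcars), by = 2)
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

# Two illustrative models standing in for a champion and a challenger
models <- list(
  champion   = lm(mpg ~ wt, data = train),
  challenger = lm(mpg ~ wt + hp, data = train)
)

# One row per model, mirroring the columns of the data element
comparison <- do.call(rbind, lapply(names(models), function(name) {
  fit <- models[[name]]
  data.frame(
    measure_test  = rmse(test$mpg,  predict(fit, newdata = test)),
    measure_train = rmse(train$mpg, predict(fit, newdata = train)),
    label         = paste0("lm_", name),  # assumed label format
    type          = name                  # assumed flag values
  )
}))
print(comparison)
```

A large gap between measure_test and measure_train for a given model is the kind of pattern this comparison is meant to surface.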

Examples

# NOT RUN {
library(DALEXtra)
# Load the training and test splits of the titanic data shipped with DALEXtra
titanic_train <- read.csv(system.file("extdata", "titanic_train.csv", package = "DALEXtra"))
titanic_test <- read.csv(system.file("extdata", "titanic_test.csv", package = "DALEXtra"))
# Start a local h2o cluster and silence progress bars
h2o::h2o.init()
h2o::h2o.no_progress()
titanic_h2o <- h2o::as.h2o(titanic_train)
titanic_h2o["survived"] <- h2o::as.factor(titanic_h2o["survived"])
titanic_test_h2o <- h2o::as.h2o(titanic_test)
model <- h2o::h2o.gbm(
  training_frame = titanic_h2o,
  y = "survived",
  distribution = "bernoulli",
  ntrees = 500,
  max_depth = 4,
  min_rows =  12,
  learn_rate = 0.001
)
# Each explainer wraps the test data (columns 1:17) and the target (column 18)
explainer_h2o <- explain_h2o(model, titanic_test[,1:17], titanic_test[,18])

explainer_scikit <- explain_scikitlearn(system.file("extdata",
                                                    "scikitlearn.pkl",
                                                    package = "DALEXtra"),
                                        yml = system.file("extdata",
                                                          "testing_environment.yml",
                                                          package = "DALEXtra"),
                                        data = titanic_test[,1:17],
                                        y = titanic_test$survived)

library("mlr")
task <- mlr::makeClassifTask(
  id = "R",
  data = titanic_train,
  target = "survived"
)
learner <- mlr::makeLearner(
  "classif.gbm",
  par.vals = list(
    distribution = "bernoulli",
    n.trees = 500,
    interaction.depth = 4,
    n.minobsinnode = 12,
    shrinkage = 0.001,
    bag.fraction = 0.5,
    train.fraction = 1
  ),
  predict.type = "prob"
)
gbm <- mlr::train(learner, task)
explainer_mlr <- explain_mlr(gbm, titanic_test[,1:17], titanic_test[,18])

# Compare the champion with both challengers on the training and test data
data <- training_test_comparison(explainer_scikit, list(explainer_h2o, explainer_mlr),
                                 training_data = titanic_train[,-18],
                                 training_y = titanic_train[,18])
plot(data)
# }
