summary.xspliner: Summary method for xspliner object

Description

Summary method for xspliner object

Usage

# S3 method for xspliner
summary(object, predictor, ..., model = NULL,
  newdata = NULL, prediction_funs = list(function(object, newdata)
  predict(object, newdata)), env = parent.frame())

Arguments

object

xspliner object

predictor

Name of the predictor from the xspliner model formula to summarize.

...

Other arguments passed to the model-specific method.

model

Original black box model. Providing it enables model comparison. See Details.

newdata

Data used for model comparison. By default, the training data used to build the black box model.

prediction_funs

List of prediction functions for the surrogate and black box models. For classification problems, different statistics are displayed depending on the prediction type. See the Details section for more information.

env

Environment in which newdata is stored (if not provided as a parameter).
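
As a rough illustration of the shape of prediction_funs entries, the sketch below defines two functions with the default signature function(object, newdata): one returning response probabilities and one returning classified levels. It runs on a plain glm fit rather than an xspliner object, and the names predict_prob and predict_class, as well as the 0.5 cutoff, are assumptions made for this example only.

```r
# Illustration on a plain glm; not tied to xspliner internals.
data <- iris[iris$Species != "setosa", ]
data$Species <- droplevels(data$Species)
fit <- glm(Species ~ Sepal.Length + Petal.Length, data = data, family = binomial)

# Hypothetical helper: returns response probabilities for the second level.
predict_prob <- function(object, newdata) {
  as.numeric(predict(object, newdata, type = "response"))
}

# Hypothetical helper: returns classified levels, using an assumed 0.5 cutoff.
predict_class <- function(object, newdata) {
  prob <- predict_prob(object, newdata)
  factor(ifelse(prob > 0.5, "virginica", "versicolor"),
         levels = c("versicolor", "virginica"))
}

# A list in the same shape as the default prediction_funs argument.
prediction_funs <- list(predict_prob)
head(prediction_funs[[1]](fit, data))
```

Which of the two kinds of function is supplied determines whether the class-level or the probability-based statistics described in Details apply.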

Details

The summary output depends on which arguments are provided.

When only the xspliner model (object parameter) is provided, the default glm::summary output is returned.

Providing both the xspliner model and predictor returns summary details for the selected variable. The following points describe the rules:

When the variable is quantitative and was transformed with a fitted spline, the output contains approximation details.
When the variable is qualitative and was transformed, factor matching is displayed.
When the variable was not transformed, the glm::summary output is displayed for the model.

If both the object parameter and model (the original black box) are provided, the summary displays a comparison of the original and surrogate models. The following points describe the rules ($y_{s}$ and $y_{o}$ are the predictions of the surrogate and original model, respectively, on the provided dataset). A comparison statistic close to 1 means the surrogate model is similar to the black box model (according to that statistic).

For regression models:

1 - Maximum predictions normed-difference $$1 - \frac{\max_{i = 1}^{n} |y_{s}^{(i)} - y_{o}^{(i)}|}{\max_{i = 1}^{n} y_{o}^{(i)} - \min_{i = 1}^{n} y_{o}^{(i)}}$$
R^2 (https://christophm.github.io/interpretable-ml-book/global.html#theory-4) $$1 - \frac{\sum_{i = 1}^{n} ({y_{s}^{(i)} - y_{o}^{(i)}}) ^ {2}}{\sum_{i = 1}^{n} ({y_{o}^{(i)} - \overline{y_{o}}}) ^ {2}}$$
Mean squared error of each model.
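
The two comparison statistics above can be sketched directly in base R. This is a minimal illustration of the formulas, not the package's own code; the function names max_diff_score and r_squared and the toy prediction vectors are made up for the example.

```r
# y_s: surrogate predictions, y_o: original (black box) predictions.

# 1 - maximum absolute difference, normed by the range of y_o.
max_diff_score <- function(y_s, y_o) {
  1 - max(abs(y_s - y_o)) / (max(y_o) - min(y_o))
}

# R^2 of the surrogate predictions with respect to the original ones.
r_squared <- function(y_s, y_o) {
  1 - sum((y_s - y_o)^2) / sum((y_o - mean(y_o))^2)
}

# Toy predictions, invented for illustration only.
y_o <- c(1.0, 2.0, 3.0, 4.0, 5.0)
y_s <- c(1.1, 1.9, 3.2, 3.8, 5.0)

max_diff_score(y_s, y_o)  # 0.95
r_squared(y_s, y_o)       # 0.99
```

Both values are close to 1 here because the toy surrogate predictions deviate from the original ones by at most 0.2 on a range of 4.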

For classification models the result depends on prediction type. When predictions are classified levels:

Mean predictions similarity $$\frac{1}{n} \sum_{i = 1}^{n} I_{y_{s}^{(i)} = y_{o}^{(i)}}$$
Accuracy of each model.
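
The mean predictions similarity is simply the share of observations on which both models predict the same level. A minimal base-R sketch, with a hypothetical helper name and invented toy predictions:

```r
# Share of observations where surrogate and original predictions agree.
# Comparing as character avoids errors when factor level sets differ.
prediction_similarity <- function(y_s, y_o) {
  mean(as.character(y_s) == as.character(y_o))
}

# Toy class predictions, invented for illustration only.
y_o <- factor(c("a", "b", "a", "b", "a"))
y_s <- factor(c("a", "b", "b", "b", "a"))

prediction_similarity(y_s, y_o)  # 0.8
```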

When predictions are response probabilities:

R^2 as for regression model.
1 - Maximum ROC difference $$1 - \max_{t \in T} ||ROC_{o}(t) - ROC_{s}(t)||_{2}$$ Calculates the maximum Euclidean distance between ROC points over a specified threshold set T. In this implementation, T is the union of the breakpoints of each ROC curve.
1 - Mean ROC difference. The same as above, using the mean instead of the maximum.
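
The ROC-based statistics can be sketched as follows. This is an illustrative base-R version under stated assumptions, not the package's implementation: ROC points are taken as (false positive rate, true positive rate) pairs against known true labels, the threshold set T is assumed to be the union of the unique scores of both models, and the names roc_point and roc_difference are hypothetical.

```r
# ROC point (FPR, TPR) of a score vector at threshold t,
# given logical true labels (TRUE = positive class).
roc_point <- function(scores, labels, t) {
  predicted <- scores >= t
  c(fpr = sum(predicted & !labels) / sum(!labels),
    tpr = sum(predicted & labels) / sum(labels))
}

# 1 - summarised Euclidean distance between the two ROC curves over T.
# summarise = max gives the maximum version, summarise = mean the mean one.
roc_difference <- function(p_s, p_o, labels, summarise = max) {
  thresholds <- sort(unique(c(p_s, p_o)))  # assumed threshold set T
  distances <- vapply(thresholds, function(t) {
    sqrt(sum((roc_point(p_o, labels, t) - roc_point(p_s, labels, t))^2))
  }, numeric(1))
  1 - summarise(distances)
}

# Toy labels and probabilities, invented for illustration only.
labels <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
p_o <- c(0.90, 0.80, 0.70, 0.60, 0.30, 0.20)
p_s <- c(0.85, 0.75, 0.40, 0.65, 0.35, 0.10)

roc_difference(p_s, p_o, labels, max)
roc_difference(p_s, p_o, labels, mean)
```

Because the mean of the distances never exceeds their maximum, the mean version is always at least as large as the maximum version.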

Examples

# NOT RUN {
library(xspliner)
library(randomForest)
set.seed(1)
data <- iris
# regression model
iris.rf <- randomForest(Petal.Width ~ Sepal.Length + Petal.Length + Species, data = data)
iris.xs <- xspline(iris.rf)
# Summary of quantitative variable transition
summary(iris.xs, "Sepal.Length")
# Summary of qualitative variable transition
summary(iris.xs, "Species")
# Comparing surrogate with original model (regression)
summary(iris.xs, model = iris.rf, newdata = data)

# Classification model

# }
