
iml (version 0.1)

shapley: Explain predictions

Description

shapley() computes feature contributions for single predictions with the Shapley value, an approach from cooperative game theory. The feature values of an instance cooperate to achieve the prediction. shapley() fairly distributes the difference between the instance's prediction and the dataset's average prediction among the features. A feature's contribution can be negative.
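A minimal sketch of that distribution property (assuming the Boston housing model mod and data X from the Examples section below): because the Shapley values are estimated by sampling, the contributions add up only approximately to the difference between the instance's prediction and the average prediction.

# Sketch only: assumes mod and X from the Examples section below
shap = shapley(mod, X, x.interest = X[1, ])
res = shap$data()
sum(res$phi)                              # sum of the estimated contributions ...
shap$y.hat.interest - shap$y.hat.average  # ... roughly equals this difference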

Usage

shapley(object, X, x.interest, sample.size = 100, class = NULL, ...)

Arguments

object

The machine learning model. Different types are allowed. Recommended are mlr WrappedModel and caret train objects. The object can also be a function that predicts the outcome given features, or anything with an S3 predict function, like an object of class lm (see the sketch after this argument list).

X

data.frame with the data for the prediction model

x.interest

data.frame with a single row for the instance to be explained.

sample.size

Number of samples to be drawn to estimate the Shapley value. The higher the sample size, the more accurate the estimate.

class

In case of classification, class specifies the class for which the probability is predicted. By default, the predictions for all classes are used.

...

Further arguments for the prediction method.
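As a sketch of the last option for object (a plain prediction function): the wrapper name pred.fun and its newdata-based signature are assumptions for illustration, and mod and X are the Boston fit from the Examples section below.

# Sketch only: wrap the model in a prediction function (name and signature assumed)
pred.fun = function(newdata) predict(mod, newdata = newdata)
shap = shapley(pred.fun, X, x.interest = X[1, ])
shap$data()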

Value

A Shapley object (R6). Its methods and variables can be accessed with the $-operator:

sample.size

The number of times coalitions/marginals are sampled from data X. The higher the number, the more accurate the explanations become.

x.interest

data.frame with the instance of interest

y.hat.interest

predicted value for instance of interest

y.hat.average

average predicted value for data X

x

method to get/set the instance. See the examples and the sketch after the R6 methods below for usage.

data()

method to extract the results of the Shapley estimation. Returns a data.frame with the feature names (feature) and contributions to the prediction (phi).

plot()

method to plot the Shapley value. See plot.Shapley

run()

[internal] method to run the interpretability method. Use obj$run(force = TRUE) to force a rerun.

General R6 methods
clone()

[internal] method to clone the R6 object.

initialize()

[internal] method to initialize the R6 object.
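A short sketch of working with the returned R6 object, combining the fields and methods documented above (assumes the shap object from the Examples section below; the assignment syntax for x is an assumption).

# Sketch only: inspect the object and explain another instance
shap$y.hat.interest      # prediction for the explained instance
shap$y.hat.average       # average prediction on data X
shap$x = X[2, ]          # set a new instance of interest (assumed assignment syntax)
shap$run(force = TRUE)   # force a recomputation for the new instance
shap$data()              # contributions for the new instance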

Details

For details on the Shapley value, see the Interpretable Machine Learning book: https://christophm.github.io/interpretable-ml-book/shapley.html

References

Strumbelj, E., Kononenko, I. (2014). Explaining prediction models and individual predictions with feature contributions. Knowledge and Information Systems, 41(3), 647-665. https://doi.org/10.1007/s10115-013-0679-x

See Also

A different way to explain predictions: lime

Examples

# NOT RUN {
# First we fit a machine learning model on the Boston housing data
library("randomForest")
data("Boston", package  = "MASS")
mod = randomForest(medv ~ ., data = Boston, ntree = 50)
X = Boston[-which(names(Boston) == "medv")]

# Then we explain the first instance of the dataset with the shapley() method:
x.interest = X[1,]
shap = shapley(mod, X, x.interest = x.interest)
shap

# Look at the results in a table
shap$data()
# Or as a plot
plot(shap)

# shapley() also works with multiclass classification
library("randomForest")
mod = randomForest(Species ~ ., data = iris, ntree = 50)
X = iris[-which(names(iris) == "Species")]

# Then we explain the first instance of the dataset with the shapley() method:
shap = shapley(mod, X, x.interest = X[1,], predict.args = list(type='prob'))
shap$data()
plot(shap) 

# You can also focus on one class
shap = shapley(mod, X, x.interest = X[1,], class = 2, predict.args = list(type='prob'))
shap$data()
plot(shap) 

# }
