iml (version 0.1)

lime: LIME

Description

lime() fits a locally weighted linear regression model (logistic for classification) to explain a single machine learning prediction.

Usage

lime(object, X, sample.size = 100, k = 3, x.interest, class = NULL, ...)

Arguments

object

The machine learning model. Several types are supported; mlr WrappedModel and caret train objects are recommended. The object can also be a function that predicts the outcome given features, or anything with an S3 predict method, such as an object of class lm (see the sketch after this list).

X

data.frame with the data for the prediction model

sample.size

The number of instances to be sampled from X.

k

The (maximum) number of features to be used for the surrogate model.

x.interest

data.frame with a single row for the instance to be explained.

class

For classification, class specifies the class for which the probability is predicted. By default, probabilities are predicted for all classes (multiclass).

...

Further arguments for the prediction method.
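
As a minimal sketch of the function option for object: lime() can be handed a plain prediction function (the wrapper pred_fun below is illustrative, not part of the package):

library("iml")
data("Boston", package = "MASS")
X = Boston[-which(names(Boston) == "medv")]
lm_mod = lm(medv ~ ., data = Boston)
# A prediction function that takes a data.frame of features and
# returns the predicted outcome:
pred_fun = function(X.new) predict(lm_mod, newdata = X.new)
lemon = lime(pred_fun, X, x.interest = X[1, ])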

Value

A LIME object (R6). Its methods and variables can be accessed with the $-operator:

sample.size

The number of instances sampled from X. The higher the number, the more accurate the explanations become.

model

The glmnet object.

best.fit.index

The index of the best glmnet fit.

k

The number of features as set by the user.

x.interest

Method to get or set the instance to be explained. See the examples for usage.

data()

Method to extract the local feature effects. Returns a data.frame with the feature names (feature) and their contributions to the prediction.

plot()

Method to plot the LIME feature effects. See plot.LIME.

predict()

Method to predict new data with the local model. See also predict.LIME.

run()

[internal] Method to run the interpretability method. Use obj$run(force = TRUE) to force a rerun.

General R6 methods
clone()

[internal] Method to clone the R6 object.

initialize()

[internal] Method to initialize the R6 object.
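
A short usage sketch for the $-operator, assuming a fitted LIME object lemon as created in the Examples below (the exact predict() call is an assumption based on predict.LIME):

lemon$k                     # the number of features, as set by the user
lemon$x.interest = X[2, ]   # set a new instance to explain
lemon$predict(X[3:4, ])     # predict new data with the local model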

Details

Data points are sampled and weighted by their proximity to the instance to be explained. A weighted glm is fitted, with the machine learning model's prediction as the target. L1 regularisation makes the resulting coefficients sparse. The fitted glm can be seen as a surrogate for the machine learning model that is only valid locally, around that one point. Categorical features are binarised relative to the instance to be explained: 1 if the category is the same, 0 otherwise.
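
The following is a minimal sketch of these steps, not the package's internal code. It assumes the glmnet and gower packages, and a fitted model mod and data X as in the Examples below:

library("glmnet")
library("gower")

x.interest = X[1, ]
samp = X[sample(nrow(X), 100), ]        # sample instances from X
w = 1 - gower_dist(x.interest, samp)    # gower proximity as weights
target = predict(mod, newdata = samp)   # model predictions as the target
# Weighted glm with L1 regularisation; the regularisation path is then
# searched for a sparse fit with at most k non-zero coefficients.
fit = glmnet(as.matrix(samp), target, weights = w, alpha = 1)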

Differences to the original LIME implementation:

  • Distance measure: Uses gower proximity (= 1 - gower distance) instead of a kernel based on the Euclidean distance. This has the advantage of a meaningful neighbourhood and no kernel width to tune.

  • Sampling: Samples from X instead of from normal distributions. This has the advantage of following the original data distribution.

  • Visualisation: Plots effects instead of betas. This is the same for binary features, but makes a difference for numerical features: plotting the betas alone can be misleading, because a negative beta can still increase the prediction when the feature value is also negative (see the sketch after this list).
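
A quick numeric illustration of the last point, using made-up values:

beta = -0.5    # negative coefficient of a numerical feature
x.val = -2     # negative feature value of the explained instance
beta * x.val   # effect = 1, so the feature increases the prediction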

To learn more about local models, read the Interpretable Machine Learning book: https://christophm.github.io/interpretable-ml-book/lime.html

References

Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). "Why Should I Trust You?": Explaining the Predictions of Any Classifier. Retrieved from http://arxiv.org/abs/1602.04938

See Also

plot.LIME and predict.LIME

shapley can also be used to explain single predictions

lime, the original implementation

Examples

# First we fit a machine learning model on the Boston housing data
library("randomForest")
data("Boston", package  = "MASS")
mod = randomForest(medv ~ ., data = Boston, ntree = 50)
X = Boston[-which(names(Boston) == "medv")]

# Then we explain the first instance of the dataset with the lime() method:
x.interest = X[1,]
lemon = lime(mod, X, x.interest = x.interest, k = 2)
lemon

# Look at the results in a table
lemon$data()
# Or as a plot
plot(lemon)

# Reuse the object with a new instance to explain
lemon$x.interest = X[2,]
plot(lemon)
  
# lime() also works with multiclass classification
library("randomForest")
mod = randomForest(Species ~ ., data = iris, ntree = 50)
X = iris[-which(names(iris) == "Species")]

# Then we explain the first instance of the dataset with the lime() method:
lemon = lime(mod, X, x.interest = X[1,], predict.args = list(type = "prob"), k = 3)
lemon$data()
plot(lemon) 

# You can also focus on one class
lemon = lime(mod, X, x.interest = X[1,], class = 2, predict.args = list(type = "prob"), k = 2)
lemon$data()
plot(lemon) 
