The breakdown algorithm works as follows: First, the visit order
\((x_1, ..., x_m)\) of the variables v is specified.
Then, in the query data, the column \(x_1\) is set to the value of \(x_1\)
of the single observation new_obs to be explained.
The change in the (weighted) average prediction on data measures the
contribution of \(x_1\) to the prediction of new_obs.
This procedure is iterated over all \(x_i\) until, eventually, all rows
in data are identical to new_obs.
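The iteration above can be sketched as follows. This is an illustrative, model-agnostic sketch in Python, not the package's R internals; the function names, the dict-of-arrays data layout, and the toy model are made up for the example:

```python
import numpy as np

def breakdown(predict, data, new_obs, visit_order):
    """Fix columns of `data` one by one to the values of `new_obs`
    and record the shift in the average prediction after each step."""
    query = {col: vals.copy() for col, vals in data.items()}
    baseline = float(np.mean(predict(query)))
    contributions = {}
    for x in visit_order:
        query[x][:] = new_obs[x]              # whole column set to new_obs's value
        avg = float(np.mean(predict(query)))
        contributions[x] = avg - baseline     # shift attributed to x at this step
        baseline = avg                        # subsequent steps build on this
    return contributions

# Toy non-additive model f(a, b) = a * b
rng = np.random.default_rng(0)
data = {"a": rng.normal(size=10_000), "b": rng.normal(size=10_000)}
predict = lambda d: d["a"] * d["b"]
new_obs = {"a": 1.0, "b": 2.0}

contrib_ab = breakdown(predict, data, new_obs, ["a", "b"])
contrib_ba = breakdown(predict, data, new_obs, ["b", "a"])
# For this non-additive model, the two visit orders attribute different
# amounts to each variable, although both sets of contributions sum to
# the same total (final average prediction minus the original average).
```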
A complication with this approach is that the visit order is relevant,
at least for non-additive models. Ideally, the algorithm could be repeated
for all possible permutations of v and its results averaged per variable.
This is basically what SHAP values do; see the reference below for an explanation.
Unfortunately, there is no efficient model-agnostic way to do this exactly.
We offer two visit strategies to approximate SHAP:
"importance": Uses the short-cut described in the reference below:
the variables are sorted by the size of their contributions, calculated in the same
way as in the breakdown algorithm but without iteration, i.e., starting from the
original query data for each variable \(x_i\).
"permutation": Averages the contributions obtained from a small number of random
permutations of v.
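The "permutation" strategy can be sketched in the same illustrative style (again Python rather than the package's R code; all names and the toy model are placeholders for the example):

```python
import random
import numpy as np

def breakdown_contributions(predict, data, new_obs, visit_order):
    """One pass of the breakdown algorithm for a fixed visit order."""
    query = {col: vals.copy() for col, vals in data.items()}
    baseline = float(np.mean(predict(query)))
    contributions = {}
    for x in visit_order:
        query[x][:] = new_obs[x]
        avg = float(np.mean(predict(query)))
        contributions[x] = avg - baseline
        baseline = avg
    return contributions

def permutation_shap(predict, data, new_obs, variables, n_perm=50, seed=1):
    """Average breakdown contributions over random permutations of the
    visit order -- a Monte Carlo approximation of SHAP values."""
    rng = random.Random(seed)
    totals = {x: 0.0 for x in variables}
    for _ in range(n_perm):
        order = variables[:]
        rng.shuffle(order)                    # random visit order for this pass
        for x, c in breakdown_contributions(predict, data, new_obs, order).items():
            totals[x] += c
    return {x: t / n_perm for x, t in totals.items()}

# Same toy non-additive model as before: f(a, b) = a * b
rng = np.random.default_rng(0)
data = {"a": rng.normal(size=10_000), "b": rng.normal(size=10_000)}
predict = lambda d: d["a"] * d["b"]
new_obs = {"a": 1.0, "b": 2.0}

shap = permutation_shap(predict, data, new_obs, ["a", "b"])
# The averaged contributions still sum to the total prediction shift,
# but each variable's value now blends both visit orders.
```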
Note that the minimum required elements in the (multi-)flashlight are
"predict_function", "model", and "data". The latter can also be passed directly to
light_breakdown(). Note that, by default, no retransformation function is applied.