create_marginal_data_training

Sample observations from the empirical distribution P(X) using the training dataset.

internal

Complex machine learning models are often hard to interpret. However, in
many situations it is crucial to understand and explain why a model made a specific
prediction. Shapley values are the only prediction explanation framework
with a solid theoretical foundation. Previously known methods for estimating Shapley
values do, however, assume feature independence. This package implements methods that account for any feature
dependence, and thereby produce more accurate estimates of the true Shapley values.
An accompanying 'Python' wrapper ('shaprpy') is available through the GitHub repository.

Martin Jullum

shapr

Prediction Explanation with Dependence-Aware Shapley Values

Lars Henry Berge Olsen

Annabelle Redelmeier

Jon Lachmann

Nikolai Sellereite

Anders Løland

Jens Christian Wahl

Camilla Lingjærde

Norsk Regnesentral 

create_marginal_data_training function

<dl><dt>x_train</dt>
<dd>Data.table with training data.</dd>
<dt>Sbar_features</dt>
<dd>Vector of integers containing the feature indices to generate marginal observations for.
For example, if <code>Sbar_features</code> is <code>c(1,4)</code>, then we sample <code>n_MC_samples</code> observations from \(P(X_1, X_4)\) using the
empirical training observations (with replacement). That is, we sample the first and fourth feature values from
the same training observation, so we do not break the dependence between them.</dd>
<dt>stable_version</dt>
<dd>Logical. If <code>TRUE</code> and <code>n_MC_samples</code> &gt; <code>n_train</code>, then we include each training observation
<code>n_MC_samples %/% n_train</code> times and then sample the remaining <code>n_MC_samples %% n_train</code> samples. Only the latter is
done when <code>n_MC_samples &lt; n_train</code>. This is done separately for each explicand. If <code>FALSE</code>, we randomly sample
from the observations.</dd></dl>
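
The sampling scheme described by these arguments can be sketched in base R for a single explicand. This is only an illustration under stated assumptions, not the package's implementation: the helper names `sample_row_indices` and `sample_marginal_sketch` are hypothetical, a plain `data.frame` stands in for the data.table, and the leftover `n_MC_samples %% n_train` rows are assumed to be drawn without replacement, which the argument description does not specify.

```r
# Hypothetical helper (not part of shapr): row indices for ONE explicand.
sample_row_indices <- function(n_train, n_MC_samples, stable_version = TRUE) {
  if (stable_version) {
    n_full <- n_MC_samples %/% n_train # complete passes over the training data
    n_rest <- n_MC_samples %% n_train  # leftover draws
    # Assumption: the leftover draws are made without replacement.
    c(rep(seq_len(n_train), times = n_full),
      sample.int(n_train, size = n_rest, replace = FALSE))
  } else {
    # Plain sampling with replacement from the training rows.
    sample.int(n_train, size = n_MC_samples, replace = TRUE)
  }
}

# Hypothetical helper (not part of shapr): marginal samples for the
# features in Sbar_features, taken jointly from the same training rows.
sample_marginal_sketch <- function(x_train, Sbar_features, n_MC_samples,
                                   stable_version = TRUE) {
  idx <- sample_row_indices(nrow(x_train), n_MC_samples, stable_version)
  # Selecting whole rows first keeps the dependence between the features.
  x_train[idx, Sbar_features, drop = FALSE]
}

set.seed(1)
# Toy training data in which X4 is always 10 * X1.
x_train <- data.frame(X1 = 1:4, X2 = letters[1:4], X3 = rnorm(4), X4 = (1:4) * 10)
out <- sample_marginal_sketch(x_train, Sbar_features = c(1, 4), n_MC_samples = 10)
```

In this toy data `X4` is always ten times `X1`, and that relation survives in every sampled row because both values come from the same training observation. With a data.table, the column selection would need `with = FALSE` or the `..` prefix instead of plain indexing.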

Arguments

Author

Function that samples data from the empirical marginal training distribution — create_marginal_data_training

