Following Friedman and Popescu (2008), if there are no interaction effects between
features \(x_j\) and \(x_k\), their two-dimensional (centered) partial dependence
function \(F_{jk}\) can be written as the sum of the (centered) univariate partial
dependencies \(F_j\) and \(F_k\), i.e.,
$$
F_{jk}(x_j, x_k) = F_j(x_j)+ F_k(x_k).
$$
Correspondingly, Friedman and Popescu's statistic of pairwise
interaction strength between \(x_j\) and \(x_k\) is defined as
$$
H_{jk}^2 = \frac{A_{jk}}{\frac{1}{n} \sum_{i = 1}^n\big[\hat F_{jk}(x_{ij}, x_{ik})\big]^2},
$$
where
$$
A_{jk} = \frac{1}{n} \sum_{i = 1}^n\big[\hat F_{jk}(x_{ij}, x_{ik}) -
\hat F_j(x_{ij}) - \hat F_k(x_{ik})\big]^2
$$
(check partial_dep() for all definitions).
Remarks:
1. Remarks 1 to 5 of h2_overall() also apply here.
2. \(H^2_{jk} = 0\) means there are no interaction effects between \(x_j\)
and \(x_k\). The larger the value, the larger the share of the joint effect
of the two features that comes from their interaction.
3. Since the denominator differs between variable pairs, this statistic,
unlike \(H_j\), is difficult to compare across variable pairs.
4. If both main effects are very weak, a negligible interaction can receive a
high \(H^2_{jk}\). Friedman and Popescu (2008) therefore suggest calculating
\(H^2_{jk}\) only for important variables (see "Modification" below).
Modification
To make pairwise interaction strengths comparable across variable pairs,
and to overcome the problem mentioned in the last remark, we suggest the
unnormalized statistic on the scale of the predictions, \(\sqrt{A_{jk}}\),
as an alternative. Set normalize = FALSE and squared = FALSE to obtain
this statistic.
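A minimal numpy sketch of this alternative (the function name and arguments are illustrative; it assumes the centered partial dependence values are already evaluated at the n observations):

```python
import numpy as np

def pairwise_interaction(F_jk, F_j, F_k, normalize=True, squared=True):
    """Pairwise interaction strength with normalize/squared switches (sketch).

    With normalize=False and squared=False, this returns sqrt(A_jk),
    the unnormalized statistic on the scale of the predictions.
    """
    stat = np.mean((F_jk - F_j - F_k) ** 2)  # A_jk
    if normalize:
        stat /= np.mean(F_jk ** 2)           # -> H^2_jk
    return stat if squared else np.sqrt(stat)

# Tiny example: an interaction residual of +/-1 gives sqrt(A_jk) = 1,
# i.e., a typical interaction contribution of one prediction unit.
F_j = np.array([-1.0, 1.0])
F_k = np.array([-1.0, 1.0])
F_jk = F_j + F_k + np.array([1.0, -1.0])
print(pairwise_interaction(F_jk, F_j, F_k, normalize=False, squared=False))  # 1.0
```

Because \(\sqrt{A_{jk}}\) shares the units of the predictions, the same value means the same absolute interaction strength for every variable pair.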
Furthermore, instead of focusing on pairwise calculations for the most important
features, we can select the features with the strongest overall interactions.