The theoretical functional decomposition of the variance of the stdf \(\ell\) consists in writing \(D(\ell) = \sum_{I \subseteq \{1,...,d\}} D_I(\ell)\), where \(D_I(\ell)\) is the variance of \(\ell_I(U_I)\), the term associated with the subset \(I\) in the Hoeffding-Sobol decomposition of \(\ell\).
Here \(U_I\) denotes a random vector with independent standard uniform entries.
For a fixed subset of components \(I\), the theoretical tail superset importance coefficient (tsic) is defined by \(\Upsilon_I(\ell)=\sum_{J \supseteq I} D_J(\ell)\).
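The superset sum can be illustrated with a minimal Python sketch. The function name `superset_importance` and the dictionary `toy_terms` are hypothetical, and the numerical values are arbitrary illustrative numbers for \(d=3\), not derived from any actual stdf.

```python
def superset_importance(variance_terms, subset):
    """Sum the terms D_J over all subsets J that contain `subset`."""
    target = frozenset(subset)
    return sum(value for j, value in variance_terms.items() if target <= j)


# Hypothetical toy values of D_J for d = 3 (arbitrary, for illustration only).
toy_terms = {
    frozenset({1}): 0.020,
    frozenset({2}): 0.015,
    frozenset({3}): 0.010,
    frozenset({1, 2}): 0.004,
    frozenset({1, 3}): 0.003,
    frozenset({2, 3}): 0.002,
    frozenset({1, 2, 3}): 0.001,
}

# Upsilon_{1,2} adds D_{1,2} and D_{1,2,3}.
print(superset_importance(toy_terms, {1, 2}))
# Upsilon_{1} adds every D_J whose index set contains component 1.
print(superset_importance(toy_terms, {1}))
```

With the empty set as `subset`, the same function returns the sum of all terms, that is, the global variance \(D(\ell)\) of the toy example.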
A theoretical upper bound for the tsic \(\Upsilon_I(\ell)\) is given by Theorem 2 in Mercadier and Ressel (2021), which states that \(\Upsilon_I(\ell)\leq 2(|I|!)^2/((2|I|+2)!)\).
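To give a sense of scale, the bound can be evaluated exactly for small subsets. The following Python sketch only evaluates the formula above; the helper name `tsic_upper_bound` is hypothetical.

```python
from fractions import Fraction
from math import factorial


def tsic_upper_bound(size):
    """Exact value of 2 (|I|!)^2 / (2|I| + 2)! for a subset of the given size."""
    return Fraction(2 * factorial(size) ** 2, factorial(2 * size + 2))


for size in (1, 2, 3):
    print(size, tsic_upper_bound(size))
# 1 1/12
# 2 1/90
# 3 1/560
```

The bound decreases quickly with \(|I|\), which is one reason a rescaling by its reciprocal can make coefficients for subsets of different sizes easier to compare.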
Here, the function tsicEmp evaluates, on an \(n\)-sample and for a threshold \(k\), the empirical tail superset importance coefficient \(\hat{\Upsilon}_{I,k,n}\), the empirical counterpart of \(\Upsilon_I(\ell)\).
Under the option sobol = TRUE, the function tsicEmp returns \(\dfrac{\hat{\Upsilon}_{I,k,n}}{\hat{D}_{k,n}}\), the empirical counterpart of \(\dfrac{\Upsilon_I(\ell)}{D(\ell)}\).
Under the option norm = TRUE, the quantities are multiplied by \(\dfrac{(2|I|+2)!}{2(|I|!)^2}\), the reciprocal of the upper bound of Mercadier and Ressel (2021).
Proposition 1 and Theorem 2 of Mercadier and Roustant (2019) provide several rank-based expressions:
\(\hat{\Upsilon}_{I,k,n}=\frac{1}{k^2}\sum_{s=1}^n\sum_{s^\prime=1}^n \prod_{t\in I}\left(\min(\overline{R}^{(t)}_s,\overline{R}^{(t)}_{s^\prime})-\overline{R}^{(t)}_{s}\overline{R}^{(t)}_{s^\prime}\right) \prod_{t\notin I} \min(\overline{R}^{(t)}_s,\overline{R}^{(t)}_{s^\prime})\)
\(\hat{D}_{k,n}=\frac{1}{k^2}\sum_{s=1}^n\sum_{s^\prime=1}^n \left(\prod_{t=1}^{d}\min(\overline{R}^{(t)}_s,\overline{R}^{(t)}_{s^\prime})- \prod_{t=1}^{d}\overline{R}^{(t)}_{s}\overline{R}^{(t)}_{s^\prime}\right)\)
where
\(k\) is the threshold parameter,
\(n\) is the sample size,
\(X_1,...,X_n\) denotes the sample, where each \(X_s\) is a \(d\)-dimensional vector with components \(X_s^{(t)}\) for \(t=1,...,d\),
\(R^{(t)}_s\) denotes the rank of \(X^{(t)}_s\) among \(X^{(t)}_1, ..., X^{(t)}_n\),
and \(\overline{R}^{(t)}_s = \min((n- R^{(t)}_s+1)/k,1)\).
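The rank-based expressions can be made concrete with a short Python sketch. It is an independent illustration and not the code of tsicEmp: the names `scaled_ranks` and `tsic_sketch`, the 0-based component indices, and the tie-breaking rule are assumptions. The global variance \(\hat{D}_{k,n}\) is computed with products over all \(d\) components, since it does not depend on \(I\). The arguments `sobol` and `norm` only mimic the two options described above.

```python
from math import factorial


def scaled_ranks(sample, k):
    """Return rbar[s][t] = min((n - R_s^(t) + 1) / k, 1)."""
    n, d = len(sample), len(sample[0])
    rbar = [[0.0] * d for _ in range(n)]
    for t in range(d):
        # Rank 1 is the smallest value; ties are broken by order of
        # appearance (an assumption of this sketch).
        order = sorted(range(n), key=lambda s: sample[s][t])
        for position, s in enumerate(order):
            rank = position + 1
            rbar[s][t] = min((n - rank + 1) / k, 1.0)
    return rbar


def tsic_sketch(sample, subset, k, sobol=False, norm=False):
    """Empirical tsic for `subset` (0-based component indices)."""
    n, d = len(sample), len(sample[0])
    rbar = scaled_ranks(sample, k)
    inside = set(subset)
    upsilon = 0.0   # accumulates the double sum defining Upsilon-hat
    variance = 0.0  # accumulates the double sum defining D-hat
    for s in range(n):
        for sp in range(n):
            term, prod_min, prod_rr = 1.0, 1.0, 1.0
            for t in range(d):
                m = min(rbar[s][t], rbar[sp][t])
                rr = rbar[s][t] * rbar[sp][t]
                term *= (m - rr) if t in inside else m
                prod_min *= m
                prod_rr *= rr
            upsilon += term
            variance += prod_min - prod_rr
    upsilon /= k ** 2
    variance /= k ** 2
    result = upsilon / variance if sobol else upsilon
    if norm:
        p = len(inside)
        result *= factorial(2 * p + 2) / (2 * factorial(p) ** 2)
    return result


# Hypothetical toy sample with n = 6 observations in dimension d = 2.
toy_sample = [[0.1, 0.7], [0.4, 0.2], [0.9, 0.8],
              [0.3, 0.5], [0.6, 0.1], [0.8, 0.9]]
print(tsic_sketch(toy_sample, [0, 1], k=3))
print(tsic_sketch(toy_sample, [0], k=3, sobol=True))
```

The double loop costs on the order of \(n^2 d\) operations, which is acceptable for a sketch. In dimension 2, the ratios for the two singletons minus the ratio for the pair add up to 1, which gives a convenient sanity check.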