satdad (version 1.1)

tsicEmp: Empirical tail superset importance coefficients.

Description

Computes, on a sample, the tail superset importance coefficients (tsic) associated with threshold k. The values may be renormalized by the empirical global variance (Sobol version) and/or by their theoretical upper bound.

Usage

tsicEmp(sample, ind = 2, k, sobol = FALSE, norm = FALSE)

Value

The function returns a list of two elements:

  • subsets A list of subsets from \(\{1,...,d\}\).

    When ind is given as an integer, subsets is the list of subsets from \(\{1,...,d\}\) with cardinality ind. When ind is a list, subsets is that list.

    When ind = "with.singletons" subsets is the list of all non empty subsets in \(\{1,...,d\}\).

    When ind = "all" subsets is the list of all subsets in \(\{1,...,d\}\) with cardinality greater than or equal to 2.

  • tsic A vector of empirical tail superset importance coefficients associated with the list subsets. When norm = TRUE, the tsic values are normalized: each original value is divided by its corresponding upper bound.
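The mapping from ind to subsets described above can be sketched in base R. This is only an illustration: subsets_sketch is a hypothetical helper, not a function of the package, and the ordering of the subsets returned by tsicEmp may differ.

```r
# Hypothetical helper (not part of satdad): enumerate the subsets of {1,...,d}
# selected by the different forms of the argument ind.
subsets_sketch <- function(d, ind = 2) {
  # All subsets of {1,...,d} whose cardinality belongs to `sizes`
  by_size <- function(sizes) {
    unlist(lapply(sizes, function(m) combn(d, m, simplify = FALSE)),
           recursive = FALSE)
  }
  if (is.list(ind)) return(ind)                               # user-supplied subsets
  if (identical(ind, "with.singletons")) return(by_size(1:d)) # all non-empty subsets
  if (identical(ind, "all")) return(by_size(2:d))             # cardinality >= 2
  by_size(ind)                                                # fixed cardinality
}

length(subsets_sketch(6, 2))                  # choose(6, 2) pairs
length(subsets_sketch(6, "all"))              # 2^6 - 1 - 6 subsets
length(subsets_sketch(6, "with.singletons"))  # 2^6 - 1 subsets
```

For d = 6 these calls list 15, 57 and 63 subsets respectively, which shows how quickly the output grows when ind = "all" is used in higher dimension.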

Arguments

sample

An \(n \times d\) matrix.

ind

A character string, either "with.singletons" or "all" (without singletons), or an integer in \(\{2,...,d\}\), or a list of subsets from \(\{1,...,d\}\). The default is ind = 2, for which all pairwise coefficients are computed.

k

An integer less than or equal to n.

sobol

A boolean, `FALSE` by default. If `TRUE`, the index is normalized by the empirical global variance.

norm

A boolean, `FALSE` by default. If `TRUE`, the index is normalized by its theoretical upper bound.

Author

Cécile Mercadier (mercadier@math.univ-lyon1.fr)

Details

The theoretical functional decomposition of the variance of the stable tail dependence function (stdf) \(\ell\) is written \(D(\ell) = \sum_{I \subseteq \{1,...,d\}} D_I(\ell)\), where \(D_I(\ell)\) is the variance of \(\ell_I(U_I)\), the term associated with subset \(I\) in the Hoeffding-Sobol decomposition of \(\ell\). Here \(U_I\) denotes a random vector with independent standard uniform entries.

Fixing a subset of components \(I\), the theoretical tail superset importance coefficient is defined by \(\Upsilon_I(\ell)=\sum_{J \supseteq I} D_J(\ell)\). A theoretical upper bound for tsic \(\Upsilon_I(\ell)\) is given by Theorem 2 in Mercadier and Ressel (2021) which states that \(\Upsilon_I(\ell)\leq 2(|I|!)^2/((2|I|+2)!)\).

Here, the function tsicEmp evaluates, on an \(n\)-sample and for threshold \(k\), the empirical tail superset importance coefficient \(\hat{\Upsilon}_{I,k,n}\), the empirical counterpart of \(\Upsilon_I(\ell)\).

Under the option sobol = TRUE, the function tsicEmp returns \(\dfrac{\hat{\Upsilon}_{I,k,n}}{\hat{D}_{k,n}}\), the empirical counterpart of \(\dfrac{\Upsilon_I(\ell)}{D(\ell)}\), where \(D(\ell)\) is the global variance.

Under the option norm = TRUE, the quantities are multiplied by \(\dfrac{(2|I|+2)!}{2(|I|!)^2}\).
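As a small worked illustration of the bound and of the norm = TRUE factor, the sketch below evaluates both for a few cardinalities \(|I|\). The function name tsic_upper_bound is hypothetical and is not part of the package.

```r
# Hypothetical helper: theoretical upper bound 2 (|I|!)^2 / (2|I| + 2)!
# as a function of the cardinality |I| of the subset.
tsic_upper_bound <- function(card) {
  2 * factorial(card)^2 / factorial(2 * card + 2)
}

card <- 1:4
data.frame(
  card        = card,
  bound       = tsic_upper_bound(card),      # 1/12 for |I| = 1, 1/90 for |I| = 2
  norm_factor = 1 / tsic_upper_bound(card)   # multiplier applied when norm = TRUE
)
```

The bound decreases quickly with \(|I|\), so unnormalized coefficients of subsets with different cardinalities are not directly comparable; dividing by the bound puts them on a common scale with maximum 1.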

Proposition 1 and Theorem 2 of Mercadier and Roustant (2019) provide the following rank-based expressions:

\(\hat{\Upsilon}_{I,k,n}=\frac{1}{k^2}\sum_{s=1}^n\sum_{s^\prime=1}^n \prod_{t\in I}(\min(\overline{R}^{(t)}_s,\overline{R}^{(t)}_{s^\prime})-\overline{R}^{(t)}_{s}\overline{R}^{(t)}_{s^\prime}) \prod_{t\notin I} \min(\overline{R}^{(t)}_s,\overline{R}^{(t)}_{s^\prime})\)

\(\hat{D}_{k,n}=\frac{1}{k^2}\sum_{s=1}^n\sum_{s^\prime=1}^n \left[\prod_{t=1}^d\min(\overline{R}^{(t)}_s,\overline{R}^{(t)}_{s^\prime})- \prod_{t=1}^d\overline{R}^{(t)}_{s}\overline{R}^{(t)}_{s^\prime}\right]\)

where

  • \(k\) is the threshold parameter,

  • \(n\) is the sample size,

  • \(X_1,...,X_n\) describes the sample, each \(X_s\) being a d-dimensional vector with components \(X_s^{(t)}\) for \(t=1,...,d\),

  • \(R^{(t)}_s\) denotes the rank of \(X^{(t)}_s\) among \(X^{(t)}_1, ..., X^{(t)}_n\),

  • and \(\overline{R}^{(t)}_s = \min((n- R^{(t)}_s+1)/k,1)\).
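The rank-based expressions can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the helper names rbar, tsic_sketch and global_variance_sketch are hypothetical, ties are broken by order of appearance, the products in \(\hat{D}_{k,n}\) are taken over all d components (as befits a global variance), and the double sums are computed with dense n-by-n matrices, which is only practical for moderate n.

```r
# Hypothetical helpers (not part of satdad), written for readability.

# Matrix of \overline{R}_s^(t) = min((n - R_s^(t) + 1) / k, 1)
rbar <- function(sample, k) {
  n <- nrow(sample)
  ranks <- apply(sample, 2, rank, ties.method = "first")  # ranks within each column
  pmin((n - ranks + 1) / k, 1)
}

# Empirical tsic for a subset I of {1,...,d}
tsic_sketch <- function(sample, I, k) {
  Rb <- rbar(sample, k)
  n <- nrow(Rb)
  acc <- matrix(1, n, n)
  for (t in seq_len(ncol(Rb))) {
    m <- outer(Rb[, t], Rb[, t], pmin)  # min(Rbar_s, Rbar_s')
    p <- outer(Rb[, t], Rb[, t])        # Rbar_s * Rbar_s'
    acc <- acc * (if (t %in% I) m - p else m)
  }
  sum(acc) / k^2
}

# Empirical global variance (products over all d components: an assumption)
global_variance_sketch <- function(sample, k) {
  Rb <- rbar(sample, k)
  n <- nrow(Rb)
  prod_min <- matrix(1, n, n)
  prod_rr <- matrix(1, n, n)
  for (t in seq_len(ncol(Rb))) {
    prod_min <- prod_min * outer(Rb[, t], Rb[, t], pmin)
    prod_rr <- prod_rr * outer(Rb[, t], Rb[, t])
  }
  sum(prod_min - prod_rr) / k^2
}

# Usage on simulated data (independent Gaussian columns, for illustration only)
set.seed(1)
x <- matrix(rnorm(200 * 3), ncol = 3)
ups_12 <- tsic_sketch(x, I = c(1, 2), k = 40)
d_hat <- global_variance_sketch(x, k = 40)
ups_12 / d_hat  # Sobol-type ratio, in the spirit of sobol = TRUE
```

In dimension d = 2 the sketch satisfies \(\hat{\Upsilon}_{\{1\}} + \hat{\Upsilon}_{\{2\}} - \hat{\Upsilon}_{\{1,2\}} = \hat{D}_{k,n}\), which is a convenient sanity check of the two formulas against each other.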

References

Mercadier, C. and Ressel, P. (2021) Hoeffding–Sobol decomposition of homogeneous co-survival functions: from Choquet representation to extreme value theory application. Dependence Modeling, 9(1), 179–198.

Mercadier, C. and Roustant, O. (2019) The tail dependograph. Extremes, 22, 343–372.

See Also

graphsEmp, ellEmp

Examples

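## Load the package providing gen.ds, graphs, rArchimaxMevlog, tsicEmp and graphsEmp
library(satdad)

## Fix a 6-dimensional asymmetric tail dependence structure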
ds <- gen.ds(d = 6, sub = list(1:4,5:6))

## Plot the tail dependograph
graphs(ds)

## Generate a 1000-sample of Archimax Mevlog random vectors
## associated with ds and underlying distribution exp
sample <- rArchimaxMevlog(n = 1000, ds = ds, dist = "exp", dist.param = 1.3)

## Compute tsic values associated with subsets
## of cardinality 2 or more (ind = "all")
res <- tsicEmp(sample = sample, ind = "all", k = 100, sobol = TRUE, norm = TRUE)

## Select the significant tsic (flagged as outliers by the boxplot rule)
indices_nonzero <- which(res$tsic %in% boxplot.stats(res$tsic)$out)

## Subsets associated with significant tsic, reflecting the tail support
as.character(res$subsets[indices_nonzero])

## Pairwise tsic are obtained by
res_pairs <- tsicEmp(sample = sample, ind = 2, k = 100, sobol = TRUE, norm = TRUE)

## and plotted in the tail dependograph
graphsEmp(sample, k = 100)
