sensiHSIC
allows the user to conduct global sensitivity analysis (GSA) in many different contexts thanks to several sensitivity measures based on the Hilbert-Schmidt independence criterion (HSIC). The so-called HSIC sensitivity indices depend on the kernels assigned to the input variables.
Two main advantages must be highlighted:
Any kind of mathematical object is supported (provided that a kernel function is available).
Accurate estimation is possible even with very few data (no more than a hundred input-output samples).
In sensiHSIC, each input variable is expected to be scalar. Three situations may be encountered for the output:
A scalar output (either discrete or continuous). If so, one single kernel family is selected from the kernel collection.
A low-dimensional vector output. If so, a kernel is selected for each output variable and the final output kernel is built by tensorization.
A high-dimensional vector output or a functional output. In this case, the output data must be seen as time series observations. Three different methods are proposed.
A preliminary dimension reduction may be performed. To achieve this, a principal component analysis (PCA) based on the empirical covariance matrix helps identify the first terms of the Karhunen-Loeve expansion. The final output kernel is then built in the reduced subspace where the functional data are projected.
The dynamic time warping (DTW) algorithm may be combined with a translation-invariant kernel. The resulting DTW-based output kernel is well-adapted to measure similarity between two given time series.
The global alignment kernel (GAK) may be directly applied to the functional data. Unlike DTW-based kernels, it is a positive definite kernel that was specifically designed to deal with time series and to measure the impact of one input variable on the shape of the output curve.
Many variants of the original HSIC indices are now available in sensiHSIC.
Normalized HSIC indices (R2-HSIC)
The original HSIC indices defined in Gretton et al. (2005) may be hard to interpret because they do not admit a universal upper bound. A first step to overcome this difficulty was taken by Da Veiga (2015) with the definition of the R2-HSIC indices. The resulting sensitivity indices can no longer be greater than 1.
Target HSIC indices (T-HSIC)
They were proposed by Marrel and Chabridon (2021) to meet the needs of target sensitivity analysis (TSA). The idea is to measure the impact of each input variable on the occurrence of a given target region of the output.
Conditional HSIC indices (C-HSIC)
They were proposed by Marrel and Chabridon (2021) to meet the needs of conditional sensitivity analysis (CSA). The idea is to measure the impact of each input variable on the output within a given conditioning region.
HSIC-ANOVA indices
To improve the interpretability of HSIC indices, Da Veiga (2021) came up with an ANOVA-like decomposition that makes it possible to establish a strict separation of main effects and interaction effects in the HSIC paradigm. The first-order and total-order HSIC-ANOVA indices are then defined just as the first-order and total-order Sobol' indices. However, this framework only holds if two major assumptions are verified: the input variables must be mutually independent and ANOVA kernels must be assigned to all of them.
Like most sensitivity measures, HSIC indices can be used to rank the input variables. They can also be used for screening purposes through the tests of independence implemented in testHSIC.
sensiHSIC(model = NULL, X, target = NULL, cond = NULL,
kernelX = "rbf", paramX = NA,
kernelY = "rbf", paramY = NA,
estimator.type = "V-stat",
nboot = 0, conf = 0.95,
anova = list(obj = "no", is.uniform = TRUE),
sensi = NULL,
save.GM = list(KX = TRUE, KY = TRUE), ...)
# S3 method for sensiHSIC
tell(x, y = NULL, ...)
# S3 method for sensiHSIC
print(x, ...)
# S3 method for sensiHSIC
plot(x, ylim = c(0, 1), ...)
sensiHSIC
returns a list of class "sensiHSIC"
. It contains all the input arguments detailed before, except sensi
which is not kept. Note that some of these arguments may have been altered, corrected or completed.
A vector of values with input kernel parameters. For each one-parameter kernel, a real number is returned: either the original value (if correct), a corrected value, or the default value (computed with a rule of thumb if NA was initially specified). For each parameter-free kernel, NA is returned.
A vector of strings with output kernel names. If kernelY is a list of options with method="PCA", kernelY contains additional information resulting from PCA.
If kernelY initially contained an option named "expl.var", kernelY now also contains an option named "PC" that provides the associated number of principal components.
If kernelY initially contained an option named "PC", kernelY now also contains an option named "expl.var" that provides the associated percentage of output variance explained by PCA.
If kernelY initially contained an option named "position" set to "intern" or "extern", kernelY now contains an option named "ratios" that provides the weights used to combine kernels in the reduced subspace given by PCA.
A vector of values with output kernel parameters.
Case 1: kernelY is a string or a vector of strings.
paramY is a vector of values. For each one-parameter kernel, a real number is returned: either the original value (if correct), a corrected value, or the default value (computed with a rule of thumb if NA was initially specified). For each parameter-free kernel, NA is returned.
Case 2: kernelY is a list of options with method="PCA".
paramY is a vector of PC values. For this method, recall that all kernels belong to the same family, which is specified by an option named "fam" within kernelY. For each dimension of the reduced subspace, the kernel parameter is computed (with a rule of thumb) from the corresponding principal component. If the kernel in fam is parameter-free, paramY is a vector where NA is repeated PC times.
Case 3: kernelY is a list of options with method="DTW".
paramY remains equal to NA.
Case 4: kernelY is a list of options with method="GAK".
paramY is a vector of two values (or a vector of two NA values if automatic computation was initially specified).
More importantly, the list of class "sensiHSIC"
contains all expected results (output samples, sensitivity measures and conditioning weights).
The matched call.
A vector of output samples. y is obtained from X after computing the model response. If target is passed to sensiHSIC, the output samples in y are obtained by applying consecutively model and the specified weight function.
The estimated HSIC indices.
The estimated R2-HSIC indices (also called normalized HSIC indices).
Only if cond is passed to sensiHSIC: a vector of conditioning weights.
Depending on what is specified in anova
, the list of class "sensiHSIC"
may also contain the following objects:
The estimated first-order HSIC-ANOVA indices.
The estimated total-order HSIC-ANOVA indices.
The estimated numerators of total-order HSIC-ANOVA indices.
The estimated common denominator of HSIC-ANOVA indices.
A function, or a statistical model with a predict
method. It defines the input-output model that needs to be studied.
A matrix of input samples.
If the user only wants to estimate HSIC indices or R2-HSIC indices, the input variables can be correlated.
If the user also wants to estimate HSIC-ANOVA indices, the input variables have to be mutually independent.
A list of options to perform TSA. At least, target
must contain an option named "c"
. For other options, there exist default assignments.
type is a string specifying the weight function. Available choices include "indicTh", "zeroTh", "logistic" and "exp1side". Default value is "exp1side".
"indicTh" and "zeroTh" only depend on a threshold parameter.
"logistic" and "exp1side" depend on both a threshold parameter and a smoothness parameter.
c
is a scalar value specifying the threshold parameter.
upper
is a boolean indicating whether the target region is located above (TRUE
) or below (FALSE
) the threshold parameter c
. Only relevant when type
is "indicTh"
, "zeroTh"
or "exp1side"
. Default value is TRUE
.
param
is a scalar value specifying the smoothness parameter. Only relevant when type
is "logistic"
or "exp1side"
. Default value is 1
.
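As an illustration, the options described above can be assembled into a target list as follows. This is a sketch: the threshold value 2.5 is an arbitrary placeholder, and the other fields simply restate the documented defaults.

```r
# Sketch of a 'target' option list for TSA.
# The threshold 2.5 is an arbitrary illustrative value.
target <- list(c = 2.5,           # threshold parameter (mandatory)
               type = "exp1side", # one-sided exponential weight function (default)
               upper = TRUE,      # target region above the threshold (default)
               param = 1)         # smoothness parameter (default)
```

Such a list would then be passed to sensiHSIC through the target argument.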
A list of options to perform CSA. At least, cond
must contain an option named "c"
. For other options, there exist default assignments.
type is a string specifying the weight function. Available choices include "indicTh", "zeroTh", "logistic" and "exp1side". Default value is "exp1side".
"indicTh" and "zeroTh" only depend on a threshold parameter.
"logistic" and "exp1side" depend on both a threshold parameter and a smoothness parameter.
c
is a scalar value specifying the threshold parameter.
upper
is a boolean indicating whether the conditioning region is located above (TRUE
) or below (FALSE
) the threshold parameter c
. Only relevant when type
is "indicTh"
, "zeroTh"
or "exp1side"
. Default value is TRUE
.
param
is a scalar value specifying the smoothness parameter. Only relevant if type
is "logistic"
or "exp1side"
. Default value is 1
.
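Similarly, a cond list for hard thresholding can be sketched as follows (the threshold value 2.5 is again an arbitrary placeholder; with "indicTh", only the threshold and upper options matter, as documented above).

```r
# Sketch of a 'cond' option list for CSA with hard thresholding.
# The threshold 2.5 is an arbitrary illustrative value.
cond <- list(c = 2.5,          # threshold parameter (mandatory)
             type = "indicTh", # indicator weight function
             upper = TRUE)     # conditioning region above the threshold
```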
A string or a vector of strings with input kernel names.
If only one string is provided, the associated kernel is assigned to all inputs.
For dimension-wise kernel selection, a vector of strings must be provided (one kernel name per input variable).
For each input variable, available choices include "categ" (categorical kernel), "dcov" (covariance kernel of the fractional Brownian motion), "invmultiquad" (inverse multiquadratic kernel), "laplace" (exponential kernel), "linear" (dot-product kernel), "matern3" (Matern 3/2 kernel), "matern5" (Matern 5/2 kernel), "raquad" (rational quadratic kernel), "rbf" (Gaussian kernel), "sobolev1" (Sobolev kernel with smoothness parameter r=1) and "sobolev2" (Sobolev kernel with smoothness parameter r=2).
In addition, if all input variables are uniformly distributed on [0,1], "laplace", "matern3", "matern5" and "rbf" can easily be transformed into ANOVA kernels. The resulting kernels are respectively called "laplace_anova", "matern3_anova", "matern5_anova" and "rbf_anova".
One-parameter kernels: "categ", "dcov", "invmultiquad", "laplace", "laplace_anova", "matern3", "matern3_anova", "matern5", "matern5_anova", "raquad", "rbf" and "rbf_anova".
Parameter-free kernels: "linear", "sobolev1" and "sobolev2".
A scalar value or a vector of values with input kernel parameters.
If paramX=NA, input kernel parameters are computed automatically with rules of thumb.
If paramX is a scalar value, it is assigned to all input kernels.
For dimension-wise kernel parametrization, a vector of values must be provided. If kernelX combines one-parameter kernels and parameter-free kernels, NA must be specified for the parameter-free kernels.
A string, a vector of strings, or a list of options.
To deal with a scalar output or a low-dimensional vector output, it is advised to select one kernel per output dimension and to tensorize all selected kernels. In this case, kernelY must be a string or a vector of strings.
If only one string is provided, the associated kernel is repeated for all output dimensions.
For dimension-wise kernel selection, a vector of strings must be provided (one kernel name per output variable).
Have a look at kernelX for an exhaustive list of available kernels.
To deal with a high-dimensional vector output or a functional output, it is advised to reduce dimension or to use a dedicated kernel. In this case, kernelY must be specified as a list of options. At least, kernelY must contain an option named "method". For other options, there exist default assignments.
method is a string indicating the strategy used to construct the output kernel. Available choices include "PCA" (dimension reduction through principal component analysis), "DTW" (dynamic time warping) and "GAK" (global alignment kernel).
If method="PCA"
, the following options may also be specified:
data.centering is a boolean indicating whether the output samples must be centered before performing the preliminary PCA. Default value is TRUE.
data.scaling is a boolean indicating whether the output samples must be scaled before performing the preliminary PCA. Default value is TRUE.
fam is a string specifying the kernel family applied to the principal components. Available choices only include "dcov", "invmultiquad", "laplace", "linear", "matern3", "matern5", "raquad" and "rbf". Default value is "rbf".
expl.var is a scalar value (between 0 and 1) specifying the expected percentage of output variance explained by PCA. Default value is 0.95.
PC is the expected number of principal components in PCA. Default value is NA.
combi is a string indicating how the final output kernel is built in the reduced subspace. Available choices include "sum" and "prod". The chosen kernel in fam is applied to all principal components before summation (if "sum") or tensorization (if "prod").
position is a string indicating whether weights are involved in the construction of the final output kernel in the reduced subspace. Available choices include "nowhere" (no weights), "intern" (weights applied to principal components) and "extern" (weights applied to kernels). Default value is "intern".
Remark: expl.var and PC are conflicting options. Only one of them needs to be specified, and the other one must be set to NA. If both are specified, expl.var is prioritized. If both are set to NA, expl.var is set to its default value.
If method="DTW"
, the following option may also be specified:
fam is a string specifying the translation-invariant kernel which is combined with DTW. Available choices only include "invmultiquad", "laplace", "matern3", "matern5", "raquad" and "rbf". Default value is "rbf".
If method="GAK"
, there is no other option to specify.
A scalar value or a vector of values with output kernel parameters.
If paramY=NA
, output kernel parameters are computed automatically with rules of thumb.
In other cases, paramY
must be specified in agreement with kernelY
.
Case 1: kernelY is a string or a vector of strings.
paramY must be a scalar value or a vector of values.
If paramY is a scalar value, it is assigned to all output kernels.
For dimension-wise kernel parametrization, a vector of values must be provided. If kernelY combines one-parameter kernels and parameter-free kernels, NA must be specified for the parameter-free kernels.
Case 2: kernelY
is a list of options with method="PCA"
.
paramY
must be set to NA
because the parameters involved in the final output kernel are computed automatically. Their optimal tuning depends on the reduced subspace given by PCA.
Case 3: kernelY
is a list of options with method="DTW"
.
paramY
must be set to NA
.
Case 4: kernelY is a list of options with method="GAK".
paramY must be a vector of two values or NA. The two parameters correspond to the arguments sigma and window.size in the function gak from the package dtwclust. However, automatic computation (specified by paramY=NA) is strongly advised for this kind of output kernel.
A string specifying the kind of estimator used for HSIC indices. Available choices include "U-stat" (U-statistics) and "V-stat" (V-statistics). U-statistics are unbiased estimators. V-statistics are biased but asymptotically unbiased. In the specific case of HSIC indices, V-statistics are non-negative estimators and they offer more flexibility for further test procedures (see testHSIC). Both kinds of estimators can be computed with O(n^2) complexity, where n is the sample size.
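To make the V-statistic concrete, a minimal estimator can be sketched in a few lines of base R. This is an illustrative sketch only (not the internal code of sensiHSIC): Gaussian kernels are assumed for both variables, with the standard-deviation rule of thumb for the bandwidths, and the helper names rbf_gram and hsic_vstat are hypothetical.

```r
# Minimal V-statistic HSIC estimator with Gaussian (rbf) kernels.
# Illustrative sketch only -- not the internal implementation of sensiHSIC.
rbf_gram <- function(z, h = sd(z)) {
  D <- outer(z, z, "-")               # pairwise differences
  exp(-D^2 / (2 * h^2))               # Gaussian kernel Gram matrix
}
hsic_vstat <- function(x, y) {
  n <- length(x)
  H <- diag(n) - matrix(1 / n, n, n)  # centering matrix
  KX <- rbf_gram(x)
  KY <- rbf_gram(y)
  sum((H %*% KX %*% H) * KY) / n^2    # (1/n^2) * tr(KX H KY H)
}
set.seed(1)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.1)
h <- hsic_vstat(x, y)                 # positive for dependent data
```

The V-statistic is non-negative by construction, which is precisely the property exploited by the test procedures mentioned above.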
Number of bootstrap replicates.
A scalar value (between 0 and 1) specifying the confidence level of bootstrap confidence intervals.
A list of parameters to achieve an ANOVA-like decomposition based on HSIC indices. At least, anova
must contain an option named "obj"
. For other options, there exist default assignments.
obj
is a string specifying which kinds of HSIC-ANOVA indices are expected. Available choices include "no"
(anova
is disabled), "FO"
(first-order only), "TO"
(total-order only) and "both"
(first-order and total-order).
is.uniform is a boolean indicating whether the samples stored in X come from random variables that are uniformly distributed on [0,1]. Recall that only the kernels "laplace_anova", "matern3_anova", "matern5_anova", "rbf_anova", "sobolev1" and "sobolev2" verify the ANOVA constraint (provided that all input variables are uniformly distributed on [0,1]).
If is.uniform=TRUE, it is checked that all input data stored in X lie in [0,1].
If is.uniform=FALSE, non-parametric rescaling (based on empirical distribution functions) is applied.
An object of class "sensiHSIC" resulting from a prior call to this function. If an object of class "sensiHSIC" is indeed provided, the following happens:
If sensi
contains an object named "KX"
, it is extracted from sensi
and the input Gram matrices (required to estimate HSIC indices) are not built from X
, kernelX
and paramX
.
If sensi
contains an object named "KY"
, it is extracted from sensi
and the output Gram matrix (required to estimate HSIC indices) is not built from model
, kernelY
and paramY
.
A list of parameters indicating whether Gram matrices have to be saved. The list save.GM
must contain options named "KX"
and "KY"
.
KX
is a boolean indicating whether the input Gram matrices have to be saved.
KY
is a boolean indicating whether the output Gram matrix has to be saved.
An object of class "sensiHSIC"
storing the state of the sensitivity study (parameters, data, estimates).
A vector of output samples.
A vector of two values specifying the y-axis limits of the plot.
Any other arguments for model
which are passed unchanged each time model
is called.
Sebastien Da Veiga, Amandine Marrel, Anouar Meynaoui, Reda El Amri and Gabriel Sarazin.
Let
For many global sensitivity measures, the influence of
The HSIC-based sensitivity measure can be understood in this way since the index
In practice, it may be difficult to understand how
Kernels must be chosen very carefully. There exists a wide variety of kernels, but only a few of them meet the needs of GSA.
The Gaussian kernel, the Laplace kernel and the Matern kernels are characteristic.
The transformed versions of the four abovementioned kernels (all defined on [0,1]) are characteristic.
All Sobolev kernels (defined on [0,1]) are characteristic.
The categorical kernel (defined on any discrete probability space) is characteristic.
Lemma 1 in Gretton et al. (2005) provides a third way of defining
The user must always keep in mind the key steps leading to the estimation of
Input samples are simulated and the corresponding output samples are computed with the numerical model.
An input kernel is assigned to each input variable and an output kernel is selected.
In case of target sensitivity analysis: output samples are transformed by means of a weight function
The input and output Gram matrices are constructed.
In case of conditional sensitivity analysis: conditioning weights are computed by means of a weight function
The final estimate is computed. It depends on the selected estimator type (either a U-statistic or a V-statistic).
All that follows is written for a scalar output. Let us review the collection of kernels available in sensiHSIC.
To deal with continuous probability distributions on the real line, the following kernels can be used: the covariance kernel of the fractional Brownian motion ("dcov"), the inverse multiquadratic kernel ("invmultiquad"), the exponential kernel ("laplace"), the dot-product kernel ("linear"), the Matern 3/2 kernel ("matern3"), the Matern 5/2 kernel ("matern5"), the rational quadratic kernel ("raquad") and the Gaussian kernel ("rbf").
To deal with continuous probability distributions on [0,1], the following kernels can be used: any of the abovementioned kernels (restricted to [0,1]), the transformed exponential kernel ("laplace_anova"), the transformed Matern 3/2 kernel ("matern3_anova"), the transformed Matern 5/2 kernel ("matern5_anova"), the transformed Gaussian kernel ("rbf_anova"), the Sobolev kernel with smoothness parameter r=1 ("sobolev1") and the Sobolev kernel with smoothness parameter r=2 ("sobolev2").
To deal with any discrete probability distribution, the categorical kernel ("categ") must be used.
Two kinds of kernels must be distinguished:
Parameter-free kernels: the dot-product kernel ("linear"), the Sobolev kernel with smoothness parameter r=1 ("sobolev1") and the Sobolev kernel with smoothness parameter r=2 ("sobolev2").
One-parameter kernels: the categorical kernel ("categ"), the covariance kernel of the fractional Brownian motion ("dcov"), the inverse multiquadratic kernel ("invmultiquad"), the exponential kernel ("laplace"), the transformed exponential kernel ("laplace_anova"), the Matern 3/2 kernel ("matern3"), the transformed Matern 3/2 kernel ("matern3_anova"), the Matern 5/2 kernel ("matern5"), the transformed Matern 5/2 kernel ("matern5_anova"), the rational quadratic kernel ("raquad"), the Gaussian kernel ("rbf") and the transformed Gaussian kernel ("rbf_anova").
A major issue related to one-parameter kernels is how to set the parameter. It mainly depends on the role played by the parameter in the kernel expression.
For translation-invariant kernels and their ANOVA variants (that is all one-parameter kernels except "categ"
and "dcov"
), the parameter may be interpreted as a correlation length (or a scale parameter). The rule of thumb is to compute the empirical standard deviation of the provided samples.
For the covariance kernel of the fractional Brownian motion ("dcov"
), the parameter is an exponent. Default value is
For the categorical kernel ("categ"
), the parameter has no physical sense. It is just a kind of binary encoding.
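The standard-deviation rule of thumb mentioned above can be sketched in one line of base R (illustrative only; the package performs this computation internally whenever NA is specified for a one-parameter kernel):

```r
# Rule of thumb for translation-invariant kernels: take the empirical
# standard deviation of the samples as the correlation length.
# Illustrative sketch -- sensiHSIC does this internally when paramX = NA.
set.seed(1)
x1 <- runif(100)          # samples of one input variable
param_x1 <- sd(x1)        # candidate parameter for "rbf", "laplace", ...
```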
Let us assume that the output is a low-dimensional vector. A kernel is then selected for each output dimension.
The final output kernel is the tensor product of all output kernels.
The final output Gram matrix is the Hadamard product of all output Gram matrices.
Once the final output Gram matrix is built, HSIC indices can be estimated, just as in the case of a scalar output.
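A minimal base-R sketch of this construction for a two-dimensional output (illustrative only; rbf_gram is a hypothetical helper implementing a Gaussian kernel with the standard-deviation rule of thumb):

```r
# Sketch: for a 2-dimensional output, the tensorized output kernel
# corresponds to the Hadamard (elementwise) product of the Gram matrices.
rbf_gram <- function(z, h = sd(z)) exp(-outer(z, z, "-")^2 / (2 * h^2))
set.seed(1)
Y <- matrix(rnorm(50 * 2), 50, 2)           # n = 50 samples, 2 output dimensions
KY <- rbf_gram(Y[, 1]) * rbf_gram(Y[, 2])   # final output Gram matrix
```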
In sensiHSIC
, three different methods are proposed in order to compute HSIC-based sensitivity indices in presence of functional outputs.
Dimension reduction
This approach was initially proposed by Da Veiga (2015). The key idea is to approximate the random functional output by the first terms of its Karhunen-Loeve expansion. This can be achieved with a principal component analysis (PCA) carried out on the empirical covariance matrix.
The eigenvectors (or principal directions) make it possible to approximate the (deterministic) functional terms involved in the Karhunen-Loeve decomposition.
The eigenvalues determine how many principal directions are sufficient to accurately represent the random function by means of its truncated Karhunen-Loeve expansion. The key idea behind dimension reduction is to keep as few principal directions as possible while preserving a prescribed level of explained variance.
The principal components are the coordinates of the functional output in the low-dimensional subspace resulting from PCA. They are computed for all output samples (time series observations). See Le Maitre and Knio (2010) for more detailed explanations.
The last step consists in constructing a kernel in the reduced subspace. One single kernel family is selected and assigned to all principal directions. Moreover, all kernel parameters are computed automatically (with appropriate rules of thumb). Then, several strategies may be considered.
The initial method described in Da Veiga (2015) is based on a direct tensorization. One can also decide to sum kernels.
This approach was improved by El Amri and Marrel (2021). For each principal direction, a weight coefficient (equal to the ratio between the eigenvalue and the sum of all selected eigenvalues) is computed.
The principal components are multiplied by their respective weight coefficients before summing or tensorizing kernels.
The kernels can also be directly applied to the principal components before being linearly combined according to the weight coefficients.
In sensiHSIC, all these strategies correspond to the following specifications in kernelY
:
Direct tensorization:
kernelY=list(method="PCA", combi="prod", position="nowhere")
Direct sum:
kernelY=list(method="PCA", combi="sum", position="nowhere")
Rescaled tensorization:
kernelY=list(method="PCA", combi="prod", position="intern")
Rescaled sum:
kernelY=list(method="PCA", combi="sum", position="intern")
Weighted linear combination:
kernelY=list(method="PCA", combi="sum", position="extern")
Dynamic Time Warping (DTW)
The DTW algorithm developed by Sakoe and Chiba (1978) can be combined with a translation-invariant kernel in order to create a kernel function for time series. The resulting DTW-based output kernel is well-adapted to measure similarity between two given time series.
Suitable translation-invariant kernels include the inverse multiquadratic kernel ("invmultiquad"), the exponential kernel ("laplace"), the Matern 3/2 kernel ("matern3"), the Matern 5/2 kernel ("matern5"), the rational quadratic kernel ("raquad") and the Gaussian kernel ("rbf").
The user is warned that DTW-based kernels are not positive definite functions. As a consequence, many theoretical properties of HSIC indices no longer hold.
For faster computations, sensiHSIC uses the function dtw_dismat from the package IncDTW.
Global Alignment Kernel (GAK)
Unlike DTW-based kernels, the GAK is a positive definite function. This time-series kernel was originally introduced in Cuturi et al. (2007) and further investigated in Cuturi (2011). It was used to compute HSIC indices on a simplified compartmental epidemiological model in Da Veiga (2021).
For faster computations, sensiHSIC uses the function gak from the package dtwclust.
In sensiHSIC
, two GAK-related parameters may be tuned by the user with paramY
. They exactly correspond to the arguments sigma
and window.size
in the function gak
.
Interpretability is no doubt the major drawback of HSIC indices. This shortcoming led Da Veiga (2015) to introduce a normalized version of the original HSIC indices.
This normalized sensitivity measure is inspired by the distance correlation measure proposed by Szekely et al. (2007), and the resulting sensitivity indices are easier to interpret since they all fall in the interval [0,1].
T-HSIC indices were designed by Marrel and Chabridon (2021) for TSA. They are only defined for a scalar output; vector and functional outputs are not supported. The main idea of TSA is to measure the influence of each input variable on the occurrence of a given target region of the output.
How to measure the impact of
To answer this question, one may take
This can be specified in sensiHSIC
with target=list(c=T, type="zeroTh", upper=TRUE)
.
How to measure the influence of
To answer this question, one may take
This can be specified in sensiHSIC
with target=list(c=T, type="indicTh", upper=FALSE)
.
In Marrel and Chabridon (2021), the two situations described above are referred to as "hard thresholding". To avoid using discontinuous weight functions, "smooth thresholding" may be used instead.
Spagnol et al. (2019): logistic transformation on both sides of the threshold
Marrel and Chabridon (2021): exponential transformation above or below the threshold
These two smooth relaxation functions depend on a tuning parameter that helps control smoothness. For further details, the user is invited to consult the documentation of the function weightTSA
.
Remarks:
When type="indicTh" (indicator thresholding), kernelY must be the categorical kernel.
In the spirit of R2-HSIC indices, T-HSIC indices can be normalized. The associated normalizing constant is equal to the square root of
T-HSIC indices can be very naturally combined with the HSIC-ANOVA decomposition proposed by Da Veiga (2021). As a consequence, the arguments target
and anova
in sensiHSIC
can be enabled simultaneously. Compared with basic HSIC indices, there are three main differences: the input variables must be mutually independent, ANOVA kernels must be used for all input variables and the output of interest is
T-HSIC indices can be very naturally combined with the tests of independence proposed in testHSIC
. In this context, the null hypothesis is
C-HSIC indices were designed by Marrel and Chabridon (2021) for CSA. They are only defined for a scalar output; vector and functional outputs are not supported. The idea is to measure the impact of each input variable on the output within a given conditioning region.
Let us imagine that the event
How to measure the influence of
To answer this question, one may take
This can be specified in sensiHSIC
with cond=list(c=T, type="indicTh", upper=TRUE)
.
The three other weight functions proposed for TSA (namely "zeroTh"
, "logistic"
and "exp1side"
) can also be used but the role they play is less intuitive to understand. See Marrel and Chabridon (2021) for better explanations.
Remarks:
Unlike what is pointed out for TSA, when type="indicTh"
, the output of interest
In the spirit of R2-HSIC indices, C-HSIC indices can be normalized. The associated normalizing constant is equal to the square root of
Only V-statistics are supported to estimate C-HSIC indices. The reason is that the normalized version of C-HSIC indices cannot always be estimated with U-statistics. In particular, the estimates of
C-HSIC indices cannot be combined with the HSIC-ANOVA decomposition proposed in Da Veiga (2021). In fact, the conditioning operation is feared to introduce statistical dependence among input variables, which forbids using HSIC-ANOVA indices. As a consequence, the arguments cond
and anova
in sensiHSIC
cannot be enabled simultaneously.
C-HSIC indices can hardly be combined with the tests of independence proposed in testHSIC. This is only possible if type="indicTh". In this context, the null hypothesis relates to independence conditionally on the occurrence of the event specified in cond.
In comparison with HSIC indices, R2-HSIC indices are easier to interpret. However, in terms of interpretability, Sobol' indices remain much more convenient since they can be understood as shares of the total output variance. Such an interpretation is made possible by the Hoeffding decomposition, also known as ANOVA decomposition.
It was proved in Da Veiga (2021) that an ANOVA-like decomposition can be achieved for HSIC indices under certain conditions:
The input variables must be mutually independent (which was not required to compute all other kinds of HSIC indices).
ANOVA kernels must be assigned to all input variables.
This ANOVA setup makes it possible to establish a strict separation between main effects and interaction effects in the HSIC sense. The first-order and total-order HSIC-ANOVA indices are then defined in the same fashion as the first-order and total-order Sobol' indices. It is worth noting that the HSIC-ANOVA normalizing constant is equal to
For a given probability measure
Moreover, several strategies to construct orthogonal kernels from non-orthogonal kernels are recalled in Da Veiga (2021). One of them consists in translating the feature map so that the resulting kernel becomes centered at the prescribed probability measure
In sensiHSIC, ANOVA kernels are only available for the uniform probability measure on [0,1]. They include the Sobolev kernel with smoothness parameter r=1 ("sobolev1"), the Sobolev kernel with smoothness parameter r=2 ("sobolev2"), the transformed Gaussian kernel ("rbf_anova"), the transformed exponential kernel ("laplace_anova"), the transformed Matern 3/2 kernel ("matern3_anova") and the transformed Matern 5/2 kernel ("matern5_anova").
As explained above, the HSIC-ANOVA indices can only be computed if all input variables are uniformly distributed on [0,1]. This must be kept in mind when model is specified. In sensiHSIC, if is.uniform=TRUE is selected in anova, it is checked that all input samples lie in [0,1].
HSIC-ANOVA indices can be used for TSA. The only difference with GSA is the use of a weight function to transform the output samples.
Borgonovo, E. and Plischke, E. (2016), Sensitivity analysis: a review of recent advances, European Journal of Operational Research, 248(3), 869-887.
Cuturi, M., Vert, J. P., Birkenes, O. and Matsui, T. (2007), A kernel for time series based on global alignments, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing-ICASSP'07 (Vol. 2, pp. II-413), IEEE.
Cuturi, M. (2011), Fast global alignment kernels, Proceedings of the 28th International Conference on Machine Learning (ICML-11) (pp. 929-936).
Da Veiga, S. (2015), Global sensitivity analysis with dependence measures, Journal of Statistical Computation and Simulation, 85(7), 1283-1305.
Da Veiga, S. (2021). Kernel-based ANOVA decomposition and Shapley effects: application to global sensitivity analysis, arXiv preprint arXiv:2101.05487.
El Amri, M. R. and Marrel, A. (2021), More powerful HSIC-based independence tests, extension to space-filling designs and functional data. https://cea.hal.science/cea-03406956/
Fukumizu, K., Bach, F. R. and Jordan, M. I. (2004), Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces, Journal of Machine Learning Research, 5(Jan), 73-99.
Ginsbourger, D., Roustant, O., Schuhmacher, D., Durrande, N. and Lenz, N. (2016), On ANOVA decompositions of kernels and Gaussian random field paths, Monte Carlo and Quasi-Monte Carlo Methods (pp. 315-330), Springer, Cham.
Gretton, A., Bousquet, O., Smola, A., and Scholkopf, B. (2005), Measuring statistical dependence with Hilbert-Schmidt norms, International Conference on Algorithmic Learning Theory (pp. 63-77), Springer, Berlin, Heidelberg.
Gretton, A., Borgwardt, K., Rasch, M., Scholkopf, B. and Smola, A. (2006), A kernel method for the two-sample-problem, Advances in Neural Information Processing Systems, 19.
Le Maitre, O. and Knio, O. M. (2010), Spectral methods for uncertainty quantification with applications to computational fluid dynamics, Springer Science & Business Media.
Marrel, A. and Chabridon, V. (2021), Statistical developments for target and conditional sensitivity analysis: application on safety studies for nuclear reactor, Reliability Engineering & System Safety, 214, 107711.
Sakoe, H. and Chiba, S. (1978), Dynamic programming algorithm optimization for spoken word recognition, IEEE Transactions on Acoustics, Speech, and Signal Processing, 26(1), 43-49.
Spagnol, A., Le Riche, R. and Da Veiga, S. (2019), Global sensitivity analysis for optimization with variable selection, SIAM/ASA Journal on Uncertainty Quantification, 7(2), 417-443.
Sriperumbudur, B., Fukumizu, K. and Lanckriet, G. (2010), On the relation between universality, characteristic kernels and RKHS embedding of measures, Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (pp. 773-780). JMLR Workshop and Conference Proceedings.
Szekely, G. J., Rizzo, M. L. and Bakirov, N. K. (2007), Measuring and testing dependence by correlation of distances, The Annals of Statistics, 35(6), 2769-2794.
Wahba, G., Wang, Y., Gu, C., Klein, R. and Klein, B. (1995), Smoothing spline ANOVA for exponential families, with application to the Wisconsin Epidemiological Study of Diabetic Retinopathy: the 1994 Neyman Memorial Lecture, The Annals of Statistics, 23(6), 1865-1895.
See Also: testHSIC, weightTSA
# \donttest{
############################
### HSIC indices for GSA ###
############################
# Test case 1: the Friedman function
# --> 5 input variables
### GSA with a given model ###
n <- 800
p <- 5
X <- matrix(runif(n*p), n, p)
kernelX <- c("rbf", "rbf", "laplace", "laplace", "sobolev1")
paramX <- c(0.2, 0.3, 0.4, NA, NA)
# kernel for X1: Gaussian kernel with given parameter 0.2
# kernel for X2: Gaussian kernel with given parameter 0.3
# kernel for X3: exponential kernel with given parameter 0.4
# kernel for X4: exponential kernel with automatic computation of the parameter
# kernel for X5: Sobolev kernel (r=1) with no parameter
kernelY <- "raquad"
paramY <- NA
sensi <- sensiHSIC(model=friedman.fun, X,
kernelX=kernelX, paramX=paramX,
kernelY=kernelY, paramY=paramY)
print(sensi)
plot(sensi)
title("GSA for the Friedman function")
### GSA with given data ###
Y <- friedman.fun(X)
sensi <- sensiHSIC(model=NULL, X,
kernelX=kernelX, paramX=paramX,
kernelY=kernelY, paramY=paramY)
tell(sensi, y=Y)
print(sensi)
### GSA from a prior object of class "sensiHSIC" ###
new.sensi <- sensiHSIC(model=friedman.fun, X,
kernelX=kernelX, paramX=paramX,
kernelY=kernelY, paramY=paramY,
estimator.type="U-stat",
sensi=sensi,
save.GM=list(KX=FALSE, KY=FALSE))
print(new.sensi)
# U-statistics are computed without rebuilding all Gram matrices.
# Those Gram matrices are not saved a second time.
##################################
### HSIC-ANOVA indices for GSA ###
##################################
# Test case 2: the Matyas function with Gaussian input variables
# --> 2 input variables
n <- 10^3
p <- 2
X <- matrix(rnorm(n*p), n, p)
# The Sobolev kernel (with r=1) is used to achieve the HSIC-ANOVA decomposition.
# Both first-order and total-order HSIC-ANOVA indices are expected.
### AUTOMATIC RESCALING ###
kernelX <- "sobolev1"
anova <- list(obj="both", is.uniform=FALSE)
sensi.A <- sensiHSIC(model=matyas.fun, X, kernelX=kernelX, anova=anova)
print(sensi.A)
plot(sensi.A)
title("GSA for the Matyas function")
### PROBLEM REFORMULATION ###
U <- matrix(runif(n*p), n, p)
new.matyas.fun <- function(U){ matyas.fun(qnorm(U)) }
kernelX <- "sobolev1"
anova <- list(obj="both", is.uniform=TRUE)
sensi.B <- sensiHSIC(model=new.matyas.fun, U, kernelX=kernelX, anova=anova)
print(sensi.B)
####################################
### T-HSIC indices for target SA ###
####################################
# Test case 3: the Sobol function
# --> 8 input variables
n <- 10^3
p <- 8
X <- matrix(runif(n*p), n, p)
kernelY <- "categ"
target <- list(c=0.4, type="indicTh")
sensi <- sensiHSIC(model=sobol.fun, X, kernelY=kernelY, target=target)
print(sensi)
plot(sensi)
title("TSA for the Sobol function")
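As noted above, HSIC-ANOVA indices can also be used for TSA by combining the ANOVA setup with a weight function. The following is a speculative sketch, not taken from the official examples: it assumes the anova and target arguments of sensiHSIC can be passed together, and reuses only kernel names and target types appearing elsewhere on this page.

```r
# Hypothetical sketch: HSIC-ANOVA indices for TSA (assumes 'anova' and
# 'target' can be combined in a single sensiHSIC call).
library(sensitivity)

n <- 10^3
p <- 8
X <- matrix(runif(n*p), n, p)               # uniform inputs on [0,1]
anova <- list(obj="both", is.uniform=TRUE)  # first-order and total-order indices
target <- list(c=0.4, type="indicTh")       # thresholding weight function (see weightTSA)
sensi <- sensiHSIC(model=sobol.fun, X, kernelX="sobolev1",
                   kernelY="categ", anova=anova, target=target)
print(sensi)
```

The Sobolev kernel is kept on the inputs since it is the one enabling the HSIC-ANOVA decomposition, and the categorical output kernel matches the binary output produced by the indicator-type weight function.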
#########################################
### C-HSIC indices for conditional SA ###
#########################################
# Test case 3: the Sobol function
# --> 8 input variables
n <- 10^3
p <- 8
X <- matrix(runif(n*p), n, p)
cond <- list(c=0.2, type="exp1side", upper=FALSE)
sensi <- sensiHSIC(model=sobol.fun, X, cond=cond)
print(sensi)
plot(sensi)
title("CSA for the Sobol function")
##########################################
### How to deal with discrete outputs? ###
##########################################
# Test case 4: classification of the Ishigami output
# --> 3 input variables
# --> 3 categories
classif <- function(X){
Ytemp <- ishigami.fun(X)
Y <- rep(NA, n)
Y[Ytemp<0] <- 0
Y[Ytemp>=0 & Ytemp<10] <- 1
Y[Ytemp>=10] <- 2
return(Y)
}
###
n <- 10^3
p <- 3
X <- matrix(runif(n*p, -pi, pi), n, p)
kernelY <- "categ"
paramY <- 0
sensi <- sensiHSIC(model=classif, X, kernelY=kernelY, paramY=paramY)
print(sensi)
plot(sensi)
title("GSA for the classified Ishigami function")
############################################
### How to deal with functional outputs? ###
############################################
# Test case 5: the arctangent temporal function
# --> 3 input variables (including 1 dummy variable)
n <- 500
p <- 3
X <- matrix(runif(n*p,-7,7), n, p)
### with a preliminary dimension reduction by PCA ###
kernelY <- list(method="PCA",
data.centering=TRUE, data.scaling=TRUE,
fam="rbf", expl.var=0.95, combi="sum", position="extern")
sensi <- sensiHSIC(model=atantemp.fun, X, kernelY=kernelY)
print(sensi)
plot(sensi)
title("PCA-based GSA for the arctangent temporal function")
### with a kernel based on dynamic time warping ###
kernelY <- list(method="DTW", fam="rbf")
sensi <- sensiHSIC(model=atantemp.fun, X, kernelY=kernelY)
print(sensi)
plot(sensi)
title("DTW-based GSA for the arctangent temporal function")
# \donttest{
### with the global alignment kernel ###
kernelY <- list(method="GAK")
sensi <- sensiHSIC(model=atantemp.fun, X, kernelY=kernelY)
print(sensi)
plot(sensi)
title("GAK-based GSA for the arctangent temporal function")
# }
# }