sensitivity (version 1.16.2)

sensiHSIC: Sensitivity Indices based on Hilbert-Schmidt Independence Criterion (HSIC)

Description

sensiHSIC conducts a sensitivity analysis where the impact of an input variable is defined in terms of the distance between the input/output joint probability distribution and the product of their marginals when they are embedded in a Reproducing Kernel Hilbert Space (RKHS). This distance corresponds to the Hilbert-Schmidt Independence Criterion (HSIC) proposed by Gretton et al. (2005) and serves as a dependence measure between random variables, see Da Veiga (2015) for an illustration in the context of sensitivity analysis. The use of universal kernels ensures equivalence between HSIC nullity and independence of X and Y. In this case, a statistical test of independence with HSIC measure as statistic can be built. H0 is "X and Y are independent", against H1: X and Y are dependent. P-value can be computed either with asymptotic approximation (Gamma approximation), either with permutation method. See Meynaoui et al. (2019) for details.

Usage

sensiHSIC(model = NULL, X, kernelX = "rbf", paramX = NA, 
            kernelY = "rbf", paramY = NA, nboot = 0, conf = 0.95, 
            estimator.type = "V-stat", test.method = "Asymptotic", 
            B = 1000, ...)
  # S3 method for sensiHSIC
tell(x, y = NULL, …)
  # S3 method for sensiHSIC
print(x, …)
  # S3 method for sensiHSIC
plot(x, ylim = c(0, 1), …)
  # S3 method for sensiHSIC
ggplot(x, ylim = c(0, 1), …)

Arguments

model

a function, or a model with a predict method, defining the model to analyze.

X

a matrix or data.frame representing the input random sample.

kernelX

a string or a list of strings specifying the reproducing kernel to be used for the input variables. If only one kernel is provided, it is used for all input variables. Available choices are "rbf" (Gaussian), "laplace" (exponential), "dcov" (distance covariance, see details), "raquad" (rationale quadratic), "invmultiquad" (inverse multiquadratic), "linear" (Euclidean scalar product), "matern3" (Matern 3/2), "matern5" (Matern 5/2), "ssanova1" (kernel of Sobolev space of order 1) and "ssanova2" (kernel of Sobolev space of order 2)

paramX

a scalar or a vector of hyperparameters to be used in the input variable kernels. If only one scalar is provided, it is replicated for all input variables. By default paramX is equal to the standard deviation of the input variable for "rbf", "laplace", "raquad", "invmultiquad", "matern3" and "matern5" and to 1 for "dcov". Kernels "linear", "ssanova1" and "ssanova2" do not involve hyperparameters. If kernelX is a combination of kernels with and without hyperparameters, paramX must have a (dummy) value for the hyperparameter-free kernels, see examples below

kernelY

a string specifying the reproducing kernel to be used for the output variable. Available choices are "rbf" (Gaussian), "laplace" (exponential), "dcov" (distance covariance, see details), "raquad" (rationale quadratic), "invmultiquad" (inverse multiquadratic), "linear" (Euclidean scalar product), "matern3" (Matern 3/2), "matern5" (Matern 5/2), "ssanova1" (kernel of Sobolev space of order 1) and "ssanova2" (kernel of Sobolev space of order 2).

paramY

a scalar to be used in the output variable kernel. By default paramY is equal to the standard deviation of the output variable for "rbf", "laplace", "raquad", "invmultiquad", "matern3" and "matern5" and to 1 for "dcov". Kernels "linear", "ssanova1" and "ssanova2" do not involve hyperparameters

nboot

the number of bootstrap replicates

conf

the confidence level for confidence intervals

estimator.type

a string specifying the type of HSIC estimator. Two types of estimators are available, V-statistic of U-statistic. If estimator.type = "V-stat" (default value), the HSIC is estimated with a biased (but asymptotically unbiased) estimator, more practical for numerical implementation. If estimator.type = "U-stat", the unbiased estimator is used. The variance is of order o(1/n) for both estimators (n being the sample size, i.e. number of rows of X). More details in Meynaoui et al., 2019

test.method

a string specifying the method used to compute the p-value of the HSIC-based independence test. If test.method = "Asymptotic" (default value), an asympotic approximation with Gamma distribution is used. If test.method = "Permutation", a permutation method based on B boostrap samples is used to estimate the p-value in a non-asymptotic framework. Permutation-test are recommened for low sample size n. More details in Meynaoui et al., 2019

B

number of boostrap samples used in the permutation method for the estimation of P-values in independence test. B is used only if test.method is "Permutation"

x

a list of class "sensiHSIC" storing the state of the sensitivity study (parameters, data, estimates).

y

a vector of model responses.

ylim

y-coordinate plotting limits.

any other arguments for model which are passed unchanged each time it is called.

Value

sensiHSIC returns a list of class "sensiHSIC", containing all the input arguments detailed before, plus the following components:

call

the matched call.

X

a data.frame containing the design of experiments

y

a vector of model responses

S

the estimations of normalized HSIC sensitivity indices (also denoted R2HSIC)

HSICXY

the estimation of HSIC sensitivity indices (numerator in S formula)

Pvalue

the estimation of P-value of the independence test based on HSIC statistic

Details

The HSIC sensitivity indices are obtained as a normalized version of the Hilbert-Schmidt independence criterion: $$S_i^{HSIC} = R^2_{HSIC} = \frac{HSIC(X_i,Y)}{\sqrt{HSIC(X_i,X_i)}\sqrt{HSIC(Y,Y)}},$$ see Da Veiga (2014) for details. When kernelX="dcov" and kernelY="dcov", the kernel is given by \(k(u,u')=1/2(||u||+||u'||-||u-u'||)\) and the sensitivity index is equal to the distance correlation introduced by Szekely et al. (2007) as was recently proven by Sejdinovic et al. (2013).

References

Da Veiga S. (2015), Global sensitivity analysis with dependence measures, Journal of Statistical Computation and Simulation, 85(7), 1283--1305.

Gretton A., Bousquet O., Smola A., Scholkopf B. (2005), Measuring statistical dependence with hilbert-schmidt norms, Jain S, Simon H, Tomita E, editors: Algorithmic learning theory, Lecture Notes in Computer Science, Vol. 3734, Berlin: Springer, 63--77.

Meynaoui, A., Marrel, A., and Laurent, B. (2019). New statistical methodology for secondlevel global sensitivity analysis, Preprint, ArXiv 1902.07030.

Sejdinovic D., Sriperumbudur B., Gretton A., Fukumizu K., (2013), Equivalence of distance-based and RKHS-based statistics in hypothesis testing, Annals of Statistics 41(5), 2263--2291.

Szekely G.J., Rizzo M.L., Bakirov N.K. (2007), Measuring and testing dependence by correlation of distances, Annals of Statistics 35(6), 2769--2794.

See Also

kde, sensiFdiv

Examples

Run this code
# NOT RUN {
 
# }
# NOT RUN {
  library(ggplot2)
  library(boot)
  
  # Test case : the non-monotonic Sobol g-function
  # Only one kernel is provided with default hyperparameter value
  
  n <- 100
  X <- data.frame(matrix(runif(8 * n), nrow = n))
  x <- sensiHSIC(model = sobol.fun, X, kernelX = "raquad", kernelY = "rbf", 
    nboot = 100, test.method = "Asymptotic")
  print(x)
  ggplot(x)
  
  x <- sensiHSIC(model = sobol.fun, X, kernelX = "rbf", kernelY = "rbf", 
    estimator.type = "U-stat", test.method = "Permutation")
  print(x)
  attributes(x)
  x$estimator.type

  x <- sensiHSIC(model = sobol.fun, X, kernelX = "raquad", kernelY = "rbf", 
    nboot = 100, test.method = "Permutation", B=2000)
  print(x)
  ggplot(x)

  # Test case : the Ishigami function
  
  # A list of kernels is given with default hyperparameter value
  n <- 100
  X <- data.frame(matrix(-pi+2*pi*runif(3 * n), nrow = n))
  x <- sensiHSIC(model = ishigami.fun, X, kernelX = c("rbf","matern3","dcov"), 
                  kernelY = "rbf")
  print(x)
  ggplot(x)

  # A combination of kernels is given and a dummy value is passed for 
  # the first hyperparameter
  x <- sensiHSIC(model = ishigami.fun, X, kernelX = c("ssanova1","matern3","dcov"), 
                  paramX = c(1,2,1), kernelY = "ssanova1")
  print(x)
  ggplot(x)
  
 
# }

Run the code above in your browser using DataCamp Workspace