shapleySubsetMc: Estimation of Shapley effects from data using nearest neighbors method

Description

shapleySubsetMc estimates the Shapley effects from a given data sample. It uses a nearest-neighbours method to approximate sampling from the conditional distributions of the inputs. It can be used with categorical inputs.
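The nearest-neighbours idea can be illustrated in base R. The following is only a minimal sketch of the general principle, not the package's implementation: the function name knn_cond_var, its n_points argument and the use of the squared Euclidean distance are all assumptions made for illustration. It approximates E[Var(Y | X_u)] by averaging, over some sample points, the variance of Y among the Ni nearest neighbours in the coordinates u.

```r
# Hypothetical illustration (not the code used by shapleySubsetMc):
# approximate E[Var(Y | X_u)] with nearest neighbours in the coordinates u.
knn_cond_var <- function(X, Y, u, Ni = 3, n_points = 200) {
  X <- as.matrix(X)
  Xu <- X[, u, drop = FALSE]
  # points at which the conditional variance is approximated
  idx <- sample(nrow(X), min(n_points, nrow(X)))
  v <- vapply(idx, function(i) {
    # squared Euclidean distance from point i to every point, in coordinates u
    centre <- matrix(Xu[i, ], nrow(Xu), ncol(Xu), byrow = TRUE)
    d <- rowSums((Xu - centre)^2)
    nn <- order(d)[seq_len(Ni)]  # the Ni nearest neighbours (point i included)
    var(Y[nn])
  }, numeric(1))
  mean(v)
}

set.seed(1)
n <- 1000
X <- matrix(rnorm(2 * n), ncol = 2)
Y <- X[, 1] + X[, 2]
v1  <- knn_cond_var(X, Y, u = 1)        # Var(Y | X1) = Var(X2) = 1 in theory
v12 <- knn_cond_var(X, Y, u = c(1, 2))  # Y is a function of (X1, X2): close to 0
```

In this toy model the two inputs are independent standard Gaussians, so the first value should be near 1 and the second near 0.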

Usage

shapleySubsetMc(X,Y, Ntot=NULL, Ni=3, cat=NULL, weight=NULL, discrete=NULL)
# S3 method for shapleySubsetMc
plot(x, ylim = c(0, 1), ...)

Arguments

X

a matrix or a data frame containing the input sample.

Y

a vector containing the output sample.

Ntot

an integer giving the approximate cost wanted.

Ni

the number of nearest neighbours taken for each point.

cat

a vector giving the indices of the input categorical variables.

weight

a vector of the same length as cat giving the weight of each categorical variable in the product distance.

discrete

a vector giving the indices of the input variables that are real-valued (not categorical) but can take the same value several times.

x

a list of class "shapleySubsetMc" storing the state of the sensitivity study (Shapley effects, cost, names of inputs).

ylim

y-coordinate plotting limits.

...

any other arguments for plotting.
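As a rough aid for choosing cat and discrete, the columns of X that contain repeated values can be listed with base R. The helper name repeated_value_columns below is hypothetical and not part of the package. Deciding whether a flagged column belongs in cat or in discrete remains a modelling choice.

```r
# Hypothetical helper (not part of the package): indices of the columns
# of X in which at least one value occurs more than once.
repeated_value_columns <- function(X) {
  X <- as.matrix(X)
  which(apply(X, 2, function(col) anyDuplicated(col) > 0))
}

set.seed(1)
X <- cbind(rnorm(50), round(rnorm(50)), rnorm(50))
candidates <- repeated_value_columns(X)  # only the rounded column has ties
```

Here only the second column is flagged, so it would be a candidate for discrete (or for cat, if it is to be compared with the discrete distance).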

Value

shapleySubsetMc returns a list of class "shapleySubsetMc", containing:

shapley

the Shapley effects estimates.

cost

the real total cost of these estimates: the total number of points for which the nearest neighbours were computed.

names

the labels of the input variables.
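The components listed above can be read directly from the returned list. The snippet below is a small sketch that runs only if the sensitivity package is installed. The sample size, dimension and Ntot are arbitrary choices made for illustration.

```r
# Runs only when the 'sensitivity' package is available.
if (requireNamespace("sensitivity", quietly = TRUE)) {
  set.seed(1)
  n <- 1000                        # sample size (arbitrary)
  p <- 4                           # number of inputs (arbitrary)
  X <- matrix(rnorm(n * p), nrow = n, ncol = p)
  Y <- rowSums(X)
  res <- sensitivity::shapleySubsetMc(X = X, Y = Y, Ntot = 2000)

  print(res$shapley)  # Shapley effects estimates
  print(res$cost)     # real total cost of these estimates
  print(res$names)    # labels of the input variables
}
```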

Details

If weight = NULL, all the categorical variables will have the same weight 1.

If Ntot = NULL, the nearest neighbours will be computed for all the \(n (2^p-2)\) points, where n is the sample size and p is the number of inputs. The estimation can take a very long time with this setting.
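To get a feel for this cost, the formula can be evaluated before calling the function. The helper name full_cost is hypothetical. The sizes are those used in the Examples section.

```r
# Number of points for which nearest neighbours are computed when Ntot = NULL,
# following the formula n * (2^p - 2) given above.
full_cost <- function(n, p) n * (2^p - 2)

cost_first  <- full_cost(n = 2000, p = 4)  # 28000
cost_second <- full_cost(n = 5000, p = 8)  # 1270000
```

The examples set Ntot to 5000 and 20000 respectively, well below these full costs.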

References

B. Broto, F. Bachoc, M. Depecker, 2018, Variance reduction for estimation of Shapley effects and adaptation to unknown input distribution, Preprint, HAL: hal-01962010.

Examples


# NOT RUN {
library(sensitivity) # provides shapleySubsetMc and sobol.fun

# First example: the linear Gaussian framework

# we generate a covariance matrix Sigma
p=4 #dimension
A=matrix(rnorm(p^2),nrow=p,ncol=p)
Sigma=t(A)%*%A # Sigma is symmetric positive semi-definite by construction
C=chol(Sigma)
n=2000 #sample size

Z=matrix(rnorm(p*n),nrow=n,ncol=p)
X=Z%*%C # X is a Gaussian vector with zero mean and covariance Sigma
Y=rowSums(X) 
Shap=shapleySubsetMc(X=X,Y=Y,Ntot=5000)
plot(Shap)


#Second example: The Sobol model with heterogeneous inputs

p=8 #dimension
A=matrix(rnorm(p^2),nrow=p,ncol=p)
Sigma=t(A)%*%A
C=chol(Sigma)
n=5000 #sample size

Z=matrix(rnorm(p*n),nrow=n,ncol=p)
X=Z%*%C+1 # X is a Gaussian vector with mean (1,1,..,1) and covariance Sigma

#we create discrete and categorical variables
X[,1]=round(X[,1]/2) 
X[,2]=X[,2]>2
X[,4]=-2*round(X[,4])+4
X[(X[,6]>0 &X[,6]<1),6]=1

cat=c(1,2)  # we choose to take X1 and X2 as categorical variables (with the discrete distance)
discrete=c(4,6) # we indicate that X4 and X6 can take several times the same value

Y=sobol.fun(X)

Shap=shapleySubsetMc(X=X,Y=Y, cat=cat, discrete=discrete,Ntot=20000, Ni=10)

plot(Shap)
# }