sensitivity (version 1.31.0)

VIM: Variance-based Importance Measures in linear model

Description

VIM computes several variance-based importance measures for linear models that are useful in data analysis and machine learning when the inputs are dependent: the VIF (variance inflation factor, a multicollinearity metric), squared SRC, squared PCC, LMG and PMVD, as well as the R2 and Q2 of the linear regression model.
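To make two of these measures concrete, the following minimal sketch computes the VIF and the squared SRC by hand in base R, using their textbook definitions. It is an illustration on simulated data, not the code used inside VIM; the names vif_by_hand, vif_from_cor and src2 are hypothetical.

```r
# Illustrative only: base R, textbook definitions, not the internals of VIM().
set.seed(1)
n  <- 200
X1 <- rnorm(n)
X2 <- 0.8 * X1 + 0.6 * rnorm(n)          # X2 is correlated with X1
X  <- data.frame(X1 = X1, X2 = X2)
y  <- X$X1 - X$X2 + rnorm(n, 0, 0.1)

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing X_j on the other inputs
vif_by_hand <- sapply(names(X), function(j) {
  f    <- reformulate(setdiff(names(X), j), response = j)
  r2_j <- summary(lm(f, data = X))$r.squared
  1 / (1 - r2_j)
})

# Equivalent shortcut: diagonal of the inverse correlation matrix of the inputs
vif_from_cor <- diag(solve(cor(X)))

# Squared SRC_j = (beta_j * sd(X_j) / sd(y))^2, with beta_j the fitted coefficient
fit  <- lm(y ~ ., data = cbind(X, y = y))
src2 <- (coef(fit)[names(X)] * sapply(X, sd) / sd(y))^2

vif_by_hand
src2
```

With dependent inputs the squared SRC values need not sum to the R2, which is one reason measures such as LMG and PMVD are reported alongside them.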

Usage

VIM(X, y, logistic = FALSE, nboot = 0, 
    conf = 0.95, max.iter = 1000, parl = NULL)
# S3 method for VIM
print(x, ...)
# S3 method for VIM
plot(x, ylim = c(0,1), ...)
# S3 method for VIM
ggplot(data, mapping = aes(), ..., ylim = c(0,1), 
  environment = parent.frame())

Value

VIM returns a list of class "VIM", containing the following components:

call

the matched call.

R2

a data frame containing the estimations of the R2.

Q2

a data frame containing the estimations of the Q2.

VIF

a data frame containing the estimations of the VIF.

SRC2

a data frame containing the estimations of the squared SRC.

PCC2

a data frame containing the estimations of the squared PCC.

LMG

a data frame containing the estimations of the LMG.

PMVD

a data frame containing the estimations of the PMVD.

X

the observed covariates.

y

the observed outcomes.

logistic

logical. TRUE if the analysis has been made by logistic regression.

nboot

number of bootstrap replicates.

max.iter

if logistic=TRUE, the maximum number of iterative optimization steps allowed for the logistic regression. Default is 1000.

parl

number of chosen cores for the computation.

conf

level for the confidence intervals by bootstrap.

Arguments

X

a matrix or data frame containing the observed covariates (i.e., features, input variables...).

y

a numeric vector containing the observed outcomes (i.e., dependent variable). If logistic=TRUE, can be a numeric vector of zeros and ones, or a logical vector, or a factor.

logistic

logical. If TRUE, the analysis is done via a logistic regression (binomial GLM).

nboot

the number of bootstrap replicates for the computation of confidence intervals.

conf

the confidence level of the bootstrap confidence intervals.

max.iter

if logistic=TRUE, the maximum number of iterative optimization steps allowed for the logistic regression. Default is 1000.

parl

number of cores on which to parallelize the computation. If NULL, then no parallelization is done.

x

the object returned by VIM.

data

the object returned by VIM.

ylim

the y-coordinate limits of the plot.

mapping

Default list of aesthetic mappings to use for plot. If not specified, must be supplied in each layer added to the plot.

environment

[Deprecated] Used prior to tidy evaluation.

...

arguments to be passed to methods, such as graphical parameters (see par).

Author

Bertrand Iooss

Details

This function cannot be used with categorical inputs.

For logistic regression (logistic=TRUE), the \(R^2\) value is equal to: $$R^2 = 1-\frac{\textrm{model deviance}}{\textrm{null deviance}}$$
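The deviance-based formula above can be reproduced with a plain binomial GLM in base R. This is a minimal sketch on simulated data, assuming only stats::glm; the variable name r2_deviance is hypothetical and the snippet is not taken from the VIM source.

```r
# Illustrative only: deviance-based R2 for a binomial GLM, in base R.
set.seed(1)
n   <- 200
dat <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
dat$y <- as.integer(dat$X1 - dat$X2 + rnorm(n) > 0)

fit <- glm(y ~ X1 + X2, data = dat, family = binomial)

# R2 = 1 - (model deviance) / (null deviance)
r2_deviance <- 1 - fit$deviance / fit$null.deviance
r2_deviance
```

The null deviance is the deviance of the intercept-only model, so this quantity lies between 0 and 1 and grows as the covariates explain more of the outcome.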

If the parl argument requests more cores than the machine has available, the number of cores used defaults to the number of available cores minus one.
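The capping rule for parl can be pictured with the sketch below. The helper cap_cores is hypothetical and only mirrors the behaviour described above; how VIM detects the available cores is an assumption here (parallel::detectCores is used for illustration).

```r
# Illustrative only: cap_cores() is a hypothetical helper, not part of sensitivity.
cap_cores <- function(requested, available = parallel::detectCores()) {
  if (is.null(requested)) return(NULL)        # NULL means no parallelization
  if (is.na(available)) return(1L)            # detectCores() may return NA
  if (requested > available) {
    return(max(1L, as.integer(available) - 1L))  # too many: available minus one
  }
  as.integer(requested)
}

cap_cores(64, available = 8)   # request exceeds the machine
cap_cores(2,  available = 8)   # request is honoured
```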

References

L. Clouvel, B. Iooss, V. Chabridon, M. Il Idrissi and F. Robin, 2024, An overview of variance-based importance measures in the linear regression context: comparative analyses and numerical tests, Socio-Environmental Systems Modelling, vol. 7, 18681, 2025, doi:10.18174/sesmo.1868. https://hal.science/hal-04102053

See Also

src, pcc, lmg, pmvd

Examples

# \donttest{

library(sensitivity)  # provides VIM()
library(parallel)
library(boot)
library(car)
library(mvtnorm)      # provides rmvnorm()

set.seed(1234)
n <- 100
# Covariance matrix: X1 and X4 are correlated (0.9), X2 and X3 are correlated (-0.8)
sigma <- matrix(c(1,0,0,0.9, 0,1,-0.8,0, 0,-0.8,1,0, 0.9,0,0,1), nrow=4, ncol=4)

############################
# Gaussian correlated inputs

X <- as.data.frame(rmvnorm(n, rep(0,4), sigma))
colnames(X) <- c("X1","X2","X3","X4")

#############################
# Linear Model with small noise, two correlated inputs (X2 and X3) and 
# one dummy input (X4) correlated with another (X1)
epsilon <- rnorm(n,0,0.1)
y <- with(X, X1 - X2 + 0.5 * X3 + epsilon)

# Without Bootstrap confidence intervals
x <- VIM(X, y)
print(x)
plot(x)
library(ggplot2) ; ggplot(x)

# With bootstrap confidence intervals
x <- VIM(X, y, nboot=100, conf=0.9)
print(x)
plot(x)
library(ggplot2) ; ggplot(x)

############################
# Logistic Regression (same regression model)

epsilon <- rnorm(n,0,0.1)
y <- with(X, X1 - X2 + 0.5 * X3 + epsilon > 0)

x <- VIM(X, y, logistic = TRUE)
print(x)
plot(x)
library(ggplot2) ; ggplot(x)
# }