
vimp (version 1.1.6)

vimp_regression: Nonparametric Variable Importance Estimates

Description

Compute estimates of and confidence intervals for nonparametric ANOVA-based variable importance.
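For reference, and assuming the standardized form of the parameter described in the Williamson, Gilbert, Simon, and Carone paper cited under Details, the importance of the covariates indexed by \(s\) (the indx argument) can be sketched as below. Here \(\mu(x) = E(Y \mid X = x)\), \(\mu_s\) denotes the regression of \(\mu(X)\) on the covariates other than those in \(s\), and \(f_1\), \(f_2\) are the fitted values described under Arguments.

```latex
\psi_s = \frac{E\left[\{\mu(X) - \mu_s(X)\}^2\right]}{\operatorname{var}(Y)},
\qquad
\hat{\psi}_{s,\mathrm{naive}} =
  \frac{n^{-1} \sum_{i=1}^{n} (f_{1,i} - f_{2,i})^2}
       {n^{-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2}
```

The reported estimate adds an influence-curve-based update to this naive plug-in, as listed under Details.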

Usage

vimp_regression(Y, X, f1 = NULL, f2 = NULL, indx = 1,
  run_regression = TRUE, SL.library = c("SL.glmnet", "SL.xgboost",
  "SL.mean"), alpha = 0.05, na.rm = FALSE, ...)

Arguments

Y

the outcome.

X

the covariates.

f1

the fitted values from a flexible estimation technique regressing Y on X.

f2

the fitted values from a flexible estimation technique regressing the fitted values in f1 on X with the columns in indx withheld.

indx

the indices of the covariate(s) to calculate variable importance for; defaults to 1.

run_regression

if TRUE and the outcome Y and covariates X are supplied, SuperLearner is used to estimate the full and reduced regressions; otherwise, variable importance is computed from the supplied fitted values f1 and f2.

SL.library

a character vector of learners to pass to SuperLearner when the regressions are estimated from Y and X (run_regression = TRUE). Defaults to SL.glmnet, SL.xgboost, and SL.mean.

alpha

the significance level for the confidence interval. Defaults to 0.05, corresponding to a 95% confidence interval.

na.rm

logical; should NA values in the outcome and fitted values be removed during computation? Defaults to FALSE.

...

other arguments passed to the estimation tool (for example, cvControl for SuperLearner); see "See Also".
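To make the roles of f1, f2, and indx concrete, the following base-R sketch builds both sets of fitted values by hand on data simulated as in the Examples. It uses lm() with cubic polynomial terms as a stand-in for a flexible learner such as SuperLearner; that substitution, and the object names below, are assumptions for illustration only.

```r
# Illustration only: lm() with polynomial terms stands in for a flexible
# learner; vimp itself uses SuperLearner when run_regression = TRUE.
set.seed(1)
n <- 100
x <- data.frame(X1 = stats::runif(n, -5, 5), X2 = stats::runif(n, -5, 5))
y <- (x$X1 / 5)^2 * (x$X1 + 7) / 5 + (x$X2 / 3)^2 + stats::rnorm(n)
indx <- 2  # covariate whose importance is of interest

# f1: regress the outcome Y on all covariates in X
full_mod <- stats::lm(y ~ poly(X1, 3) + poly(X2, 3), data = x)
f1 <- stats::fitted(full_mod)

# f2: regress the fitted values f1 (not Y) on X with column indx withheld;
# with indx = 2, only X1 remains
x_red <- x[, -indx, drop = FALSE]
red_mod <- stats::lm(f1 ~ poly(X1, 3), data = x_red)
f2 <- stats::fitted(red_mod)
```

Vectors built this way could then be passed to vimp_regression() with run_regression = FALSE, as in the second part of the Examples.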

Value

An object of classes vim and vim_regression. See Details for more information.

Details

See the paper by Williamson, Gilbert, Simon, and Carone for more details on the mathematics behind this function, and the validity of the confidence intervals. In the interest of transparency, we return most of the calculations within the vim object. This results in a list containing:

  • call - the call to vim

  • s - the column(s) to calculate variable importance for

  • SL.library - the library of learners passed to SuperLearner

  • full_fit - the fitted values of the chosen method fit to the full data

  • red_fit - the fitted values of the chosen method fit to the reduced data

  • est - the estimated variable importance

  • naive - the naive estimator of variable importance

  • update - the influence curve-based update

  • se - the standard error for the estimated variable importance

  • ci - the \((1-\alpha) \times 100\)% confidence interval for the variable importance estimate

  • full_mod - the object returned by the estimation procedure for the full data regression (if applicable)

  • red_mod - the object returned by the estimation procedure for the reduced data regression (if applicable)

  • alpha - the level used for the confidence interval calculation
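The naive, update, est, se, and ci components can be illustrated with a minimal base-R sketch that takes pre-computed fitted values. It assumes the standardized ANOVA parameter (mean squared difference of the fits divided by the empirical variance of Y), a one-step correction by the mean of an estimated influence curve, and a Wald-type interval; the function name anova_vim_sketch is hypothetical and this is not the package's internal code.

```r
# Hypothetical helper, not part of vimp: one-step estimate of standardized
# ANOVA-based variable importance from pre-computed fitted values.
anova_vim_sketch <- function(y, f1, f2, alpha = 0.05) {
  n <- length(y)
  # Naive plug-in: mean squared difference of the fits over the variance of Y
  theta_n <- mean((f1 - f2)^2)
  var_y <- mean((y - mean(y))^2)
  naive <- theta_n / var_y

  # Estimated influence curve of the ratio (delta method on numerator
  # and denominator)
  ic_theta <- 2 * (y - f1) * (f1 - f2) + (f1 - f2)^2 - theta_n
  ic_var <- (y - mean(y))^2 - var_y
  ic <- ic_theta / var_y - theta_n / var_y^2 * ic_var

  # One-step correction, standard error, and (1 - alpha) Wald interval
  update <- mean(ic)
  est <- naive + update
  se <- sqrt(mean(ic^2) / n)
  z <- stats::qnorm(1 - alpha / 2)
  list(est = est, naive = naive, update = update, se = se,
       ci = c(est - z * se, est + z * se), alpha = alpha)
}

# Usage with toy fitted values
y  <- c(0, 2, 4, 6)
f1 <- c(1, 1, 5, 5)
f2 <- rep(3, 4)
res <- anova_vim_sketch(y, f1, f2)
res$est  # 0.8
```

Because ic_var averages to zero, the update reduces to 2 * mean((y - f1) * (f1 - f2)) / var_y; in the toy inputs the residuals y - f1 are orthogonal to f1 - f2, so the update is 0 and est equals naive.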

See Also

SuperLearner for specific usage of the SuperLearner function and package.

Examples

# NOT RUN {
library(vimp)
library(SuperLearner)
library(gam)  # needed by the SL.gam learner
## generate the data
## generate X
p <- 2
n <- 100
x <- data.frame(replicate(p, stats::runif(n, -5, 5)))

## apply the function to the x's
smooth <- (x[,1]/5)^2*(x[,1]+7)/5 + (x[,2]/3)^2

## generate Y ~ Normal (smooth, 1)
y <- smooth + stats::rnorm(n, 0, 1)

## set up a library for SuperLearner
learners <- "SL.gam"

## using Y and X
est <- vimp_regression(y, x, indx = 2,
                       alpha = 0.05, run_regression = TRUE,
                       SL.library = learners, cvControl = list(V = 10))

## using pre-computed fitted values
full <- SuperLearner(Y = y, X = x,
                     SL.library = learners, cvControl = list(V = 10))
full.fit <- predict(full)$pred
## the reduced regression withholds the covariate in indx = 2
reduced <- SuperLearner(Y = full.fit, X = x[, -2, drop = FALSE],
                        SL.library = learners, cvControl = list(V = 10))
red.fit <- predict(reduced)$pred

est <- vimp_regression(Y = y, f1 = full.fit, f2 = red.fit,
                       indx = 2, run_regression = FALSE, alpha = 0.05)

# }
