average_vim: Average multiple independent importance estimates

Description

Average the output from multiple calls to vimp_regression, for different independent groups, into a single estimate with a corresponding standard error and confidence interval.

Usage

average_vim(..., weights = rep(1/length(list(...)), length(list(...))))
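
The default for weights in the usage line is evaluated from the number of objects passed through the dots. A minimal base R sketch of that default follows; the helper name default_weights is hypothetical and not part of the package.

```r
# Hypothetical helper mirroring the default `weights` expression in the usage line:
# one equal weight per object passed through `...`.
default_weights <- function(...) {
  k <- length(list(...))
  rep(1 / k, k)
}

# Four placeholder objects stand in for four vim objects here.
w <- default_weights("a", "b", "c", "d")
print(w)
```

With equal weights, the weighted average reduces to the arithmetic mean of the individual estimates.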

Value

An object of class vim containing the (weighted) average of the individual importance estimates, along with the corresponding standard error and confidence interval. The object is a list containing:

s: a list of the column(s) to calculate variable importance for
SL.library: a list of the libraries of learners passed to SuperLearner
full_fit: a list of the fitted values of the chosen method fit to the full data
red_fit: a list of the fitted values of the chosen method fit to the reduced data
est: a vector with the corrected estimates
naive: a vector with the naive estimates
update: a list with the influence curve-based updates
mat: a matrix with the estimated variable importance, the standard error, and the \((1-\alpha) \times 100\)% confidence interval
full_mod: a list of the objects returned by the estimation procedure for the full data regression (if applicable)
red_mod: a list of the objects returned by the estimation procedure for the reduced data regression (if applicable)
alpha: the level, for confidence interval calculation
y: a list of the outcomes

Arguments

...: an arbitrary number of vim objects.
weights: the weights used to average the vims together; they must sum to 1. Defaults to 1/(number of vims) for each vim, corresponding to the arithmetic mean.
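
To make the averaging concrete, here is a minimal base R sketch of one way to form a weighted average of independent estimates with a standard error and a Wald-type confidence interval. It assumes the estimates are independent, so the variance of the average is the weighted sum of the individual variances. The function name weighted_average_sketch and the input numbers are hypothetical, and this is not necessarily the package's exact internal computation.

```r
# Hypothetical sketch: combine independent estimates with weights summing to 1.
weighted_average_sketch <- function(ests, ses, weights, alpha = 0.05) {
  stopifnot(length(ests) == length(ses),
            length(ests) == length(weights),
            isTRUE(all.equal(sum(weights), 1)))
  # weighted average of the point estimates
  est_avg <- sum(weights * ests)
  # under independence, Var(sum(w_i * est_i)) = sum(w_i^2 * Var(est_i))
  se_avg <- sqrt(sum(weights^2 * ses^2))
  # Wald-type (1 - alpha) x 100% confidence interval
  ci <- est_avg + c(-1, 1) * stats::qnorm(1 - alpha / 2) * se_avg
  list(est = est_avg, se = se_avg, ci = ci)
}

# Made-up estimates and standard errors from two independent groups
res <- weighted_average_sketch(ests = c(0.30, 0.20),
                               ses = c(0.05, 0.04),
                               weights = c(1/2, 1/2))
print(res)
```

Unequal weights may be preferable when the groups differ in size, for example weights proportional to each group's sample size, provided they still sum to 1.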

Examples

# generate the data
p <- 2
n <- 100
x <- data.frame(replicate(p, stats::runif(n, -5, 5)))

# compute the true conditional mean as a smooth function of the x's
smooth <- (x[,1]/5)^2*(x[,1]+7)/5 + (x[,2]/3)^2

# generate Y ~ Normal (smooth, 1)
y <- smooth + stats::rnorm(n, 0, 1)

# set up a library for SuperLearner; note simple library for speed
library("SuperLearner")
learners <- c("SL.glm", "SL.mean")

# get estimates on independent splits of the data
samp <- sample(1:n, n/2, replace = FALSE)

# using Super Learner (with a small number of folds, for illustration only)
est_2 <- vimp_regression(Y = y[samp], X = x[samp, ], indx = 2, V = 2,
           run_regression = TRUE, alpha = 0.05,
           SL.library = learners, cvControl = list(V = 2))

est_1 <- vimp_regression(Y = y[-samp], X = x[-samp, ], indx = 2, V = 2,
           run_regression = TRUE, alpha = 0.05,
           SL.library = learners, cvControl = list(V = 2))

ests <- average_vim(est_1, est_2, weights = c(1/2, 1/2))