importance: Calculate importances of baselearners and input variables in a prediction rule ensemble (pre)

Description

importance calculates importances for rules, linear terms and input variables in the prediction rule ensemble (pre), and creates a bar plot of variable importances.

Usage

importance(object, standardize = FALSE, global = TRUE,
  quantprobs = c(0.75, 1), penalty.par.val = "lambda.1se",
  round = NA, plot = TRUE, ylab = "Importance",
  main = "Variable importances", abbreviate = 10L, diag.xlab = TRUE,
  diag.xlab.hor = 0, diag.xlab.vert = 2, cex.axis = 1,
  legend = "topright", ...)

Arguments

object

an object of class pre

standardize

logical. Should baselearner importances be standardized with respect to the outcome variable? If TRUE, baselearner importances have a minimum of 0 and a maximum of 1. Only used for ensembles with numeric (non-count) response variables.

global

logical. Should global importances be calculated? If FALSE, local importances are calculated over the observations whose predicted values F(x) lie between the sample quantiles specified by quantprobs.

quantprobs

optional numeric vector of length two. Only used when global = FALSE. Probabilities used to compute sample quantiles of the predicted values F(x); local importances are calculated over the range between these two quantiles. The default provides variable importances calculated over the 25% highest values of F(x).

penalty.par.val

character or numeric. Value of the penalty parameter $\lambda$ to be employed for selecting the final ensemble. The default "lambda.1se" employs the largest $\lambda$ value whose cross-validated error is within 1 standard error of the minimum. Alternatively, "lambda.min" may be specified to employ the $\lambda$ value with minimum cross-validated error, or a numeric value $>0$ may be specified, with higher values yielding a sparser ensemble. To evaluate the trade-off between accuracy and sparsity of the final ensemble, inspect pre_object$glmnet.fit and plot(pre_object$glmnet.fit).

round

integer. Number of decimal places to round numeric results to. If NA (default), no rounding is performed.

plot

logical. Should variable importances be plotted?

ylab

character string. Label for the y-axis of the plot. Only used when plot = TRUE.

main

character string. Main title of the plot. Only used when plot = TRUE.

abbreviate

integer or logical. Number of characters to which the variable names on the x-axis are abbreviated. If FALSE, no abbreviation is performed.

diag.xlab

logical. Should variable names be printed diagonally (that is, at a 45-degree angle)? Alternatively, variable names may be printed vertically by specifying diag.xlab = FALSE and las = 2 (passed on to barplot through ...).

diag.xlab.hor

numeric. Horizontal adjustment for lining up variable names with bars in the plot if variable names are printed diagonally.

diag.xlab.vert

positive integer. Vertical adjustment for position of variable names, if printed diagonally. Corresponds to the number of character spaces added after variable names.

cex.axis

numeric. The magnification to be used for axis annotation relative to the current setting of cex.

legend

logical or character. Should a legend be plotted for multinomial or multivariate responses and, if so, where? Defaults to "topright", which places the legend in the top-right corner of the plot. Other possible positions are "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "right" and "center"; specifying FALSE omits the legend.

...

further arguments to be passed to barplot (only used when plot = TRUE).
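The effect of quantprobs can be illustrated with a minimal base-R sketch. It assumes a hypothetical vector of predicted values fx (made-up numbers, not output from pre) and selects the observations whose predictions fall between the two sample quantiles, which is the subset over which local importances would then be calculated. The quantile type and the inclusive bounds are assumptions of this sketch, not a description of pre's internals.

```r
# Hypothetical predicted values F(x) for eight observations (made-up numbers).
fx <- c(12, 35, 27, 80, 64, 5, 41, 90)

# Default quantprobs of importance(): the 25% highest predicted values.
quantprobs <- c(0.75, 1)

# Sample quantiles of F(x); R's default quantile type (7) is assumed here.
bounds <- quantile(fx, probs = quantprobs)

# Observations whose predictions lie between the two quantiles
# (inclusive bounds are an assumption of this sketch).
in_range <- fx >= bounds[1] & fx <= bounds[2]
which(in_range)   # indices of the observations used for local importances
```

Setting quantprobs <- c(0, .25) instead would select the observations with the 25% lowest predicted values, matching the last call in the Examples section.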

Value

A list with two data frames: $baseimps, giving the importances for baselearners in the ensemble, and $varimps, giving the importances for all predictor variables.
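A minimal usage sketch, assuming the pre package is installed and reusing the airquality model from the Examples section, shows how the two data frames can be extracted from the returned list. The choice of penalty.par.val = "lambda.min" and round = 3 is purely illustrative.

```r
# Assumes the pre package is installed; the model mirrors the Examples section.
library(pre)
set.seed(42)
airq <- airquality[complete.cases(airquality), ]
airq.ens <- pre(Ozone ~ ., data = airq)

# plot = FALSE suppresses the bar plot; round = 3 rounds the numeric results;
# "lambda.min" selects the lambda value with minimum cross-validated error.
imps <- importance(airq.ens, penalty.par.val = "lambda.min",
                   plot = FALSE, round = 3)
imps$varimps          # importances of the input variables
head(imps$baseimps)   # importances of the rules and linear terms
```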

Details

See also sections 6 and 7 of Friedman & Popescu (2008).
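Friedman & Popescu (2008) define the importance of a rule k as |a_k| * sqrt(s_k * (1 - s_k)), where a_k is its coefficient and s_k its support, and the importance of a linear term j as |b_j| times the standard deviation of that term. The importance of an input variable is its linear-term importance plus an equal share of the importance of every rule it appears in, where each rule's importance is divided by the number of variables in its conditions. The base-R sketch below applies these definitions to hypothetical toy data and coefficients; it is not pre's implementation, which may differ in details such as the standard-deviation denominator or the winsorizing of linear terms.

```r
# Population standard deviation; for a 0/1 rule with support s it equals sqrt(s * (1 - s)).
pop_sd <- function(x) sqrt(mean((x - mean(x))^2))

# Hypothetical toy data (not from any real fit).
dat <- data.frame(x1 = c(1, 2, 3, 4, 5, 6),
                  x2 = c(3, 8, 1, 7, 2, 9))

# Baselearners: two rules (0/1 indicators) and one linear term.
learners <- data.frame(r1 = as.numeric(dat$x1 > 3 & dat$x2 < 8),
                       r2 = as.numeric(dat$x2 > 5),
                       x1 = dat$x1)
coefs <- c(r1 = 2, r2 = -1, x1 = 0.5)               # hypothetical coefficients
vars_in <- list(r1 = c("x1", "x2"), r2 = "x2", x1 = "x1")  # variables per baselearner

# Baselearner importance: |coefficient| * standard deviation of the baselearner.
base_imp <- abs(coefs) * sapply(learners, pop_sd)[names(coefs)]

# Variable importance: split each baselearner's importance equally
# over the variables it involves, then sum per variable.
var_imp <- c(x1 = 0, x2 = 0)
for (b in names(base_imp)) {
  v <- vars_in[[b]]
  var_imp[v] <- var_imp[v] + base_imp[[b]] / length(v)
}
round(base_imp, 4)
round(var_imp, 4)
```

Because every baselearner's importance is distributed in full over its variables, the variable importances sum to the total baselearner importance.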

References

Fokkema, M. (2018). Fitting prediction rule ensembles with R package pre. https://arxiv.org/abs/1707.07149.

Friedman, J. H., & Popescu, B. E. (2008). Predictive learning via rule ensembles. The Annals of Applied Statistics, 2(3), 916-954.

Examples


library(pre)
set.seed(42)
airq.ens <- pre(Ozone ~ ., data = airquality[complete.cases(airquality),])
# calculate global importances:
importance(airq.ens)
# calculate local importances (default: over 25% highest predicted values):
importance(airq.ens, global = FALSE)
# calculate local importances (custom: over 25% lowest predicted values):
importance(airq.ens, global = FALSE, quantprobs = c(0, .25))
