importance calculates importances for rules, linear terms and input variables in the prediction rule ensemble (pre), and creates a bar plot of variable importances.
importance(object, standardize = FALSE, global = TRUE,
quantprobs = c(0.75, 1), penalty.par.val = "lambda.1se",
round = NA, plot = TRUE, ylab = "Importance",
main = "Variable importances", abbreviate = 10L, diag.xlab = TRUE,
diag.xlab.hor = 0, diag.xlab.vert = 2, cex.axis = 1,
legend = "topright", ...)
object: an object of class pre.
standardize: logical. Should baselearner importances be standardized with respect to the outcome variable? If TRUE, baselearner importances have a minimum of 0 and a maximum of 1. Only used for ensembles with numeric (non-count) response variables.
global: logical. Should global importances be calculated? If FALSE, local importances will be calculated, given the quantiles of the predictions F(x) in quantprobs.
quantprobs: optional numeric vector of length two. Only used when global = FALSE. Probabilities for calculating sample quantiles of the range of F(X), over which local importances are calculated. The default provides variable importances calculated over the 25% highest values of F(X).
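To make the meaning of quantprobs concrete, the following base R sketch shows which observations fall within the quantile range of the predictions. The vector fx is simulated stand-in data (an assumption; in practice it would hold the ensemble's predictions F(x)), and the subsetting is only an illustration, not necessarily how pre selects observations internally.

```r
set.seed(1)
fx <- rnorm(200)                     # hypothetical predictions F(x)
quantprobs <- c(0.75, 1)             # default value of the argument

# Sample quantiles of the predictions at the two probabilities
bounds <- quantile(fx, probs = quantprobs)

# Observations whose prediction lies between the two quantiles
in_range <- fx >= bounds[1] & fx <= bounds[2]

# Proportion of observations over which local importances would be computed
mean(in_range)
```

With the default quantprobs, roughly the top quarter of predicted values is retained; quantprobs = c(0, 0.25) would retain the lowest quarter instead.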
penalty.par.val: character or numeric. Value of the penalty parameter \(\lambda\) to be employed for selecting the final ensemble. The default "lambda.1se" employs the \(\lambda\) value within 1 standard error of the minimum cross-validated error. Alternatively, "lambda.min" may be specified, to employ the \(\lambda\) value with minimum cross-validated error, or a numeric value \(>0\) may be specified, with higher values yielding a sparser ensemble. To evaluate the trade-off between accuracy and sparsity of the final ensemble, inspect pre_object$glmnet.fit and plot(pre_object$glmnet.fit).
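The effect of the penalty parameter can be explored by computing importances under both character settings and comparing how many baselearners are retained. This is a minimal sketch that assumes the pre package is installed; the object names airq, imps_1se and imps_min are made up for illustration.

```r
if (requireNamespace("pre", quietly = TRUE)) {
  library(pre)
  set.seed(42)
  airq <- airquality[complete.cases(airquality), ]
  airq.ens <- pre(Ozone ~ ., data = airq)

  # Importances under the default (sparser) and the minimum-error penalty
  imps_1se <- importance(airq.ens, penalty.par.val = "lambda.1se", plot = FALSE)
  imps_min <- importance(airq.ens, penalty.par.val = "lambda.min", plot = FALSE)

  # Number of baselearners reported under each setting
  print(c(lambda.1se = nrow(imps_1se$baseimps),
          lambda.min = nrow(imps_min$baseimps)))

  # Cross-validated error along the penalty path
  plot(airq.ens$glmnet.fit)
}
```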
round: integer. Number of decimal places to round numeric results to. If NA (default), no rounding is performed.
plot: logical. Should variable importances be plotted?
ylab: character string. Plotting label for y-axis. Only used when plot = TRUE.
main: character string. Main title of the plot. Only used when plot = TRUE.
abbreviate: integer or logical. Number of characters to abbreviate x axis names to. If FALSE, no abbreviation is performed.
diag.xlab: logical. Should variable names be printed diagonally (that is, at a 45 degree angle)? Alternatively, variable names may be printed vertically by specifying diag.xlab = FALSE and las = 2.
diag.xlab.hor: numeric. Horizontal adjustment for lining up variable names with bars in the plot if variable names are printed diagonally.
diag.xlab.vert: positive integer. Vertical adjustment for position of variable names, if printed diagonally. Corresponds to the number of character spaces added after variable names.
cex.axis: numeric. The magnification to be used for axis annotation relative to the current setting of cex.
legend: logical or character. Should legend be plotted for multinomial or multivariate responses and if so, where? Defaults to "topright", which puts the legend in the top-right corner of the plot. Alternatively, "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right", "center" and FALSE (which omits the legend) can be specified.
...: further arguments to be passed to barplot (only used when plot = TRUE).
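The plotting arguments can be combined as in the sketch below, which prints unabbreviated variable names vertically and changes the labels. It assumes the pre package is installed; the title and axis label strings are arbitrary choices for illustration.

```r
if (requireNamespace("pre", quietly = TRUE)) {
  library(pre)
  set.seed(42)
  airq <- airquality[complete.cases(airquality), ]
  airq.ens <- pre(Ozone ~ ., data = airq)

  # Vertical, unabbreviated variable names; las is passed on to barplot via ...
  imps_plot <- importance(airq.ens,
                          abbreviate = FALSE,
                          diag.xlab = FALSE, las = 2,
                          cex.axis = 0.8,
                          ylab = "Global importance",
                          main = "Ozone: variable importances")
}
```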
A list with two data frames: $baseimps, giving the importances for baselearners in the ensemble, and $varimps, giving the importances for all predictor variables.
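The returned list can be stored and inspected without drawing a plot. The sketch below assumes the pre package is installed and uses str() and head() rather than relying on particular column names.

```r
if (requireNamespace("pre", quietly = TRUE)) {
  library(pre)
  set.seed(42)
  airq <- airquality[complete.cases(airquality), ]
  airq.ens <- pre(Ozone ~ ., data = airq)

  # Compute importances only, rounded to four decimal places
  imps <- importance(airq.ens, plot = FALSE, round = 4L)

  str(imps$varimps)    # importances of the predictor variables
  head(imps$baseimps)  # importances of rules and linear terms
}
```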
See also sections 6 and 7 of Friedman & Popescu (2008).
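As a rough illustration of the importance measures in Friedman & Popescu (2008): the global importance of a rule is \(I_k = |a_k| \sqrt{s_k (1 - s_k)}\), with \(a_k\) the rule's coefficient and \(s_k\) its support (the proportion of observations to which the rule applies); the importance of a linear term is \(I_j = |b_j|\) times the standard deviation of that term; and the importance of an input variable is its linear-term importance plus the importance of each rule it appears in, divided by the number of variables \(m_k\) in that rule. The base R sketch below computes these for a single made-up rule. The rule, the coefficients a_k and b_j, and the omission of winsorizing for the linear term are assumptions for illustration; this is not the code pre runs internally.

```r
airq <- airquality[complete.cases(airquality), ]

# Hypothetical rule: 1 if Temp > 77 and Wind <= 8, else 0
r_k <- as.numeric(airq$Temp > 77 & airq$Wind <= 8)
a_k <- 10                    # hypothetical rule coefficient
s_k <- mean(r_k)             # support of the rule

# Global importance of the rule
I_k <- abs(a_k) * sqrt(s_k * (1 - s_k))

# Global importance of a hypothetical linear term for Wind
b_j <- -0.5                  # hypothetical linear coefficient
I_j <- abs(b_j) * sd(airq$Wind)

# Importance of Wind: its linear term plus its share of the rule,
# which involves m_k = 2 variables (Temp and Wind)
m_k <- 2
J_wind <- I_j + I_k / m_k

c(rule = I_k, linear = I_j, Wind = J_wind)
```

Local importances replace the standard deviations by absolute deviations at the observations of interest, for example \(|a_k| \cdot |r_k(x) - s_k|\) for a rule, which is what global = FALSE with quantprobs averages over.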
Fokkema, M. (2018). Fitting prediction rule ensembles with R package pre. https://arxiv.org/abs/1707.07149.
Friedman, J. H., & Popescu, B. E. (2008). Predictive learning via rule ensembles. The Annals of Applied Statistics, 2(3), 916-954.
# NOT RUN {
library(pre)
set.seed(42)
airq.ens <- pre(Ozone ~ ., data = airquality[complete.cases(airquality),])
# calculate global importances:
importance(airq.ens)
# calculate local importances (default: over 25% highest predicted values):
importance(airq.ens, global = FALSE)
# calculate local importances (custom: over 25% lowest predicted values):
importance(airq.ens, global = FALSE, quantprobs = c(0, .25))
# }