explain: Explain predictions from final prediction rule ensemble

Description

explain shows which rules apply to which observations and visualizes the contribution of rules and linear predictors to the predicted values.

Usage

explain(
  object,
  newdata,
  penalty.par.val = "lambda.1se",
  response = 1L,
  plot = TRUE,
  intercept = FALSE,
  center.linear = FALSE,
  plot.max.nobs = 4,
  plot.dim = c(2, 2),
  plot.obs.names = TRUE,
  pred.type = "response",
  digits = 3L,
  cex = 0.8,
  ylab = "Contribution to linear predictor",
  bar.col = c("#E495A5", "#39BEB1"),
  rule.col = "darkgrey",
  ...
)

Arguments

object: object of class pre.
newdata: optional dataframe of new (test) observations, including all predictor variables used for deriving the prediction rule ensemble.
penalty.par.val: character or numeric. Value of the penalty parameter $\lambda$ to be employed for selecting the final ensemble. The default "lambda.1se" employs the largest $\lambda$ value whose cross-validated error is within 1 standard error of the minimum. Alternatively, "lambda.min" may be specified, to employ the $\lambda$ value with minimum cross-validated error, or a numeric value $>0$ may be specified, with higher values yielding a sparser ensemble. To evaluate the trade-off between accuracy and sparsity of the final ensemble, inspect pre_object$glmnet.fit and plot(pre_object$glmnet.fit).
response: numeric or character vector of length one. Specifies the name or number of the response variable (for multivariate responses) or the name or number of the factor level (for multinomial responses) for which explanations and contributions should be computed and/or plotted. Only used for pres fitted to multivariate or multinomial responses.
plot: logical. Should explanations be plotted?
intercept: logical. Specifies whether the intercept should be included in explaining predictions.
center.linear: logical. Specifies whether linear terms should be centered with respect to the training sample mean before computing their contribution to the predicted value. If intercept = TRUE, this will also affect the intercept. That is, the value of the intercept returned will differ from the value returned by the print method.
plot.max.nobs: numeric. Specifies maximum number of observations for which explanations will be plotted. The default (4) plots the explanation for the first four observations supplied in newdata.
plot.dim: numeric vector of length 2. Specifies the number of rows and columns in the resulting plot.
plot.obs.names: logical vector of length 1, NULL, or character vector of length nrow(newdata) supplying the names that should be used for individual observations' plots. If TRUE (default), rownames(newdata) will be used as titles. If NULL, paste("Observation", 1:nrow(newdata)) will be used as titles. If FALSE, no titles will be plotted.
pred.type: character. Specifies the type of predicted values to be computed, returned and provided in the plot(s). Note that the computed contributions must be additive and are therefore always on the scale of the linear predictor.
digits: integer. Specifies the number of digits used in depicting the predicted values in the plot.
cex: numeric. Specifies the relative text size of title, tick and axis labels.
ylab: character. Specifies the label for the y-axis.
bar.col: character vector of length two. Specifies the colors to be used for plotting the positive and negative contributions to the predictions, respectively.
rule.col: character. Specifies the color to be used for plotting the rule descriptions. If NULL, rule descriptions are not plotted.
...: Further arguments to be passed to predict.pre and predict.cv.glmnet.
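
The note under pred.type, that contributions are additive only on the scale of the linear predictor, can be illustrated with a minimal base-R sketch. The coefficients and predictor values below are hypothetical and do not come from a fitted pre object; a binary response with a logit link is assumed purely for illustration.

```r
# Hypothetical coefficients of a small ensemble for a binary response
# (illustration only; not taken from a fitted pre object)
intercept <- -1.0
coefs <- c(rule1 = 0.8, rule2 = -0.5, x1 = 0.3)

# Predictor values for one observation: rules are coded 0/1, x1 is a linear term
obs <- c(rule1 = 1, rule2 = 0, x1 = 2.0)

# Contributions on the scale of the linear predictor
contribution <- coefs * obs

# The contributions plus the intercept add up to the linear predictor
linear_pred <- intercept + sum(contribution)

# The predicted probability is a nonlinear transformation of that sum,
# so it cannot be split into additive per-term parts
prob <- plogis(linear_pred)

print(contribution)
print(c(link = linear_pred, response = prob))
```

In this sketch the per-term contributions sum exactly to the linear predictor, whereas the value on the response scale is obtained only after transforming the total.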

Details

Provides a graphical depiction of the contribution of rules and linear terms to the individual predictions (if plot = TRUE). Invisibly returns a list with objects predictors and contribution. predictors contains the values of the rules and linear terms for each observation in newdata, for those rules and linear terms included in the final ensemble with the specified value of penalty.par.val. contribution contains the values of predictors, multiplied by the estimated values of the coefficients in the final ensemble selected with the specified value of penalty.par.val.

By default, all contributions are calculated with respect to the intercept. Thus, if a given rule applies to an observation in newdata, the contribution of that rule equals the estimated coefficient of that rule. If a given rule does not apply to an observation in newdata, the contribution of that rule equals 0.

For linear terms, contributions can be centered, or not (the default). Thus, by default, the contribution of a linear term for an observation in newdata equals the observation's value of the linear term, times the estimated coefficient of the linear term. If center.linear = TRUE, the contribution of a linear term for an observation in newdata equals (the value of the linear term, minus the mean value of the linear term in the training data) times the estimated coefficient for the linear term.
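
The arithmetic described above can be sketched in base R without the pre package. All numbers and object names below are hypothetical. The sketch also assumes that, under centering, the intercept absorbs the shift (coefficient times training mean), which is what keeps the total prediction unchanged and is consistent with the description of center.linear.

```r
# Hypothetical training values of a linear term x (mean = 7) and
# hypothetical coefficients (illustration only)
train_x <- c(4, 6, 8, 10)
coef_x <- 0.5
coef_rule <- 2.0
intercept <- 1.0

# Two new observations: value of x, and whether the rule applies (1) or not (0)
new_x <- c(5, 9)
new_rule <- c(1, 0)

# Rule contribution: the coefficient if the rule applies, 0 otherwise
contrib_rule <- coef_rule * new_rule

# Linear term, uncentered (the default): value times coefficient
contrib_x_raw <- coef_x * new_x

# Linear term, centered: (value minus training mean) times coefficient
contrib_x_centered <- coef_x * (new_x - mean(train_x))

# Assumption: the intercept absorbs coef_x * mean(train_x) under centering
intercept_centered <- intercept + coef_x * mean(train_x)

# Totals on the linear predictor scale agree under both parameterizations
total_raw <- intercept + contrib_rule + contrib_x_raw
total_centered <- intercept_centered + contrib_rule + contrib_x_centered

print(rbind(contrib_x_raw, contrib_x_centered))
print(rbind(total_raw, total_centered))
```

Note that centering can change the sign of a linear term's contribution (here, for the first observation) while leaving the overall prediction untouched.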

References

Fokkema, M. & Strobl, C. (2020). Fitting prediction rule ensembles to psychological research data: An introduction and tutorial. Psychological Methods 25(5), 636-652. doi:10.1037/met0000256, https://arxiv.org/abs/1907.05302

Examples

library("pre")
airq <- airquality[complete.cases(airquality), ]
set.seed(1)
train <- sample(1:nrow(airq), size = 100)
set.seed(42)
airq.ens <- pre(Ozone ~ ., data = airq[train,])
airq.ens.exp <- explain(airq.ens, newdata = airq[-train,])
airq.ens.exp$predictors
airq.ens.exp$contribution

## Can also include intercept in explanation:
airq.ens.exp <- explain(airq.ens, newdata = airq[-train,], intercept = TRUE)

## Fit PRE with linear terms only to illustrate effect of center.linear:
set.seed(42)
airq.ens2 <- pre(Ozone ~ ., data = airq[train,], type = "linear")
## When not centered around their means, Month has negative and 
##   Day has positive contribution:
explain(airq.ens2, newdata = airq[-train,][1:2,],
        penalty.par.val = "lambda.min")$contribution
## After mean centering, contributions of Month and Day have switched
##   sign (for these two observations): 
explain(airq.ens2, newdata = airq[-train,][1:2,], 
        penalty.par.val = "lambda.min", center.linear = TRUE)$contribution