explain: Explain predictions from final prediction rule ensemble

Description

explain shows which rules apply to which observations and visualizes the contribution of rules and linear predictors to the predicted values

Usage

explain(object, newdata, penalty.par.val = "lambda.1se", response = 1L,
  plot = TRUE, intercept = FALSE, center.linear = FALSE,
  plot.max.nobs = 4, plot.dim = c(2, 2), plot.obs.names = TRUE,
  pred.type = "response", digits = 3L, cex = 0.8,
  ylab = "Contribution to linear predictor", bar.col = c("#E495A5",
  "#39BEB1"), rule.col = "darkgrey")

Arguments

object

object of class pre.

newdata

optional dataframe of new (test) observations, including all predictor variables used for deriving the prediction rule ensemble.

penalty.par.val

character or numeric. Value of the penalty parameter $\lambda$ to be employed for selecting the final ensemble. The default "lambda.min" employs the $\lambda$ value within 1 standard error of the minimum cross-validated error. Alternatively, "lambda.min" may be specified, to employ the $\lambda$ value with minimum cross-validated error, or a numeric value $>0$ may be specified, with higher values yielding a sparser ensemble. To evaluate the trade-off between accuracy and sparsity of the final ensemble, inspect pre_object$glmnet.fit and plot(pre_object$glmnet.fit).

response

numeric or character vector of length one. Specifies the name or number of the response variable (for multivariate responses) or the name or number of the factor level (for multinomial responses) for which explanations and contributions should be computed and/or plotted. Only used forpres fitted to multivariate or multinomial responses.

plot

logical. Should explanations be plotted?

intercept

logical. Specifies whether intercept should be included in explaining predictions.

center.linear

logical. Specifies whether linear terms should be centered with respect to the training sample mean before computing their contribution to the predicted value. If intercept = TRUE, this will also affect the intercept. That is, the value of the intercept returned will differ from that of the value returned by the print method.

plot.max.nobs

numeric. Specifies maximum number of observations for which explanations will be plotted. The default (4) plots the explanation for the first four observations supplied in newdata.

plot.dim

numeric vector of length 2. Specifies the number of rows and columns in the resulting plot.

plot.obs.names

logical vector of length 1, NULL, or character vector of length nrow(data) supplying the names that should be used for individual observations' plots. If TRUE (default), rownames(newdata) will be used as titles. If NULL, paste("Observation", 1:nrow(newdata)) will be used as titles. If FALSE, no titles will be plotted.

pred.type

character. Specifies the type of predicted values to be computed, returned and provided in the plot(s). Note that the computed contributions must be additive and are therefore always on the scale of the linear predictor.

digits

integer. Specifies the number of digits used in depcting the predicted values in the plot.

cex

numeric. Specifies the relative text size of title, tick and axis labels.

ylab

character. Specifies the label for the horizonantal (y-) axis.

bar.col

character vector of length two. Specifies the colors to be used for plotting the positive and negative contributions to the predictions, respectively.

rule.col

character. Specifies the color to be used for plotting the rule descriptions. If NULL, rule descriptions are not plotted.

Details

Provides a graphical depiction of the contribution of rules and linear terms to the individual predictions (if plot = TRUE. Invisibly returns a list with objects predictors and contribution. predictors contains the values of the rules and linear terms for each observation in newdata, for those rules and linear terms included in the final ensemble with the specified value of penalty.par.val. contribution contains the values of predictors, multiplied by the estimated values of the coefficients in the final ensemble selected with the specified value of penalty.par.val. All contributions are calculated w.r.t. the intercept, by default. Thus, if a given rule applies to an observation in newdata, the contribution of that rule equals the estimated coefficient of that rule. If a given rule does not apply to an observation in newdata, the contribution of that rule equals 0. For linear terms, contributions can be centered, or not (the default). Thus, by default the contribution of a linear terms for an observation in newdata equals the obeservation's value of the linear term, times the estimated coefficient of the linear term. If center.linear = TRUE, the contribution of a linear term for an observation in newdata equals (the value of the linear temr, minus the mean value of the linear term in the training data) times the estimated coefficient for the linear term.

Examples

Run this code

# NOT RUN {
airq <- airquality[complete.cases(airquality), ]
set.seed(1)
train <- sample(1:nrow(airq), size = 100)
set.seed(42)
airq.ens <- pre(Ozone ~ ., data = airq[train,])
airq.ens.exp <- explain(airq.ens, newdata = airq[-train,])
airq.ens.exp$predictors
airq.ens.exp$contribution

## Can also include intercept in explanation:
airq.ens.exp <- explain(airq.ens, newdata = airq[-train,])

## Fit PRE with linear terms only to illustrate effect of center.linear:
set.seed(42)
airq.ens2 <- pre(Ozone ~ ., data = airq[train,], type = "linear")
## When not centered around their means, Month has negative and 
##   Day has positive contribution:
explain(airq.ens2, newdata = airq[-train,][1:2,],
        penalty.par.val = "lambda.min")$contribution
## After mean centering, contributions of Month and Day have switched
##   sign (for these two observations): 
explain(airq.ens2, newdata = airq[-train,][1:2,], 
        penalty.par.val = "lambda.min", center.linear = TRUE)$contribution
# }

Run the code above in your browser using DataLab

Description

Usage

Arguments

Details

See Also

Examples