singleplot
creates a partial dependence plot, which shows the effect of
a predictor variable on the ensemble's predictions. Note that plotting partial
dependence is computationally intensive: computation time increases rapidly
with the number of observations and variables. For large
datasets, package `plotmo` (Milborrow, 2019) provides more efficient functions
for plotting partial dependence and also supports `pre` models.
singleplot(object, varname, penalty.par.val = "lambda.1se",
nvals = NULL, type = "response", ylab = "predicted", ...)
`object`: an object of class `pre`.
`varname`: character vector of length one, specifying the variable for
which the partial dependence plot should be created. Note that `varname`
should correspond to the variable as described in the model formula used
to generate the ensemble (i.e., including functions applied to the variable).
`penalty.par.val`: character or numeric. Value of the penalty parameter
\(\lambda\) to be employed for selecting the final ensemble. The default
`"lambda.1se"` employs the \(\lambda\) value within 1 standard
error of the minimum cross-validated error. Alternatively,
`"lambda.min"` may be specified, to employ the \(\lambda\) value
with minimum cross-validated error, or a numeric value \(>0\) may be
specified, with higher values yielding a sparser ensemble. To evaluate the
trade-off between accuracy and sparsity of the final ensemble, inspect
`pre_object$glmnet.fit` and `plot(pre_object$glmnet.fit)`.
`nvals`: optional numeric vector of length one. The number of values of the predictor variable at which the partial dependence should be evaluated and plotted.
`type`: character string. Type of prediction to be plotted on the y-axis.
`type = "response"` gives fitted values for continuous outputs and
fitted probabilities for nominal outputs. `type = "link"` gives fitted
values for continuous outputs and linear predictor values for nominal outputs.
`ylab`: character. Label to be printed on the y-axis.
`...`: further arguments to be passed to `plot.default`.
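The difference between `type = "response"` and `type = "link"` for a nominal output can be illustrated with a small base R sketch. It uses a binomial `glm` as a stand-in, not a `pre` ensemble, and assumes a logit link; the binary variable `high` is made up for the illustration.

```r
# Stand-in model (NOT a pre ensemble): binomial GLM with a logit link.
airq <- airquality[complete.cases(airquality), ]
# Hypothetical binary outcome: is Ozone above its median?
airq$high <- as.integer(airq$Ozone > median(airq$Ozone))

fit <- glm(high ~ Temp + Wind, data = airq, family = binomial)

eta <- predict(fit, type = "link")      # linear predictor values (log-odds)
p   <- predict(fit, type = "response")  # fitted probabilities

# Under a logit link, the probabilities are the inverse-logit of the linear predictor.
same <- isTRUE(all.equal(unname(p), unname(plogis(eta))))
```

On the link scale the y-axis is unbounded, whereas on the response scale it is restricted to the interval from 0 to 1.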
By default, the partial dependence is evaluated at each unique
observed value of the specified predictor variable. When the number of unique
observed values is large, this may take a long time to compute. In that case,
specifying the `nvals` argument can substantially reduce computing time. When
`nvals` is supplied, predictions are plotted for the minimum, the maximum, and
(`nvals` - 2) intermediate values of the predictor variable. Note that `nvals`
can be specified only for numeric and ordered input variables. If the plot is
requested for a nominal input variable, the `nvals` argument will be
ignored and a warning printed.
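The computation described above can be sketched in base R. This is a minimal illustration, not the code used by `singleplot`: the helper `partial_dependence` is hypothetical, a linear model stands in for the ensemble, and the intermediate values are assumed to be evenly spaced quantiles of the observed values, which the text above does not specify.

```r
# Hypothetical helper: average prediction when `varname` is fixed at each grid value.
partial_dependence <- function(model, data, varname, nvals = NULL) {
  x <- data[[varname]]
  grid <- sort(unique(x))  # default: every unique observed value
  if (!is.null(nvals)) {
    # Assumption: minimum, maximum and (nvals - 2) evenly spaced quantiles in between.
    probs <- seq(0, 1, length.out = nvals)
    grid <- unique(quantile(x, probs = probs, names = FALSE, type = 1))
  }
  avg <- vapply(grid, function(v) {
    data[[varname]] <- v  # set the predictor to v for all observations (local copy)
    mean(predict(model, newdata = data))
  }, numeric(1))
  data.frame(x = grid, predicted = avg)
}

airq <- airquality[complete.cases(airquality), ]
fit <- lm(Ozone ~ ., data = airq)  # stand-in model, not a pre ensemble

pd_full <- partial_dependence(fit, airq, "Temp")             # all unique values
pd_grid <- partial_dependence(fit, airq, "Temp", nvals = 5)  # reduced grid

plot(pd_full$x, pd_full$predicted, type = "l",
     xlab = "Temp", ylab = "predicted")
points(pd_grid$x, pd_grid$predicted)
```

Each grid value requires one prediction pass over the full dataset, which is why reducing the grid with `nvals` shortens computing time.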
See also section 8.1 of Friedman & Popescu (2008).
Fokkema, M. (2018). Fitting prediction rule ensembles with R package pre. https://arxiv.org/abs/1707.07149.
Friedman, J. H., & Popescu, B. E. (2008). Predictive learning via rule ensembles. The Annals of Applied Statistics, 2(3), 916-954.
Milborrow, S. (2019). plotmo: Plot a model's residuals, response, and partial dependence plots. https://CRAN.R-project.org/package=plotmo
# NOT RUN {
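library("pre")
set.seed(42)
# Fit a rule ensemble on the complete cases of the airquality data
airq.ens <- pre(Ozone ~ ., data = airquality[complete.cases(airquality),])
# Partial dependence of the predicted Ozone on Temp
singleplot(airq.ens, "Temp")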
# }
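The example above can be varied using the arguments documented on this page. The following sketch assumes that packages `pre` and `plotmo` are installed and skips the calls otherwise. The `plotmo` call reflects one plausible way to request a partial dependence plot from that package (`pmethod = "partdep"`); whether it is faster for a given dataset is not verified here.

```r
# Runs only if package pre is available.
if (requireNamespace("pre", quietly = TRUE)) {
  library("pre")
  set.seed(42)
  airq <- airquality[complete.cases(airquality), ]
  airq.ens <- pre(Ozone ~ ., data = airq)

  # Inspect the accuracy versus sparsity trade-off over the penalty values
  plot(airq.ens$glmnet.fit)

  # Evaluate the plot at 10 values of Temp instead of all unique observed values
  singleplot(airq.ens, varname = "Temp", nvals = 10)

  # Use the ensemble with minimum cross-validated error and a custom y-axis label
  singleplot(airq.ens, varname = "Temp", penalty.par.val = "lambda.min",
             nvals = 10, ylab = "predicted Ozone")

  # Alternative via plotmo (assumed installed): partial dependence for Temp only
  if (requireNamespace("plotmo", quietly = TRUE)) {
    plotmo::plotmo(airq.ens, pmethod = "partdep",
                   degree1 = "Temp", degree2 = FALSE)
  }
}
```

Because the default `"lambda.1se"` ensemble is sparser than the `"lambda.min"` ensemble, the two `singleplot` calls may show different curves.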