pairplot
creates a partial dependence plot to assess the effects of a
pair of predictor variables on the predictions of the ensemble. Note that plotting
partial dependence is computationally intensive. Computation time increases
rapidly with the number of observations and variables. For large
datasets, package `plotmo` (Milborrow, 2019) provides more efficient functions
for plotting partial dependence and also supports `pre` models.
pairplot(object, varnames, type = "both",
penalty.par.val = "lambda.1se", nvals = c(20L, 20L),
pred.type = "response", ...)
`object`: an object of class `pre`.
`varnames`: character vector of length two. Currently, pairplots can only
be requested for non-nominal variables. If `varnames` specifies the name(s) of
variables of class `"factor"`, an error is raised.
`type`: character string. Type of plot to be generated.
`type = "heatmap"` yields a heatmap plot, `type = "contour"` yields
a contour plot, `type = "both"` yields a heatmap plot with added contours, and
`type = "perspective"` yields a three-dimensional plot.
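As a rough illustration of the four `type` settings, the sketch below requests each plot type in turn for the airquality ensemble used in the Examples. It assumes package `pre` (and whatever plotting dependencies it needs) is installed; the `requireNamespace()` guard, the coarser `nvals` grid, and the name `plot_types` are choices made for this sketch only.

```r
# Plot types documented for the `type` argument
plot_types <- c("heatmap", "contour", "both", "perspective")

# Guard: only run when package pre is available (assumption of this sketch)
if (requireNamespace("pre", quietly = TRUE)) {
  library("pre")
  set.seed(42)
  airq <- airquality[complete.cases(airquality), ]
  airq.ens <- pre(Ozone ~ ., data = airq)

  # One plot per type; a 10 x 10 grid keeps each call cheaper than the default
  for (tp in plot_types) {
    pairplot(airq.ens, varnames = c("Temp", "Wind"), type = tp,
             nvals = c(10L, 10L))
  }
}
```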
`penalty.par.val`: character or numeric. Value of the penalty parameter
\(\lambda\) to be employed for selecting the final ensemble. The default
`"lambda.1se"` employs the \(\lambda\) value within 1 standard
error of the minimum cross-validated error. Alternatively,
`"lambda.min"` may be specified, to employ the \(\lambda\) value
with minimum cross-validated error, or a numeric value \(>0\) may be
specified, with higher values yielding a sparser ensemble. To evaluate the
trade-off between accuracy and sparsity of the final ensemble, inspect
`pre_object$glmnet.fit` and `plot(pre_object$glmnet.fit)`.
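The two character settings of `penalty.par.val` could be compared along the following lines. This is a sketch that assumes package `pre` is installed and that `Temp` or `Wind` appears in the final ensemble under both settings; the guard and the name `penalty_choices` are illustrative only.

```r
# Character values documented for penalty.par.val
penalty_choices <- c("lambda.1se", "lambda.min")

# Guard: only run when package pre is available (assumption of this sketch)
if (requireNamespace("pre", quietly = TRUE)) {
  library("pre")
  set.seed(42)
  airq.ens <- pre(Ozone ~ ., data = airquality[complete.cases(airquality), ])

  # Cross-validated error along the lambda path, to judge the
  # accuracy / sparsity trade-off
  plot(airq.ens$glmnet.fit)

  # Same pair of predictors, once per penalty setting
  for (pv in penalty_choices) {
    pairplot(airq.ens, c("Temp", "Wind"), penalty.par.val = pv)
  }
}
```

Because `"lambda.min"` usually retains more terms than `"lambda.1se"`, the resulting surface may show more detail, at the cost of a less sparse ensemble.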
`nvals`: optional numeric vector of length 2. For how many values of
x1 and x2 should partial dependence be plotted? If `NULL`, all observed
values for the two predictor variables specified will be used (see details).
`pred.type`: character string. Type of prediction to be plotted on the z-axis.
`pred.type = "response"` gives fitted values for continuous outputs and
fitted probabilities for nominal outputs. `pred.type = "link"` gives fitted
values for continuous outputs and linear predictor values for nominal outputs.
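The difference between the two prediction scales for a nominal (here: binary) outcome can be seen with a stand-in model. The sketch below uses a base R logistic regression, not a `pre` ensemble, and the binary outcome `high` (Ozone above 40) is invented for illustration; it only shows how linear predictor values relate to fitted probabilities.

```r
# Complete cases of airquality, with a hypothetical binary outcome
airq <- airquality[complete.cases(airquality), ]
airq$high <- as.integer(airq$Ozone > 40)

# Stand-in model (NOT a pre ensemble): logistic regression
fit <- glm(high ~ Temp + Wind, data = airq, family = binomial)

# "link" scale: linear predictor values (log-odds), unbounded
link_pred <- predict(fit, type = "link")

# "response" scale: fitted probabilities in [0, 1]
resp_pred <- predict(fit, type = "response")

# For a logit link, the inverse logit maps one scale onto the other
same_scale <- isTRUE(all.equal(unname(resp_pred), unname(plogis(link_pred))))
```

For a continuous outcome the two scales coincide, which is why both settings are described as giving fitted values in that case.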
By default, partial dependence will be plotted for each combination
of 20 values of the specified predictor variables. When `nvals = NULL` is
specified, a partial dependence plot will be created for every combination of the
unique observed values of the two predictor variables specified. Therefore, using
`nvals = NULL` will often result in long computation times and / or
memory allocation errors. Also, `pre` ensembles derived
from training datasets that are very wide or long may result in long
computation times and / or memory allocation errors. In such cases, reducing
the values supplied to `nvals` will reduce computation time and the risk of
memory allocation errors. When the `nvals` argument is supplied, partial
dependence is plotted at the minimum, the maximum, and `nvals - 2` intermediate
values of each predictor variable. Furthermore, if none of the variables
specified appears in the final prediction rule ensemble, an error will occur.
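The mechanics described above can be sketched by hand in base R. The sketch below is not the code used by `pairplot`: it assumes an evenly spaced grid between the minimum and maximum, and it uses a linear model as a stand-in for a `pre` ensemble. It shows why the cost grows with both the grid size and the number of observations: every grid point requires a prediction for every training row.

```r
airq <- airquality[complete.cases(airquality), ]
nvals <- c(20L, 20L)

# Grid: minimum, maximum and nvals - 2 values in between
# (even spacing is an assumption of this sketch)
temp_grid <- seq(min(airq$Temp), max(airq$Temp), length.out = nvals[1])
wind_grid <- seq(min(airq$Wind), max(airq$Wind), length.out = nvals[2])
grid <- expand.grid(Temp = temp_grid, Wind = wind_grid)

# Stand-in model (NOT a pre ensemble), used only to show the mechanics
fit <- lm(Ozone ~ Temp * Wind + Solar.R, data = airq)

# For each grid point: set Temp and Wind to the grid values in every
# training row, predict, and average the predictions over the rows
grid$pd <- vapply(seq_len(nrow(grid)), function(i) {
  newdata <- airq
  newdata$Temp <- grid$Temp[i]
  newdata$Wind <- grid$Wind[i]
  mean(predict(fit, newdata = newdata))
}, numeric(1))

# Total number of predictions: grid points times training rows
n_predictions <- nrow(grid) * nrow(airq)

# Heatmap with added contours, similar in spirit to type = "both"
pd_matrix <- matrix(grid$pd, nrow = nvals[1])  # rows: Temp, columns: Wind
image(temp_grid, wind_grid, pd_matrix, xlab = "Temp", ylab = "Wind")
contour(temp_grid, wind_grid, pd_matrix, add = TRUE)
```

With the default 20 x 20 grid and the 111 complete rows of `airquality`, this already amounts to 44,400 predictions; using all unique observed values instead of a fixed grid can increase that number considerably.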
See also section 8.1 of Friedman & Popescu (2008).
Fokkema, M. (2018). Fitting prediction rule ensembles with R package pre. https://arxiv.org/abs/1707.07149.
Friedman, J. H., & Popescu, B. E. (2008). Predictive learning via rule ensembles. The Annals of Applied Statistics, 2(3), 916-954.
Milborrow, S. (2019). plotmo: Plot a model's residuals, response, and partial dependence plots. https://CRAN.R-project.org/package=plotmo
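library("pre")
set.seed(42)
# Fit a prediction rule ensemble on the complete cases of airquality
airq.ens <- pre(Ozone ~ ., data = airquality[complete.cases(airquality), ])
# Partial dependence of predicted Ozone on Temp and Wind
pairplot(airq.ens, c("Temp", "Wind"))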