pairplot
creates a partial dependence plot to assess the effects of a
pair of predictor variables on the predictions of the ensemble. Note that plotting
partial dependence is computationally intensive. Computation time increases
rapidly with the number of observations and variables. For large
datasets, package `plotmo` (Milborrow, 2019) provides more efficient functions
for plotting partial dependence and also supports `pre` models.
pairplot(
object,
varnames,
type = "both",
gamma = NULL,
penalty.par.val = "lambda.1se",
response = NULL,
nvals = c(20L, 20L),
pred.type = "response",
newdata = NULL,
xlab = NULL,
ylab = NULL,
main = NULL,
...
)
an object of class pre
character vector of length two. Currently, pairplots can only
be requested for non-nominal variables. If varnames specifies the name(s) of
variables of class "factor", an error will be printed.
character string. Type of plot to be generated.
type = "heatmap" yields a heatmap plot, type = "contour" yields
a contour plot, type = "both" yields a heatmap plot with added contours,
type = "perspective" yields a three dimensional plot.
Mixing parameter for relaxed fits. See coef.cv.glmnet.
character or numeric. Value of the penalty parameter
\(\lambda\) to be employed for selecting the final ensemble. The default
"lambda.1se" employs the largest \(\lambda\) value within 1 standard
error of the minimum cross-validated error. Alternatively,
"lambda.min" may be specified, to employ the \(\lambda\) value
with minimum cross-validated error, or a numeric value \(>0\) may be
specified, with higher values yielding a sparser ensemble. To evaluate the
trade-off between accuracy and sparsity of the final ensemble, inspect
pre_object$glmnet.fit and plot(pre_object$glmnet.fit).
numeric vector of length 1. Only relevant for multivariate gaussian
and multinomial responses. If NULL (default), PDPs for all response
variables or categories will be produced. A single integer can be specified,
indicating for which response variable or category PDPs should be produced.
optional numeric vector of length 2. For how many values of
x1 and x2 should partial dependence be plotted? If NULL, a grid of all possible
combinations of the observed values of the two predictor variables specified will be used
(see details).
character string. Type of prediction to be plotted on z-axis.
pred.type = "response" gives fitted values for continuous outputs and
fitted probabilities for nominal outputs. pred.type = "link" gives fitted
values for continuous outputs and linear predictor values for nominal outputs.
Optional data.frame in which to look for variables
with which to predict. If NULL, the data.frame used to fit the
original ensemble will be used.
character. Label to be printed on the x-axis. If NULL,
the first element of the supplied varnames will be printed on the x-axis.
character. Label to be printed on the y-axis. If NULL,
the second element of the supplied varnames will be printed on the y-axis.
Title for the plot. If NULL, the name of the response
will be printed.
Further arguments to be passed to image, contour or persp
(depending on whether type is specified to be "heatmap", "contour",
"both" or "perspective").
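The argument descriptions above can be illustrated with a short sketch that draws the same pair of predictors once per plot type. It assumes the `pre` package is installed and reuses the `airquality` data from the examples; the coarser grid `nvals = c(10L, 10L)` is an illustrative choice to keep the run time short, not a recommended setting.

```r
# The four values documented for argument type
plot_types <- c("heatmap", "contour", "both", "perspective")

# Only run when pre is available; the block is skipped otherwise
if (requireNamespace("pre", quietly = TRUE)) {
  airq <- airquality[complete.cases(airquality), ]
  set.seed(42)
  airq.ens <- pre::pre(Ozone ~ ., data = airq)

  # One plot per type, using a 10 x 10 grid instead of the default 20 x 20
  for (tp in plot_types) {
    pre::pairplot(airq.ens, varnames = c("Temp", "Wind"),
                  type = tp, nvals = c(10L, 10L), main = tp)
  }
}
```

Passing `penalty.par.val = "lambda.min"` in the same call would plot the less sparse ensemble selected at the minimum cross-validated error.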
Partial dependence functions are described in section 8.1 of Friedman & Popescu (2008).
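As a rough illustration of what a pairwise partial dependence function computes, the sketch below evaluates it by brute force in base R: for each grid point, the two predictors of interest are fixed at that value for every observation, all other predictors keep their observed values, and the predictions are averaged. The `lm` fit is only a stand-in model chosen so the example runs without extra packages; this is not the internal code of `pairplot`.

```r
airq <- airquality[complete.cases(airquality), ]

# Stand-in model (an assumption for illustration, not a pre ensemble)
fit <- lm(Ozone ~ Temp * Wind + Solar.R, data = airq)

# Small evaluation grid for the two predictors of interest
x1_vals <- seq(min(airq$Temp), max(airq$Temp), length.out = 5)
x2_vals <- seq(min(airq$Wind), max(airq$Wind), length.out = 5)

# Partial dependence at one grid point: overwrite Temp and Wind for all
# observations, then average the predictions over the dataset
pd_at <- function(t, w) {
  tmp <- airq
  tmp$Temp <- t
  tmp$Wind <- w
  mean(predict(fit, newdata = tmp))
}

# 5 x 5 matrix of averaged predictions (rows: Temp, columns: Wind)
pd <- outer(x1_vals, x2_vals, Vectorize(pd_at))

# Heatmap with added contours, similar in spirit to type = "both"
image(x1_vals, x2_vals, pd, xlab = "Temp", ylab = "Wind")
contour(x1_vals, x2_vals, pd, add = TRUE)
```

Every grid point requires one prediction per observation, which is why the cost grows with both the grid size and the number of rows.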
By default, partial dependence will be plotted for each combination
of 20 values of the specified predictor variables. When nvals = NULL is
specified, a partial dependence plot will be created for every combination of the unique
observed values of the two specified predictor variables. If NA instead of
a numeric value is specified for one of the predictor variables, all observed
values for that variable will be used. Specifying nvals = NULL and
nvals = c(NA, NA) will yield the exact same result.
High values, NA or NULL for nvals result in long
computation times and possibly memory problems. Also, pre
ensembles derived from training datasets that are very wide or long may
result in long computation times and/or memory allocation errors.
In such cases, reducing the values supplied to nvals will reduce
computation time and the risk of memory allocation errors.
When numeric value(s) are specified for nvals, values for the
minimum, maximum, and nvals - 2 intermediate values of the predictor variable
will be plotted.
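The effect of nvals on the size of the evaluation grid can be sketched in base R. The helper `grid_values()` is hypothetical and not part of `pre`; it assumes the intermediate values are evenly spaced between the minimum and maximum, which the text above does not state explicitly.

```r
airq <- airquality[complete.cases(airquality), ]

# Hypothetical helper: grid values for one predictor.
# NA means "use all unique observed values"; otherwise n values are taken
# from minimum to maximum (even spacing is an assumption).
grid_values <- function(x, n) {
  if (is.na(n)) sort(unique(x)) else seq(min(x), max(x), length.out = n)
}

nvals <- c(20L, NA)  # 20 values for Temp, all observed values for Wind
temp_vals <- grid_values(airq$Temp, nvals[1])
wind_vals <- grid_values(airq$Wind, nvals[2])

# All combinations of the two sets of values
grid <- expand.grid(Temp = temp_vals, Wind = wind_vals)

# Each grid point is evaluated once for every observation
n_predictions <- nrow(grid) * nrow(airq)
c(grid_points = nrow(grid), predictions = n_predictions)
```

Halving both entries of nvals cuts the number of grid points, and hence predictions, to roughly a quarter.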
Alternatively, newdata can be specified to provide a different (smaller)
set of observations to compute partial dependence over.
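A minimal sketch of the newdata approach, assuming the `pre` package is installed: a random subset of the training rows is drawn and passed as newdata, so partial dependence is averaged over fewer observations. The subset size of 50 and the object name `airq_sub` are arbitrary choices for illustration.

```r
airq <- airquality[complete.cases(airquality), ]

# Random subset of 50 training observations to average over
set.seed(42)
airq_sub <- airq[sample(nrow(airq), 50), ]

# Only run when pre is available; the block is skipped otherwise
if (requireNamespace("pre", quietly = TRUE)) {
  set.seed(42)
  airq.ens <- pre::pre(Ozone ~ ., data = airq)
  # Same plot as in the examples, but computed over the smaller dataset
  pre::pairplot(airq.ens, varnames = c("Temp", "Wind"), newdata = airq_sub)
}
```

A smaller newdata trades some precision of the averaged predictions for speed, so the resulting surface may differ slightly from the one computed on the full training data.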
If mi_pre was used to derive the original rule ensemble,
newdata = "mean.mi" can be specified. This
will result in an average dataset being computed over the imputed datasets,
which is then used to compute partial dependence functions. This greatly
reduces the number of observations and thereby computation time.
If none of the variables specified with argument varnames was
selected for the final prediction rule ensemble, an error will be returned.
Friedman, J. H., & Popescu, B. E. (2008). Predictive learning via rule ensembles. The Annals of Applied Statistics, 2(3), 916-954.
Milborrow, S. (2019). plotmo: Plot a model's residuals, response, and partial dependence plots. https://CRAN.R-project.org/package=plotmo
pre, singleplot
airq <- airquality[complete.cases(airquality),]
set.seed(42)
airq.ens <- pre(Ozone ~ ., data = airq)
pairplot(airq.ens, c("Temp", "Wind"))
## For multinomial and mgaussian families, one PDP is created per category or outcome
set.seed(42)
airq.ens3 <- pre(Ozone + Wind ~ ., data = airq, family = "mgaussian")
pairplot(airq.ens3, varnames = c("Day", "Month"))
set.seed(42)
iris.ens <- pre(Species ~ ., data = iris, family = "multinomial")
pairplot(iris.ens, varnames = c("Petal.Width", "Petal.Length"))