
ggRandomForests (version 3.1.2)

gg_partial: Split partial dependence data into continuous or categorical datasets

Description

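A partial dependence curve answers a what-if question about a forest. Set one predictor to a fixed value in every observation, leave the other predictors at their observed values, average the ensemble predictions, and repeat across that predictor's range. Because the other variables are averaged over their observed values, the resulting curve estimates the average effect of the swept predictor. gg_partial takes the partial dependence output of randomForestSRC::plot.variable and splits it into tidy continuous and categorical data frames for plotting.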
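The averaging described above can be sketched in a few lines of base R. This is a minimal illustration of the idea, not how randomForestSRC computes it internally. `partial_dependence()` is a hypothetical helper, and a linear model on `airquality` stands in for the forest so that the example runs without extra packages.

```r
## Hypothetical helper (not part of ggRandomForests): brute-force partial
## dependence for any model that has a predict() method. For each grid value,
## overwrite `var` in every row, predict, and average the predictions.
partial_dependence <- function(model, data, var, grid) {
  yhat <- vapply(grid, function(g) {
    data_mod <- data
    data_mod[[var]] <- g   # other predictors keep their observed values
    mean(predict(model, newdata = data_mod))
  }, numeric(1))
  data.frame(x = grid, yhat = yhat, name = var)
}

airq <- na.omit(airquality)
fit  <- lm(Ozone ~ ., data = airq)   # stand-in for a random forest
grid <- seq(min(airq$Temp), max(airq$Temp), length.out = 10)
pd   <- partial_dependence(fit, airq, "Temp", grid)
pd
```

For an additive linear model the curve is a straight line whose slope equals the `Temp` coefficient. A forest fitted to the same data would generally produce a nonlinear curve, which is what makes the plot informative.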

Usage

gg_partial(part_dta, nvars = NULL, cat_limit = 10, model = NULL)

Value

A named list with two elements:

continuous

data.frame with columns x, yhat, name (and optionally model) for continuous variables

categorical

data.frame with the same columns but with x as a factor, for low-cardinality / categorical variables

Arguments

part_dta

Partial dependence data returned by randomForestSRC::plot.variable(partial = TRUE).

nvars

The number of variables in part_dta to process.

cat_limit

Variables with fewer than cat_limit unique values are treated as categorical; all others are treated as continuous.

model

A label stored in the model column of the returned data frames and applied to every variable. Useful when combining partial dependence results from several forests in one figure.
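Because every row carries the model label, results from two forests can be stacked with rbind() and compared side by side. The sketch below uses small made-up data frames that only mimic the continuous-output columns listed under Value (x, yhat, name, model); the numbers are toy values, not real forest output.

```r
## Toy stand-ins for two gg_partial(...)$continuous results
## (values invented for illustration only)
cont_a <- data.frame(x = c(60, 70, 80), yhat = c(20, 35, 60),
                     name = "Temp", model = "forest_a")
cont_b <- data.frame(x = c(60, 70, 80), yhat = c(25, 33, 55),
                     name = "Temp", model = "forest_b")

## Stack them; the model column keeps the two curves distinguishable
both <- rbind(cont_a, cont_b)

## Mean partial dependence per model, as a quick numeric comparison
agg <- aggregate(yhat ~ model, data = both, FUN = mean)
agg
```

In a ggplot2 figure the same model column would typically be mapped to colour or linetype so both curves appear in one panel.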

Details

gg_partial handles the bookkeeping step after you have already called randomForestSRC::plot.variable(partial = TRUE): it takes the list that function returns and separates the variables into two tidy data frames, one for continuous predictors (plotted as lines) and one for categorical predictors (plotted as bar charts). The split is controlled by cat_limit: variables with fewer than cat_limit unique x-values are treated as categorical, and all others as continuous.
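The split described above amounts to counting the distinct x-values of each variable and comparing that count with cat_limit. The sketch below illustrates the idea on a simplified, hypothetical input: a named list of per-variable data frames with x and yhat columns. The real plot.variable object is structured differently, `split_partial()` is not a ggRandomForests function, and the boundary rule (fewer than cat_limit means categorical) is taken from the Arguments description rather than from the package source.

```r
## Hypothetical helper illustrating the continuous/categorical split.
## `pd_list`: named list of data frames, each with columns x and yhat.
split_partial <- function(pd_list, cat_limit = 10) {
  ## TRUE for variables with fewer than cat_limit distinct x-values
  is_cat <- vapply(pd_list, function(d) length(unique(d$x)) < cat_limit,
                   logical(1))
  ## Tag each data frame with its variable name before stacking
  tagged <- Map(function(d, nm) { d$name <- nm; d }, pd_list, names(pd_list))
  continuous  <- do.call(rbind, unname(tagged[!is_cat]))
  categorical <- do.call(rbind, unname(tagged[is_cat]))
  ## Categorical x-values become a factor so they plot as discrete bars
  if (!is.null(categorical)) categorical$x <- factor(categorical$x)
  list(continuous = continuous, categorical = categorical)
}

## Toy input (values invented): 25 grid points for Temp, 5 months
pd_list <- list(
  Temp  = data.frame(x = seq(57, 97, length.out = 25),
                     yhat = seq(10, 80, length.out = 25)),
  Month = data.frame(x = 5:9, yhat = c(30, 35, 50, 55, 40))
)
res <- split_partial(pd_list, cat_limit = 10)
str(res)
```

Here Temp, with 25 distinct values, lands in the continuous frame, while Month, with 5, lands in the categorical frame with x converted to a factor.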

If you'd rather skip the plot.variable step and pass the fitted forest directly, see gg_partial_rfsrc, which calls partial.rfsrc for you.

See Also

gg_partial_rfsrc, gg_partialpro

Examples

## Build a small regression forest on the airquality dataset
library(ggRandomForests)
set.seed(42)
airq <- na.omit(airquality)
rf <- randomForestSRC::rfsrc(Ozone ~ ., data = airq, ntree = 50)

## Compute partial dependence via plot.variable (show.plots = FALSE
## suppresses the base-graphics output; we only want the data)
pv <- randomForestSRC::plot.variable(rf, partial = TRUE,
                                      show.plots = FALSE)

## Split into continuous and categorical data frames
result <- gg_partial(pv)
head(result$continuous)
head(result$categorical)

## Label this model for later comparison with a second forest
result_labelled <- gg_partial(pv, model = "airq_model")
unique(result_labelled$continuous$model)
