Tidy wrapper around varPro::ivarpro() for the regression or
classification family. Returns one row per (observation, variable)
pair where the local-importance cell is non-NA; classification adds
a class column. which_obs narrows the result to a single
observation's profile; which_class narrows it to a single class. The
optional ivarpro_fit argument lets callers cache the expensive
ivarpro() call.
gg_ivarpro(
  object,
  ...,
  which_obs = NULL,
  which_class = NULL,
  cutoff = NULL,
  ivarpro_fit = NULL
)
A data.frame of class c("gg_ivarpro", "data.frame").
Regression: columns obs / variable / local_imp / selected.
Classification: long-format with an extra class column.
variable is a factor whose levels are set by
mean(|local_imp|) descending across all rows (the unified
ranking axis shared across facets / panels).
object: A varpro fit from varPro::varpro() (regression or
classification family).
...: Forwarded to varPro::ivarpro() when ivarpro_fit = NULL;
ignored otherwise (with a warning). Documented forwardables:
adaptive, cut, cut.max, ncut, nmin, nmax, noise.na,
max.rules.tree, max.tree, use.loo, use.abs, scale.
which_obs: Optional integer scalar, a 1-based row index into the
training data. NULL (default) returns the aggregate view.
which_class: Optional response level name. The NULL default on a
binary classification fit resolves to the last factor level
(positive-class convention). Ignored with a warning on regression
fits.
cutoff: Selection threshold on |local_imp|. NULL (default)
resolves to the per-class mean(|local_imp|) (or per-frame mean
for regression). A numeric scalar broadcasts to every class. A named
numeric vector (names a subset of class levels) overrides per class,
with fallback to the per-class mean for missing names.
ivarpro_fit: Optional pre-computed varPro::ivarpro() result
for the same object. Shape-validated.
The varPro framework builds importance from release rules: for a given
rule region, it compares a local estimator inside that region to what
the estimator becomes after the constraint on the tested variable is
removed ("released"). That contrast is summed over many rules and trees
to get a global z-score: the quantity gg_varpro() shows. What
ivarpro() adds is a per-observation view of the same mechanism.
Concretely: ivarpro() walks the forest's rules and, for each
(observation, variable) pair, computes a scaled per-rule contribution
to predicting that observation. Per-rule LOO removes the observation
from its own rule before scoring, so the contribution is not inflated
by the observation having helped define the region. Per-region scaling
(scale = "local", default) standardises the contribution by the
rule's local response standard deviation so values are comparable
across rules of different size. Aggregating those per-rule scores into
one number per (obs, variable) pair gives the local_imp cell.
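The leave-one-out and local-scaling steps described above can be illustrated with a toy example. This is a sketch on simulated data, not varPro's implementation: the region definitions, the helper loo_mean(), and the sign convention (region minus released) are all assumptions made for illustration.

```r
# Toy illustration only: NOT varPro's internal code.
set.seed(42)
n <- 200
x <- runif(n)                      # the tested variable
z <- runif(n)                      # another variable used by the rule
y <- 2 * x + rnorm(n, sd = 0.2)    # response depends on x only

# Hypothetical rule region: constraints on both z and the tested variable x.
in_region <- z > 0.5 & x > 0.6
# "Released" region: the constraint on x is dropped, the one on z is kept.
in_released <- z > 0.5

# Leave-one-out mean of y over a region, excluding observation i.
loo_mean <- function(i, members) {
  idx <- setdiff(which(members), i)
  mean(y[idx])
}

i <- which(in_region)[1]           # an observation covered by the rule

# Contrast between the region estimate and the released estimate
# (the direction of the subtraction is an assumption here).
contrast <- loo_mean(i, in_region) - loo_mean(i, in_released)

# "Local" scaling: divide by the response SD inside the rule region.
scaled <- contrast / sd(y[in_region])
```

Because y increases with x and the rule keeps only large x, the contrast is positive here; averaging such scaled values over every rule that covers observation i would give one number per (observation, variable) pair.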
No permutation, no synthetic data: the contrast is always between real
subsets of the observed data, defined by the forest's own rules. This
is the same no-synthetic-features property that distinguishes
gg_varpro() from gg_vimp()'s Breiman-Cutler permutation importance.
local_imp[i, v] is the scaled, aggregated rule contribution of
variable v to predicting observation i. It is NOT a permutation
importance and NOT a SHAP value. The sign carries the direction of the
local response shift inside the rule's region. Magnitude is on
the response scale when scale = "global", or unit-free when
scale = "local" (the default). The matrix is heavily sparse -
an observation contributes only to rules that retain it as OOB; on
real data, per-variable NA fractions of 50-95% are common.
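To gauge sparsity before plotting, the per-variable NA fraction can be computed directly from a local-importance matrix. The sketch below uses a simulated matrix with made-up variable names; it assumes only that rows are observations and columns are variables.

```r
# Toy local-importance matrix: rows are observations, columns are variables.
set.seed(1)
local_imp <- matrix(
  rnorm(40), nrow = 10,
  dimnames = list(NULL, c("lstat", "rm", "nox", "age"))
)
# Blank out 28 of the 40 cells to mimic a sparse result.
local_imp[sample(length(local_imp), 28)] <- NA

# Fraction of undefined cells per variable.
na_frac <- colMeans(is.na(local_imp))

# Number of observations with at least one defined cell.
n_covered <- sum(rowSums(!is.na(local_imp)) > 0)
```

A variable with a high NA fraction has few defined cells, so its per-observation summaries rest on little data.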
Comparison with gg_varpro() (aggregate split-strength) and
gg_beta_varpro() (per-rule lasso beta) is diagnostic: a variable
that's important globally but has low per-observation contribution
for a specific case is interesting; the inverse - high local but
low global - flags a regime-specific signal.
Long-format tidy frame. Regression has columns obs, variable,
local_imp, selected. Classification adds a class column
(factor in response-level order). variable is a factor whose
levels are set by mean(|local_imp|) descending across all rows;
for classification that aggregate is across all (obs, class) so
every facet / panel shows variables in the same row order. NA
cells are filtered out - the source matrix is sparse, and the
tidy frame only carries the cells where local importance is
defined.
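The shape described above can be reproduced by hand on a small matrix. This base-R sketch is not the wrapper's code: the toy values, the strict ">" comparison for selected, and the use of the per-frame mean as the regression cutoff are illustrative assumptions.

```r
# Toy regression-style matrix with undefined (NA) cells.
local_imp <- matrix(
  c( 0.9,   NA, -0.1,
      NA,  0.4,   NA,
    -1.1,  0.2,   NA,
     0.7,   NA,  0.3),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, c("lstat", "rm", "nox"))
)

# Matrix to long format: one row per (observation, variable) cell.
long <- data.frame(
  obs       = as.vector(row(local_imp)),
  variable  = colnames(local_imp)[as.vector(col(local_imp))],
  local_imp = as.vector(local_imp)
)

# Keep only the cells where local importance is defined.
long <- long[!is.na(long$local_imp), ]

# Order variable levels by mean(|local_imp|), descending.
rank_score <- tapply(abs(long$local_imp), long$variable, mean)
long$variable <- factor(
  long$variable,
  levels = names(sort(rank_score, decreasing = TRUE))
)

# Default regression cutoff: the per-frame mean of |local_imp|.
cutoff <- mean(abs(long$local_imp))
long$selected <- abs(long$local_imp) > cutoff
```

Fixing the factor levels once, from the whole frame, is what keeps the variable order identical across facets when the frame is later split by class.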
The provenance attribute carries source, family, ntree, cutoff
(named numeric vector - length 1 named "regr" for regression,
length K named with class levels for classification),
cutoff_default, use.loo, scale, n_train, n_obs, n_var,
precomputed, xvar.names, class_levels (classification only),
which_obs, which_class.
Per-observation interpretation ("which variables drive this
prediction?"), variable-selection diagnostics via the aggregate
distribution view, and side-by-side comparison against
gg_varpro() / gg_beta_varpro() to spot variables that matter
locally but not globally (or vice versa).
ivarpro() is the most expensive call in varPro (per-rule
leave-one-out + per-region scaling, often minutes on real data).
Compute it once and reuse:
v  <- varPro::varpro(medv ~ ., data = MASS::Boston, ntree = 200)
iv <- varPro::ivarpro(v, scale = "local")        # expensive, once
gg_aggregate <- gg_ivarpro(v, ivarpro_fit = iv)  # cheap
gg_case1     <- gg_ivarpro(v, ivarpro_fit = iv, which_obs = 1L)
Provenance carries precomputed = TRUE when ivarpro_fit was supplied.
For a classification fit, ivarpro() returns a list of K matrices
(one per class) for multi-class, or a flat data.frame for binary
(positive-class importances only - the wrapper normalises this to
a single-element list under the last factor level). The wrapper
stacks per-class frames into a long-format frame with a class
column. which_class = NULL returns all classes (binary defaults
to the last factor level, the positive-class convention used by
glm and gg_roc); which_class = "<name>" filters to a single
class. cutoff polymorphism mirrors gg_beta_varpro() - NULL
is per-class mean(|local_imp|), a scalar broadcasts, a named
numeric vector overrides per class with fallback to that class's
mean.
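The three cutoff forms can be sketched as a small resolver. resolve_cutoff() is a hypothetical helper written for illustration, not the package's internal function; it assumes a vector of |local_imp| values and a matching class factor.

```r
# Hypothetical helper, for illustration only.
resolve_cutoff <- function(cutoff, abs_imp, class) {
  per_class_mean <- tapply(abs_imp, class, mean)
  lv <- names(per_class_mean)
  if (is.null(cutoff)) {
    # NULL: every class uses its own mean(|local_imp|).
    out <- per_class_mean
  } else if (is.null(names(cutoff))) {
    # Unnamed scalar: broadcast to every class.
    stopifnot(length(cutoff) == 1L)
    out <- rep(cutoff, length(lv))
  } else {
    # Named vector: override the named classes, fall back to the
    # per-class mean for the rest.
    stopifnot(all(names(cutoff) %in% lv))
    out <- per_class_mean
    out[names(cutoff)] <- cutoff
  }
  setNames(as.numeric(out), lv)
}

abs_imp <- c(0.2, 0.4, 1.0, 3.0)
class   <- factor(c("a", "a", "b", "b"))

by_mean   <- resolve_cutoff(NULL, abs_imp, class)        # per-class means
broadcast <- resolve_cutoff(0.5, abs_imp, class)         # same value for all
override  <- resolve_cutoff(c(b = 1.5), abs_imp, class)  # "a" falls back
```

In every case the result is a named numeric vector with one entry per class level, which matches the shape the provenance section gives for the stored cutoff.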
Byte-for-byte agreement between cached (ivarpro_fit = iv) and
uncached (ivarpro_fit = NULL) outputs requires reusing the same
ivarpro() result. set.seed() alone is not sufficient because
per-rule LOO subsampling can drift across separate calls. Reuse
ivarpro_fit when reproducibility matters.
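One way to reuse a single result across sessions is to store it on disk with base R's saveRDS() and readRDS(). The sketch below uses a hypothetical cache_rds() helper and a random stand-in for the expensive call, so it runs without varPro installed.

```r
# Hypothetical helper: compute once, then reload the stored result.
cache_rds <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- compute()
  saveRDS(result, path)
  result
}

# Stand-in for an expensive computation whose output varies between calls.
slow_random <- function() matrix(rnorm(6), nrow = 2)

path   <- tempfile(fileext = ".rds")
first  <- cache_rds(path, slow_random)  # computes and stores
second <- cache_rds(path, slow_random)  # reloads, no recomputation
same   <- identical(first, second)
```

With a real fit, the compute argument could be function() varPro::ivarpro(v), and the returned object would then be passed as ivarpro_fit on every gg_ivarpro() call.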
gg_varpro(), gg_vimp(), gg_beta_varpro(), varPro::ivarpro().
# \donttest{
if (requireNamespace("varPro", quietly = TRUE) &&
    requireNamespace("MASS", quietly = TRUE)) {
  set.seed(1)
  v <- varPro::varpro(medv ~ ., data = MASS::Boston, ntree = 50)
  iv <- varPro::ivarpro(v)
  gg <- gg_ivarpro(v, ivarpro_fit = iv)
  plot(gg)
}
# }