light_ice: Individual Conditional Expectation (ICE)

Description

Generates Individual Conditional Expectation (ICE) profiles. An ICE profile shows how the prediction for a single observation changes when one or more variables are systematically varied across their ranges while all other values are held fixed [1]. The curves can be centered to make interaction effects more visible. Centering is done within the subgroups specified by "by".
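
To make the definition concrete, here is a minimal base-R sketch of ICE profiles, using a plain lm fit on iris and a hypothetical helper ice_sketch(). It is an illustration of the idea, not how flashlight implements light_ice(). Each selected row is repeated once per evaluation point, only the profiled column is overwritten, and the model predicts on the result.

```r
# Illustrative model (any model with a predict() method would do)
fit <- lm(Sepal.Length ~ ., data = iris)

# Hypothetical helper: one ICE curve per row of `data`
ice_sketch <- function(model, data, v, evaluate_at) {
  profiles <- lapply(seq_len(nrow(data)), function(i) {
    # Repeat row i once per evaluation point, keep all other columns fixed
    rows <- data[rep(i, length(evaluate_at)), , drop = FALSE]
    rows[[v]] <- evaluate_at
    data.frame(id = i, eval_point = evaluate_at,
               value = unname(predict(model, newdata = rows)))
  })
  do.call(rbind, profiles)
}

ice <- ice_sketch(fit, iris[c(1, 51, 101), ], v = "Petal.Width",
                  evaluate_at = c(0.5, 1, 1.5, 2))
ice
```

Because this linear model has no interactions, the three curves are parallel: they differ only in level, not in shape.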

Usage

light_ice(x, ...)
# S3 method for default
light_ice(x, ...)
# S3 method for flashlight
light_ice(x, v = NULL, data = x$data, by = x$by,
  evaluate_at = NULL, breaks = NULL, grid = NULL, n_bins = 27,
  cut_type = c("equal", "quantile"), indices = NULL, n_max = 20,
  seed = NULL, use_linkinv = TRUE, center = FALSE,
  value_name = "value", label_name = "label", id_name = "id", ...)
# S3 method for multiflashlight
light_ice(x, ...)

Arguments

x

An object of class flashlight or multiflashlight.

...

Further arguments passed to or from other methods.

v

The variable to be profiled.

data

An optional data.frame. Defaults to the data stored in x.

by

An optional vector of column names used to additionally group the results.

evaluate_at

Vector with values of v used to evaluate the profile.

breaks

Instead of evaluate_at (and grid), cut points for v can be provided. From them, the evaluate_at values are calculated as averages of adjacent cut points.
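
The conversion from breaks to evaluation points amounts to taking the midpoint of each bin. The helper breaks_to_midpoints() below is hypothetical and only sketches that arithmetic in base R; flashlight may organize it differently internally.

```r
# Hypothetical helper: average each pair of adjacent cut points
breaks_to_midpoints <- function(breaks) {
  (head(breaks, -1) + tail(breaks, -1)) / 2
}

breaks_to_midpoints(c(0, 1, 2, 4))
# [1] 0.5 1.5 3.0
```

Note that k cut points yield k - 1 evaluation points, one per bin.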

grid

A data.frame of grid values, e.g., as generated by expand.grid.
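
A multi-column grid is combined with each profiled row by overwriting the matching columns, one profile point per grid row. The base-R sketch below illustrates this for the first row of iris with the same grid as in the examples; the lm fit is a stand-in model, and this is not flashlight's internal code.

```r
# All 3 x 3 combinations of Species and Petal.Length
grid <- expand.grid(Species = levels(iris$Species), Petal.Length = 2:4)

# Repeat one observation once per grid row, then overwrite the grid columns
obs <- iris[1, ]
profile_data <- obs[rep(1, nrow(grid)), , drop = FALSE]
profile_data[names(grid)] <- grid

# Predict on the modified rows (stand-in model)
fit <- lm(Sepal.Length ~ ., data = iris)
profile_data$value <- unname(predict(fit, newdata = profile_data))
profile_data
```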

n_bins

Maximum number of unique values to evaluate for numeric v. Only used if neither grid nor evaluate_at is specified.

cut_type

For the default "equal", bins of equal width are created for v by pretty. Choose "quantile" to create quantile bins. Only used if neither grid nor evaluate_at is specified.
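
The two cut types can be illustrated with base R alone. The sketch below assumes n_bins = 5 and shows the general strategies only; exactly how flashlight maps n_bins to cut points is not reproduced here.

```r
x <- iris$Petal.Width
n_bins <- 5

# "equal": pretty() proposes about n_bins intervals with round,
# equally spaced cut points covering the range of x
equal_breaks <- pretty(x, n = n_bins)

# "quantile": cut points at evenly spaced sample quantiles
# (duplicates dropped, since ties can produce repeated quantiles)
quantile_breaks <- unique(quantile(x, probs = seq(0, 1, length.out = n_bins + 1),
                                   names = FALSE))

equal_breaks
quantile_breaks
```

Equal-width bins are easy to read on an axis; quantile bins place more evaluation points where the data are dense.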

indices

A vector of row numbers to consider.

n_max

If indices is not given, the maximum number of rows to consider. Rows are picked randomly from data if necessary.

seed

An integer random seed.

use_linkinv

Should the retransformation function (linkinv) be applied? Default is TRUE.
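
As an illustration of what a retransformation does, assume a predict function that returns predictions on the link scale of a logistic glm. Applying the family's inverse link then maps them to probabilities. This base-R sketch does not use flashlight itself.

```r
fit <- glm(am ~ wt, data = mtcars, family = binomial())

eta <- predict(fit)               # predictions on the link (logit) scale
p <- binomial()$linkinv(eta)      # retransformed to probabilities

all.equal(p, predict(fit, type = "response"))
# [1] TRUE
```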

center

Should curves be centered? Default is FALSE. Centering is done at the first evaluation point and within each "by" group. It also works for a grid with multiple columns.
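
The sketch below shows plain c-ICE centering as described in [1]: each curve has its value at the first evaluation point subtracted, so all curves start at 0 and differ only in shape. The toy data and the column names group, id, v and centered are hypothetical; the sketch does not attempt to reproduce any further adjustment flashlight may make per "by" group.

```r
# Toy ICE output: two curves (id) in each of two "by" groups, at v = 1, 2, 3
ice <- data.frame(
  group = rep(c("a", "b"), each = 6),
  id    = rep(1:4, each = 3),
  v     = rep(1:3, times = 4),
  value = c(10, 12, 15, 11, 13, 14, 3, 4, 8, 5, 5, 6)
)

# Sort so that the first row of each curve is its first evaluation point
ice <- ice[order(ice$group, ice$id, ice$v), ]

# Subtract each curve's value at the first evaluation point
first_value <- ave(ice$value, ice$group, ice$id, FUN = function(z) z[1])
ice$centered <- ice$value - first_value
ice
```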

value_name

Column name in resulting data containing the profile value. Defaults to "value".

label_name

Column name in resulting data containing the label of the flashlight. Defaults to "label".

id_name

Column name in resulting data containing the row id of the profile. Defaults to "id".

Value

An object of classes light_ice and light (and a list) with the following elements:

data A tibble containing the results. Can be used to build fully customized visualizations. Its column names are specified by all other items in this list.
by Same as input by.
v The variable(s) evaluated.
center Flag indicating whether the ICE curves are centered.
value_name Same as input value_name.
label_name Same as input label_name.
id_name Same as input id_name.

Methods (by class)

default: Default method not implemented yet.
flashlight: ICE profiles for a flashlight object.
multiflashlight: ICE profiles for a multiflashlight object.

Details

There are two ways to specify the variable(s) to be profiled. The first option is to pass the variable name via v, optionally together with evaluation points in evaluate_at (or breaks). This works for dependence on a single variable. The second option is much more general: you can specify any grid as a data.frame with one or more columns, e.g., generated by a call to expand.grid. Currently, there is no way to pass more than one variable name without such a grid.

The minimum required elements in the (multi-)flashlight are "predict_function", "model", "linkinv" and "data", where the latter can be passed on the fly.

Which rows in data are profiled? This is specified by indices. If indices is not given and n_max is smaller than the number of rows in data, row indices are sampled randomly from data. If the same rows should be used for all flashlights in a multiflashlight, there are two options: either pass a seed (with potentially undesired consequences for subsequent code) or pass a vector of indices used to select the rows. In both cases, data should be the same for all flashlights considered.
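
The row-selection rules above can be sketched with a hypothetical helper select_rows() in base R. It assumes the seed is applied via set.seed(), which changes the global random number state; that side effect is the kind of consequence for subsequent code mentioned above.

```r
# Hypothetical helper following the documented rules
select_rows <- function(data, indices = NULL, n_max = 20, seed = NULL) {
  if (!is.null(indices)) {
    return(data[indices, , drop = FALSE])     # explicit rows win
  }
  if (n_max < nrow(data)) {
    if (!is.null(seed)) set.seed(seed)        # alters the global RNG state
    return(data[sample(nrow(data), n_max), , drop = FALSE])
  }
  data                                        # few enough rows: use all
}

# Same data and same seed give the same rows, e.g. for two flashlights
rows_a <- select_rows(iris, n_max = 5, seed = 1)
rows_b <- select_rows(iris, n_max = 5, seed = 1)
identical(rownames(rows_a), rownames(rows_b))
# [1] TRUE
```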

References

[1] Goldstein, A. et al. (2015). Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation. Journal of Computational and Graphical Statistics, 24(1). doi:10.1080/10618600.2014.907095.

Examples


library(flashlight)
fit_full <- lm(Sepal.Length ~ ., data = iris)
fit_part <- lm(Sepal.Length ~ Petal.Length, data = iris)
mod_full <- flashlight(model = fit_full, label = "full", data = iris, y = "Sepal.Length")
mod_part <- flashlight(model = fit_part, label = "part", data = iris, y = "Sepal.Length")
mods <- multiflashlight(list(mod_full, mod_part))
grid <- expand.grid(Species = levels(iris$Species), Petal.Length = 2:4)
light_ice(mod_full, v = "Species")
light_ice(mod_full, v = "Species", indices = (1:15) * 10)
light_ice(mod_full, v = "Species", evaluate_at = levels(iris$Species))
light_ice(mod_full, grid = grid, data = iris[1,])$data
light_ice(mods, v = "Species", indices = (1:15) * 10)
light_ice(mods, v = "Species", indices = (1:15) * 10, center = TRUE)
light_ice(mods, v = "Petal.Width", n_bins = 5)
light_ice(mods, v = "Petal.Width", by = "Species", n_bins = 5)
light_ice(mods, v = "Petal.Width", by = "Species",
  id_name = "profile", value_name = "val", label_name = "lab")
