ICEbox (version 1.1.5)

ice: Creates an object of class ice.

Description

Creates an ice object with individual conditional expectation curves for the passed model object, X matrix, predictor, and response. See Goldstein et al (2013) for further details.

Usage

ice(object, X, y, predictor, predictfcn, verbose = TRUE, frac_to_build = 1, 
             indices_to_build = NULL, num_grid_pts, logodds = FALSE, probit = FALSE, ...)

Value

A list of class ice with the following elements.

gridpts

Sorted values of predictor at which each curve is estimated. Duplicates are removed -- by definition, elements of gridpts are unique.

ice_curves

Matrix of dimension nrow(X) by length(gridpts). Each row corresponds to an observation's ICE curve, estimated at the values of predictor in gridpts.

xj

The actual values of predictor observed in the data in the order of Xice.

actual_predictions

Vector of length nrow(X) containing the model's predictions at the actual value of the predictors in the order of Xice.

xlab

String with the predictor name corresponding to predictor. If predictor is a column number, xlab is set to colnames(X)[, predictor].

nominal_axis

If TRUE, length(gridpts) is 5 or fewer; otherwise FALSE. When TRUE the plot function treats the x-axis as if x is nominal.

range_y

If y was passed, the range of the response. Otherwise it defaults to be
max(ice_curves) - min(ice_curves) and a message is printed to the console.

sd_y

If y was passed, the standard deviation of the response. Otherwise it is defaults to sd(actual_predictions) and a message is printed to the console.

Xice

A matrix containing the subset of X for which ICE curves are estimated. Observations are ordered to be increasing in predictor. This ordering is the same one as in ice_curves, xj and actual_predictions, meaning for all these objects the i-th element refers to the same observation in X.

pdp

A vector of size length(gridpts) which is a numerical approximation to the partial dependence function (PDP) corresponding to the estimated ICE curves. See Goldstein et al (2013) for a discussion of how the PDP is a form of post-processing. See Friedman (2001) for a description of PDPs.

predictor

Same as the argument, see argument description.

logodds

Same as the argument, see argument description.

indices_to_build

Same as the argument, see argument description.

frac_to_build

Same as the argument, see argument description.

predictfcn

Same as the argument, see argument description.

Arguments

object

The fitted model to estimate ICE curves for.

X

The design matrix we wish to estimate ICE curves for. Rows are observations, columns are predictors. Typically this is taken to be object's training data, but this is not strictly necessary.

y

Optional vector of the response values object was trained on. It is used to compute y-axis ranges that are useful for plotting. If not passed, the range of predicted values is used and a warning is printed.

predictor

The column number or variable name in X of the predictor of interest, (\(x_S = X[, j]\)).

predictfcn

Optional function that accepts two arguments, object and newdata, and returns an N vector of object's predicted response for data newdata. If this argument is not passed, the procedure attempts to find a generic predict function corresponding to class(object).

verbose

If TRUE, prints messages about the procedure's progress.

frac_to_build

Number between 0 and 1, with 1 as default. For large X matrices or fitted models that are slow to make predictions, specifying frac_to_build less than 1 will choose a subset of the observations to build curves for. The subset is chosen such that the remaining observations' values of predictor are evenly spaced throughout the quantiles of the full X[,predictor] vector.

indices_to_build

Vector of indices, \(\subset \{1, \ldots, nrow(X)\}\) specifying which observations to build ICE curves for. As this is an alternative to setting frac_to_build, both cannot be specified.

num_grid_pts

Optional number of values in the range of predictor at which to estimate each curve. If missing, the curves are estimated at each unique value of predictor in the X observations we estimate ICE curves for.

logodds

If TRUE, for classification creates PDPs by plotting the centered log-odds implied by the fitted probabilities. We assume that the generic or passed predict function returns probabilities, and so the flag tells us to transform these to centered logits after the predictions are generated. Note: probit cannot be TRUE.

probit

If TRUE, for classification creates PDPs by plotting the probit implied by the fitted probabilities. We assume that the generic or passed predict function returns probabilities, and so the flag tells us to transform these to probits after the predictions are generated. Note: logodds cannot be TRUE.

...

Other arguments to be passed to object's generic predict function.

References

Jerome Friedman. Greedy Function Approximation: A Gradient Boosting Machine. The Annals of Statistics, 29(5): 1189-1232, 2001.

Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E., Peeking Inside the Black Box: Visualizing Statistical Learning With Plots of Individual Conditional Expectation. (2014) Journal of Computational and Graphical Statistics, in press

See Also

plot.ice, print.ice, summary.ice

Examples

Run this code
if (FALSE) {
require(ICEbox)
require(randomForest)
require(MASS) #has Boston Housing data, Pima

########  regression example
data(Boston) #Boston Housing data
X = Boston
y = X$medv
X$medv = NULL

## build a RF:
bhd_rf_mod = randomForest(X, y)

## Create an 'ice' object for the predictor "age":
bhd.ice = ice(object = bhd_rf_mod, X = X, y = y, predictor = "age", frac_to_build = .1) 

#### classification example
data(Pima.te)  #Pima Indians diabetes classification
y = Pima.te$type
X = Pima.te
X$type = NULL

## build a RF:
pima_rf_mod = randomForest(x = X, y = y)

## Create an 'ice' object for the predictor "skin":
# For classification we plot the centered log-odds. If we pass a predict
# function that returns fitted probabilities, setting logodds = TRUE instructs
# the function to set each ice curve to the centered log-odds of the fitted 
# probability.
pima.ice = ice(object = pima_rf_mod, X = X, predictor = "skin", logodds = TRUE,
                    predictfcn = function(object, newdata){ 
                         predict(object, newdata, type = "prob")[, 2]
                    }
              )

}

Run the code above in your browser using DataLab