Black-box models may have very different structures. This function creates a unified representation of a model, which can be further processed by functions for explanations.

```
explain.default(
  model,
  data = NULL,
  y = NULL,
  predict_function = NULL,
  predict_function_target_column = NULL,
  residual_function = NULL,
  weights = NULL,
  ...,
  label = NULL,
  verbose = TRUE,
  precalculate = TRUE,
  colorize = TRUE,
  model_info = NULL,
  type = NULL
)

explain(
  model,
  data = NULL,
  y = NULL,
  predict_function = NULL,
  predict_function_target_column = NULL,
  residual_function = NULL,
  weights = NULL,
  ...,
  label = NULL,
  verbose = TRUE,
  precalculate = TRUE,
  colorize = TRUE,
  model_info = NULL,
  type = NULL
)
```

model

object - a model to be explained

data

data.frame or matrix - data which will be used to calculate the explanations. If not provided, it will be extracted from the model. Data should be passed without the target column (which shall be provided as the `y` argument). NOTE: If the target variable is present in `data`, some functionalities may not work properly.

y

numeric vector with outputs / scores. If provided, it shall have the same size as `data`.
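The separation of predictors and target described for `data` and `y` can be sketched in base R as follows. This is only an illustration: `mtcars` and the choice of `mpg` as the target are assumptions made for the example and are not part of the original documentation.

```r
# Base R only; mtcars stands in for any modelling data set (an assumption).
target <- "mpg"

# Predictors without the target column: this is what would be passed as `data`.
predictors <- mtcars[, setdiff(names(mtcars), target), drop = FALSE]

# Target as a plain numeric vector: this is what would be passed as `y`.
response <- mtcars[[target]]

# `y` needs one value per row of `data`.
stopifnot(is.numeric(response), length(response) == nrow(predictors))
```

Keeping the target out of `data` means that later explanations treat only genuine predictors as model inputs.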

predict_function

function that takes two arguments: model and new data, and returns a numeric vector with predictions. By default it is `yhat`.
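A minimal sketch of a function that satisfies this two-argument contract, using only base R. The name `custom_predict`, the `lm` model, and the `mtcars` data are illustrative assumptions; the only point is the shape of the function: `(model, newdata)` in, plain numeric vector out.

```r
# Illustrative model; any fitted object with a predict() method would do.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical wrapper: takes (model, newdata), returns an unnamed numeric vector.
custom_predict <- function(model, newdata) {
  as.numeric(predict(model, newdata = newdata))
}

preds <- custom_predict(fit, mtcars[, c("wt", "hp")])

# One prediction per observation.
stopifnot(is.numeric(preds), length(preds) == nrow(mtcars))
```

A function of this shape could then be supplied as the `predict_function` argument.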

predict_function_target_column

Character or numeric containing either the column name or the column number in the model prediction object of the class that should be considered positive (i.e. the class that is associated with probability 1). If NULL, the second column of the output will be taken for binary classification. In a multiclass classification setting, providing this parameter switches to binary classification mode with one-vs-others probabilities.
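The column selection described above can be pictured with a small base R sketch. The helper `select_target_column` is hypothetical and is not the DALEX implementation; in particular, returning the full matrix for a multiclass model when no column is given is an assumption made here for illustration.

```r
# Hypothetical helper mimicking the documented behaviour on a probability matrix.
select_target_column <- function(probs, target_column = NULL) {
  if (is.null(target_column)) {
    # Binary case: the second column is treated as the positive class.
    if (ncol(probs) == 2) return(probs[, 2])
    # Multiclass without a target column: leave the matrix as is (assumption).
    return(probs)
  }
  # Column given by name or by number: one-vs-others probability.
  probs[, target_column]
}

binary_probs <- matrix(c(0.9, 0.1,
                         0.4, 0.6),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(NULL, c("no", "yes")))

multi_probs <- matrix(c(0.7, 0.2, 0.1,
                        0.1, 0.6, 0.3),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(NULL, c("fired", "ok", "promoted")))

positive_default <- select_target_column(binary_probs)      # second column
ok_by_name       <- select_target_column(multi_probs, "ok") # column by name
promoted_by_idx  <- select_target_column(multi_probs, 3)    # column by number
```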

residual_function

function that takes four arguments: model, data, target vector y and predict function (optionally). It should return a numeric vector with model residuals for given data. If not provided, response residuals (\(y-\hat{y}\)) are calculated. By default it is `residual_function_default`.
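As a rough illustration of the four-argument contract, the sketch below defines response residuals \(y-\hat{y}\) and an absolute-value variant in base R. The function names, the `lm` model, and the `mtcars` data are assumptions chosen for the example; this is not the code of `residual_function_default`.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Response residuals: observed minus predicted.
response_residuals <- function(model, data, y, predict_function = predict) {
  y - as.numeric(predict_function(model, data))
}

# Alternative definition: absolute residuals.
absolute_residuals <- function(model, data, y, predict_function = predict) {
  abs(y - as.numeric(predict_function(model, data)))
}

predictors <- mtcars[, c("wt", "hp")]
res     <- response_residuals(fit, predictors, mtcars$mpg)
abs_res <- absolute_residuals(fit, predictors, mtcars$mpg)
```

Either function returns one numeric value per observation, which is what the `residual_function` argument is documented to expect.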

weights

numeric vector with sampling weights. By default it's `NULL`. If provided, it shall have the same length as `data`.

...

other parameters

label

character - the name of the model. By default it's extracted from the 'class' attribute of the model

verbose

logical. If TRUE (default) then diagnostic messages will be printed

precalculate

logical. If TRUE (default) then `predicted_values` and `residual` are calculated when the explainer is created. This will also happen if `verbose` is TRUE. Set both `verbose` and `precalculate` to FALSE to omit calculations.

colorize

logical. If TRUE (default) then `WARNINGS`, `ERRORS` and `NOTES` are colorized. Will work only in the R console.

model_info

a named list (`package`, `version`, `type`) containing information about the model. If `NULL`, `DALEX` will look for this information on its own.

type

type of a model, either `classification` or `regression`. If not specified then `type` will be extracted from `model_info`.

An object of the class `explainer`.

It's a list with the following fields:

- `model`: the explained model.
- `data`: the dataset used for training.
- `y`: response for observations from `data`.
- `weights`: sample weights for `data`; `NULL` if weights are not specified.
- `y_hat`: calculated predictions.
- `residuals`: calculated residuals.
- `predict_function`: function that may be used for model predictions; shall return a single numerical value for each observation.
- `residual_function`: function that returns residuals; shall return a single numerical value for each observation.
- `class`: class/classes of a model.
- `label`: label of explainer.
- `model_info`: named list containing basic information about the model, such as package, package version and type.

Please NOTE that `model` is the only required argument. However, some explanations may expect other arguments to be provided as well.

Explanatory Model Analysis. Explore, Explain and Examine Predictive Models. https://ema.drwhy.ai/

```
library("DALEX")  # provides explain() and the apartments, titanic_imputed and HR datasets

# simple explainer for regression problem
aps_lm_model4 <- lm(m2.price ~ ., data = apartments)
aps_lm_explainer4 <- explain(aps_lm_model4, data = apartments, label = "model_4v")
aps_lm_explainer4

# various parameters for the explain function
# all defaults
aps_lm <- explain(aps_lm_model4)

# silent execution
aps_lm <- explain(aps_lm_model4, verbose = FALSE)

# set target variable
aps_lm <- explain(aps_lm_model4, data = apartments, label = "model_4v",
                  y = apartments$m2.price)
aps_lm <- explain(aps_lm_model4, data = apartments, label = "model_4v",
                  y = apartments$m2.price, predict_function = predict)

# user provided predict_function
aps_ranger <- ranger::ranger(m2.price ~ ., data = apartments, num.trees = 50)
custom_predict <- function(X.model, newdata) {
  predict(X.model, newdata)$predictions
}
aps_ranger_exp <- explain(aps_ranger, data = apartments, y = apartments$m2.price,
                          predict_function = custom_predict)

# user provided residual_function
aps_ranger <- ranger::ranger(m2.price ~ ., data = apartments, num.trees = 50)
custom_residual <- function(X.model, newdata, y, predict_function) {
  abs(y - predict_function(X.model, newdata))
}
aps_ranger_exp <- explain(aps_ranger, data = apartments, y = apartments$m2.price,
                          residual_function = custom_residual)

# binary classification
titanic_ranger <- ranger::ranger(as.factor(survived) ~ ., data = titanic_imputed,
                                 num.trees = 50, probability = TRUE)
# keep in mind that for binary classification y parameter has to be numeric with 0 and 1 values
titanic_ranger_exp <- explain(titanic_ranger, data = titanic_imputed,
                              y = titanic_imputed$survived)

# multiclass task
hr_ranger <- ranger::ranger(status ~ ., data = HR, num.trees = 50, probability = TRUE)
# keep in mind that for multiclass y parameter has to be a factor,
# with same levels as in training data
hr_ranger_exp <- explain(hr_ranger, data = HR, y = HR$status)

# set model_info
model_info <- list(package = "stats", ver = "3.6.2", type = "regression")
aps_lm_model4 <- lm(m2.price ~ ., data = apartments)
aps_lm_explainer4 <- explain(aps_lm_model4, data = apartments, label = "model_4v",
                             model_info = model_info)

# simple function
aps_fun <- function(x) 58*x$surface
aps_fun_explainer <- explain(aps_fun, data = apartments, y = apartments$m2.price,
                             label = "sfun")
model_performance(aps_fun_explainer)

# set sampling weights
aps_lm_explainer4 <- explain(aps_lm_model4, data = apartments, label = "model_4v",
                             weights = as.numeric(apartments$construction.year > 2000))

# more complex model
library("ranger")
aps_ranger_model4 <- ranger(m2.price ~ ., data = apartments, num.trees = 50)
aps_ranger_explainer4 <- explain(aps_ranger_model4, data = apartments,
                                 label = "model_ranger")
aps_ranger_explainer4
```