ingredients (version 0.3.1)

cluster_profiles: Cluster Ceteris Paribus Profiles

Description

Function 'cluster_profiles' calculates aggregates of ceteris paribus profiles based on hierarchical clustering.

Usage

cluster_profiles(x, ..., aggregate_function = mean,
  only_numerical = TRUE, center = FALSE, k = 3, variables = NULL)

Arguments

x

a ceteris paribus explainer produced with function `ceteris_paribus()`

...

other explainers that shall be plotted together

aggregate_function

a function for profile aggregation. By default it's 'mean'

only_numerical

a logical. If TRUE then only numerical variables will be plotted. If FALSE then only categorical variables will be plotted.

center

a logical. If TRUE then profiles are centered before clustering. By default it's FALSE.

k

number of clusters to extract from the hierarchical clustering computed with the hclust function

variables

if not NULL then only `variables` will be presented
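
The effect of the `center` argument can be illustrated with base R on synthetic profiles. This is only a sketch, not the package's implementation: the helper `center_profiles()` is hypothetical, and it assumes that centering means subtracting each profile's own mean. It also uses the default linkage of `hclust`, which may differ from what `cluster_profiles` uses internally.

```r
# Two rising and two falling synthetic profiles at different baseline levels.
# Rows are observations, columns are grid values of one variable.
grid <- seq(0, 1, length.out = 5)
profiles <- rbind(
  low_rising   = 0.0 + 0.2 * grid,
  high_rising  = 0.7 + 0.2 * grid,
  low_falling  = 0.2 - 0.2 * grid,
  high_falling = 0.9 - 0.2 * grid
)

# Assumed centering (hypothetical helper): subtract each profile's own mean.
center_profiles <- function(m) m - rowMeans(m)

# Cluster the raw and the centered profiles into k = 2 groups.
raw_clusters      <- cutree(hclust(dist(profiles)), k = 2)
centered_clusters <- cutree(hclust(dist(center_profiles(profiles))), k = 2)

raw_clusters       # grouped by level: low vs high
centered_clusters  # grouped by shape: rising vs falling
```

Without centering, the baseline level dominates the distances between profiles. After centering, only the shape of each profile matters.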

Value

an 'aggregated_profiles_explainer' layer

Details

Find more details in the Clustering Profiles Chapter.
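
The general idea of aggregating profiles by cluster can be sketched in base R. This is a simplified illustration on synthetic data, not the code used by the package: the `profiles` matrix stands in for ceteris paribus profiles of a single variable, and the Euclidean distance with the default `hclust` linkage is an assumption.

```r
# Synthetic stand-in for ceteris paribus profiles:
# one row per observation, one column per grid value of a variable.
set.seed(1)
grid <- seq(0, 1, length.out = 11)
profiles <- rbind(
  t(replicate(5, 0.2 + 0.5 * grid + rnorm(11, sd = 0.01))),  # rising
  t(replicate(5, 0.8 - 0.5 * grid + rnorm(11, sd = 0.01))),  # falling
  t(replicate(5, rep(0.5, 11) + rnorm(11, sd = 0.01)))       # flat
)

# Hierarchical clustering on distances between profiles,
# then cut the tree into k groups.
k <- 3
tree <- hclust(dist(profiles))
cluster <- cutree(tree, k = k)

# Aggregate the profiles within each cluster, grid value by grid value.
aggregate_function <- mean
cluster_aggregates <- t(sapply(
  split(seq_len(nrow(profiles)), cluster),
  function(rows) apply(profiles[rows, , drop = FALSE], 2, aggregate_function)
))

dim(cluster_aggregates)  # one row per cluster, one column per grid value
```

Replacing `mean` with another function, for example `median`, changes how the profiles inside a cluster are summarized, which mirrors the role of the `aggregate_function` argument.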

References

Predictive Models: Visual Exploration, Explanation and Debugging https://pbiecek.github.io/PM_VEE

Examples

# NOT RUN {
library("DALEX")
library("ingredients")
titanic <- na.omit(titanic)
selected_passangers <- select_sample(titanic, n = 100)
model_titanic_glm <- glm(survived == "yes" ~ gender + age + fare,
                       data = titanic, family = "binomial")

explain_titanic_glm <- explain(model_titanic_glm,
                           data = titanic[,-9],
                           y = titanic$survived == "yes")
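# ceteris paribus profiles for the logistic regression model
cp_glm <- ceteris_paribus(explain_titanic_glm, selected_passangers)
# cluster the profiles for age into 3 groups and plot the aggregates
clust_glm <- cluster_profiles(cp_glm, k = 3, variables = "age")
plot(clust_glm)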

# }
# NOT RUN {
library("randomForest")
model_titanic_rf <- randomForest(survived == "yes" ~ gender + age + class + embarked +
                                   fare + sibsp + parch, data = titanic)
model_titanic_rf

explain_titanic_rf <- explain(model_titanic_rf,
                              data = titanic[,-9],
                              y = titanic$survived == "yes",
                              label = "Random Forest v7")

cp_rf <- ceteris_paribus(explain_titanic_rf, selected_passangers)
cp_rf

pdp_rf <- aggregate_profiles(cp_rf, variables = "age")
head(pdp_rf)
clust_rf <- cluster_profiles(cp_rf, k = 3, variables = "age")
head(clust_rf)

plot(clust_rf, color = "_label_") +
  show_aggreagated_profiles(pdp_rf, color = "black", size = 3)

plot(cp_rf, color = "grey", variables = "age") +
  show_aggreagated_profiles(clust_rf, color = "_label_", size = 2)

clust_rf <- cluster_profiles(cp_rf, k = 3, center = TRUE, variables = "age")
head(clust_rf)
# }
