Train models over subsets selected using infogram
h2o.infogram_train_subset_models(
ig,
model_fun,
training_frame,
test_frame,
y,
protected_columns,
reference,
favorable_class,
feature_selection_metrics = c("safety_index"),
metric = "euclidean",
air_metric = "selectedRatio",
alpha = 0.05,
...
)
Returns a frame containing aggregations of intersectional fairness across the models.
ig: Infogram object trained with the same protected columns.
model_fun: Function that creates models, for example h2o.automl or h2o.gbm.
training_frame: Training frame.
test_frame: Test frame.
y: Response column.
protected_columns: Protected columns.
reference: List of values corresponding to a reference for each protected column. If set to NULL, the biggest group is used as the reference.
favorable_class: Positive/favorable outcome class of the response.
feature_selection_metrics: One or more columns from the infogram's admissible_score (ig@admissible_score).
metric: Metric supported by stats::dist, used to sort the features.
air_metric: Metric used for the Adverse Impact Ratio calculation. Defaults to "selectedRatio".
alpha: Significance level, that is, the probability of rejecting the null hypothesis (that the protected group and the reference come from the same population) when that hypothesis is true.
...: Parameters passed to model_fun.
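The following is a minimal base R sketch of how feature_selection_metrics and metric could work together to order features and build growing subsets. It does not reproduce the h2o implementation: the helper rank_features, the toy score table, and the choice to rank by descending distance from the origin are all assumptions made for illustration.

```r
# Toy stand-in for ig@admissible_score; all values are invented.
score <- data.frame(
  column          = c("PAY_0", "LIMIT_BAL", "AGE", "BILL_AMT1"),
  safety_index    = c(0.90, 0.40, 0.10, 0.65),
  relevance_index = c(1.00, 0.30, 0.20, 0.50),
  stringsAsFactors = FALSE
)

# Hypothetical helper: order features by their distance from the origin in the
# space spanned by the chosen score columns, using any stats::dist method.
rank_features <- function(score,
                          feature_selection_metrics = c("safety_index"),
                          metric = "euclidean") {
  pts <- as.matrix(score[, feature_selection_metrics, drop = FALSE])
  origin <- matrix(0, nrow = 1, ncol = ncol(pts))
  # The first column of the distance matrix holds each row's distance to the
  # origin; dropping the first row removes the origin itself.
  d <- as.matrix(stats::dist(rbind(origin, pts), method = metric))[-1, 1]
  score$column[order(d, decreasing = TRUE)]
}

ranked <- rank_features(score, c("safety_index", "relevance_index"))

# Nested subsets (assumed scheme): top 1 feature, top 2 features, and so on.
subsets <- lapply(seq_along(ranked), function(n) ranked[seq_len(n)])
print(subsets)
```

Each element of subsets could then be passed as the predictor set to model_fun, which is one way to read "train models over subsets". Switching metric to another stats::dist method such as "manhattan" changes only how the distances are computed.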
if (FALSE) {
  library(h2o)
  h2o.connect()

  # Load the Taiwan credit card default data set
  data <- h2o.importFile(paste0("https://s3.amazonaws.com/h2o-public-test-data/smalldata/",
                                "admissibleml_test/taiwan_credit_card_uci.csv"))
  x <- c('LIMIT_BAL', 'AGE', 'PAY_0', 'PAY_2', 'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6', 'BILL_AMT1',
         'BILL_AMT2', 'BILL_AMT3', 'BILL_AMT4', 'BILL_AMT5', 'BILL_AMT6', 'PAY_AMT1', 'PAY_AMT2',
         'PAY_AMT3', 'PAY_AMT4', 'PAY_AMT5', 'PAY_AMT6')
  y <- "default payment next month"
  protected_columns <- c('SEX', 'EDUCATION')

  # The response and the protected columns must be categorical
  for (col in c(y, protected_columns)) {
    data[[col]] <- as.factor(data[[col]])
  }

  # 80/20 train/test split
  splits <- h2o.splitFrame(data, 0.8)
  train <- splits[[1]]
  test <- splits[[2]]

  reference <- c(SEX = "1", EDUCATION = "2")  # university educated man
  favorable_class <- "0"  # no default next month

  # Train the infogram and inspect the admissible scores
  ig <- h2o.infogram(x, y, train, protected_columns = protected_columns)
  print(ig@admissible_score)
  plot(ig)

  # Train one GBM per feature subset selected from the infogram
  infogram_models <- h2o.infogram_train_subset_models(ig, h2o.gbm, train, test, y,
                                                      protected_columns, reference,
                                                      favorable_class)

  # Pareto front trading off fairness (air_min) against performance (AUC)
  pf <- h2o.pareto_front(infogram_models, x_metric = "air_min",
                         y_metric = "AUC", optimum = "top right")
  plot(pf)
  pf@pareto_front
}
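To make the air_min metric used for the Pareto front more concrete, here is a small base R sketch of an adverse impact ratio computed from selection rates. It assumes that the selected ratio of a group is the share of its rows that receive the favorable prediction, and that air_min is the smallest ratio across the protected groups. The data and variable names are invented, and the h2o internals may differ.

```r
# Invented predictions for illustration: 1 = favorable outcome predicted.
preds <- data.frame(
  group     = c(rep("reference", 10), rep("A", 10), rep("B", 10)),
  favorable = c(rep(1, 8), rep(0, 2),   # reference: 8 of 10 selected
                rep(1, 6), rep(0, 4),   # group A:   6 of 10 selected
                rep(1, 4), rep(0, 6)),  # group B:   4 of 10 selected
  stringsAsFactors = FALSE
)

# Selection rate per group: share of rows with a favorable prediction.
selected_ratio <- sapply(split(preds$favorable, preds$group), mean)

# Adverse impact ratio: each group's selection rate divided by the
# selection rate of the reference group.
air <- selected_ratio / selected_ratio[["reference"]]

# Smallest ratio among the non-reference groups (assumed meaning of air_min).
air_min <- min(air[names(air) != "reference"])

print(air)
print(air_min)
```

A ratio close to 1 means a group is selected at nearly the same rate as the reference; the further air_min falls below 1, the larger the disparity for the worst-off group.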