woebin
woebin generates optimal binning for both numerical and categorical variables using tree-like segmentation.
It also accepts user-supplied breakpoints for both numerical and categorical variables.
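As background for what the binning evaluates, the following is a minimal base R sketch of weight of evidence (WoE) and information value (IV) for data that is already binned. The helper `woe_iv_sketch` is hypothetical and not part of the package, and the sign convention for WoE (log of the positive share over the negative share) is an assumption; IV is the same under either convention.

```r
# Hypothetical helper, not part of the package: WoE and IV for pre-binned data.
# `bin` is a vector of bin labels; `bad` is a 0/1 vector (1 = positive class).
woe_iv_sketch <- function(bin, bad) {
  n_bad  <- tapply(bad, bin, sum)        # positives per bin
  n_good <- tapply(1 - bad, bin, sum)    # negatives per bin
  dist_bad  <- n_bad / sum(n_bad)        # share of all positives in each bin
  dist_good <- n_good / sum(n_good)      # share of all negatives in each bin
  woe <- log(dist_bad / dist_good)       # sign convention is an assumption
  iv  <- sum((dist_bad - dist_good) * woe)
  list(woe = woe, iv = iv)
}

# Toy data: two bins with opposite mixes of positives and negatives.
bin <- c("low", "low", "low", "low", "high", "high", "high", "high")
bad <- c(1, 0, 0, 0, 1, 1, 1, 0)
res <- woe_iv_sketch(bin, bad)
res$woe
res$iv   # equals log(3), about 1.0986, for this toy data
```

The sketch assumes every bin contains at least one positive and one negative case; empty cells would give infinite WoE and need separate handling.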
woebin(dt, y, x = NULL, breaks_list = NULL, min_perc_total = 0.02,
stop_limit = 0.1, positive = "bad|1", print_step = FALSE)
dt: A data frame with both x (predictor/feature) and y (response/label) variables.
y: Name of the y variable.
x: Names of the x variables. Default NULL. If x is NULL, all variables except y are treated as x variables.
breaks_list: List of break points. Default NULL. If it is not NULL, variables are binned according to the provided breaks.
min_perc_total: Minimum share of the total observations in each initial bin. Accepted range: 0.01-0.2; default 0.02.
stop_limit: Stop binning segmentation when the information value gain ratio is less than stop_limit. Accepted range: 0-0.5; default 0.1.
positive: Value of the positive class. Default "bad|1".
print_step: Logical. If TRUE, print each variable name as its binning is generated.
Optimal or customized binning information
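The stop_limit rule can be pictured with a small sketch. It assumes the gain ratio is the relative increase in total information value after a candidate split, (IV after - IV before) / IV before; that definition and the helper `should_stop` are assumptions for illustration, not the package's internal code.

```r
# Hypothetical illustration of the stopping rule; not the package's internals.
# Assumption: gain ratio = relative increase in total IV from one more split.
should_stop <- function(iv_before, iv_after, stop_limit = 0.1) {
  gain_ratio <- (iv_after - iv_before) / iv_before
  gain_ratio < stop_limit   # TRUE means: stop splitting
}

stop_small <- should_stop(0.20, 0.21)          # gain ratio about 0.05, below 0.1
stop_large <- should_stop(0.20, 0.26)          # gain ratio about 0.30, above 0.1
stop_loose <- should_stop(0.20, 0.21, 0.01)    # same split, lower stop_limit
```

Under this reading, a lower stop_limit lets segmentation continue for smaller gains and so tends to produce more bins.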
# NOT RUN {
# load germancredit data
data(germancredit)
# Example I
# binning for two variables in germancredit dataset
bins_2var <- woebin(germancredit, y = "creditability", x = c("credit.amount", "purpose"))
# }
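The examples rely on the default positive = "bad|1". One plausible reading, assumed here rather than confirmed by this page, is that the value is treated as a regular expression matched against the y column. A base R sketch of that interpretation:

```r
# Assumption: `positive` is used as a regular expression on the y values.
positive <- "bad|1"

# Character labels, as in a "good"/"bad" response column.
y_chr <- c("good", "bad", "good", "bad", "good")
y01_chr <- as.integer(grepl(positive, y_chr))

# Numeric 0/1 labels are coerced to character before matching.
y_num <- c(0, 1, 1, 0)
y01_num <- as.integer(grepl(positive, y_num))
```

If this reading holds, note that a pattern such as "1" would also match values like 10 or "bad1", so the positive value should be chosen to match only the intended class.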
# NOT RUN {
# Example II
# binning for germancredit dataset
bins_germ <- woebin(germancredit, y = "creditability")
# Example III
# customizing stop_limit (info-value gain ratio) for each variable
bins_cus_sl <- woebin(germancredit, y = "creditability",
  x = c("age.in.years", "credit.amount", "housing", "purpose"),
  stop_limit = c(0.05, 0.1, 0.01, 0.1))
# Example IV
# customizing the breakpoints of binning
breaks_list <- list(
  age.in.years = c(25, 35, 40, 60),
  credit.amount = NULL,
  housing = c("own", "for free%,%rent"),
  purpose = NULL
)
bins_cus_brk <- woebin(germancredit, y = "creditability",
  x = c("age.in.years", "credit.amount", "housing", "purpose"),
  breaks_list = breaks_list)
# }
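To make the breaks_list format in Example IV more concrete, here is a base R sketch of how such breaks could be interpreted. It assumes numeric breaks define left-closed, right-open intervals padded with -Inf and Inf, and that "%,%" separates categorical levels merged into one bin. Both are assumptions about the convention, and the toy vectors `age` and `housing` are invented for illustration.

```r
# Numeric breaks. Assumption: left-closed, right-open bins, padded with +/-Inf.
age <- c(19, 25, 34, 40, 59, 60, 75)
age_breaks <- c(25, 35, 40, 60)
age_bin <- cut(age, breaks = c(-Inf, age_breaks, Inf), right = FALSE)

# Categorical breaks. Assumption: "%,%" joins levels that share one bin.
housing_breaks <- c("own", "for free%,%rent")
groups <- strsplit(housing_breaks, "%,%", fixed = TRUE)

# Lookup table from each original level to its bin label.
lookup <- setNames(
  rep(housing_breaks, lengths(groups)),
  unlist(groups, use.names = FALSE)
)

housing <- c("rent", "own", "for free", "own")
housing_bin <- unname(lookup[housing])

table(age_bin)
housing_bin
```

With these assumptions, a value equal to a break point (for example age 25) falls into the bin that starts at that break, and "rent" and "for free" end up in the same bin.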