scorecard (version 0.1.9)

woebin: WOE Binning

Description

woebin generates optimal binning for numerical, factor and categorical variables using tree-like segmentation or chi-square merge. It can also apply custom breakpoints if breaks_list is provided.
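For readers new to the term, the weight-of-evidence (WOE) and information value (IV) quantities behind this kind of binning can be sketched in base R. The counts below are made up for illustration, and the sign convention (share of bad over share of good) is an assumption, not something this page states about woebin.

```r
# Hypothetical per-bin counts (illustrative only, not taken from germancredit)
bins <- data.frame(
  bin  = c("[-Inf,26)", "[26,35)", "[35, Inf)"),
  good = c(100, 300, 300),
  bad  = c(80, 120, 100)
)

# Share of each class that falls into each bin
dist_good <- bins$good / sum(bins$good)
dist_bad  <- bins$bad / sum(bins$bad)

# Assumed convention: WOE = ln(share of bad / share of good)
bins$woe <- log(dist_bad / dist_good)

# Information value: sum over bins of (share of bad - share of good) * WOE
iv <- sum((dist_bad - dist_good) * bins$woe)

print(bins)
print(iv)
```

Each term of the IV sum is non-negative, so a larger IV indicates that the bins separate the two classes more strongly.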

Usage

woebin(dt, y, x = NULL, breaks_list = NULL, special_values = NULL,
  min_perc_fine_bin = 0.02, min_perc_coarse_bin = 0.05,
  stop_limit = 0.1, max_num_bin = 8, positive = "bad|1",
  no_cores = NULL, print_step = 0L, method = "tree")

Arguments

dt

A data frame with both x (predictor/feature) and y (response/label) variables.

y

Name of y variable.

x

Names of the x variables. Default is NULL. If x is NULL, all variables except y are treated as x variables.

breaks_list

List of break points. Default is NULL. If it is not NULL, variables are binned according to the provided breaks.

special_values

The values specified in special_values are placed in separate bins. Default is NULL.

min_perc_fine_bin

The minimum share of total observations in each initial (fine) bin. Accepted range: 0.01-0.2; default is 0.02, which corresponds to an initial split into 50 fine bins for continuous variables.
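The relation between the minimum share and the number of fine bins is simple arithmetic: if every fine bin holds exactly the minimum share, there are 1 / min_perc_fine_bin bins. A minimal sketch over a few values from the accepted range (the variable names are illustrative):

```r
# Values spanning the accepted range 0.01-0.2, including the default 0.02
min_perc_fine_bin <- c(0.01, 0.02, 0.05, 0.2)

# Number of fine bins when each bin holds exactly the minimum share
n_fine_bins <- round(1 / min_perc_fine_bin)

print(n_fine_bins)  # 100 50 20 5
```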

min_perc_coarse_bin

The minimum share of total observations in each final (coarse) bin. Accepted range: 0.01-0.2; default is 0.05.

stop_limit

For the tree method, segmentation stops when the information value gain ratio is less than stop_limit. For the chimerge method, merging stops once the smallest chi-square statistic between adjacent bins is no longer below qchisq(1 - stop_limit, 1). Accepted range: 0-0.5; default is 0.1.
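To make the chi-square threshold concrete, the sketch below evaluates qchisq(1 - stop_limit, 1) at the default stop_limit and compares it with the chi-square statistic of two adjacent bins. The bin counts are hypothetical, and using chisq.test here is an illustration rather than a description of how woebin computes the statistic internally.

```r
stop_limit <- 0.1

# Threshold at the default stop_limit, with 1 degree of freedom
chisq_threshold <- qchisq(1 - stop_limit, df = 1)
print(chisq_threshold)  # approximately 2.705543

# Hypothetical good/bad counts for two adjacent bins
adjacent <- matrix(
  c(100, 80,
    110, 85),
  nrow = 2, byrow = TRUE,
  dimnames = list(bin = c("A", "B"), class = c("good", "bad"))
)

# Pearson chi-square statistic without continuity correction
chisq_stat <- unname(chisq.test(adjacent, correct = FALSE)$statistic)

# A statistic below the threshold means the two bins look statistically alike
below_threshold <- chisq_stat < chisq_threshold
print(below_threshold)
```

A smaller stop_limit raises the threshold, so more adjacent pairs fall below it.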

max_num_bin

Integer. The maximum number of bins. Default is 8.

positive

Value of positive class, default "bad|1".
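The default "bad|1" reads like a regular expression with two alternatives. Assuming that is how the value is matched (this page does not say so explicitly), the following base R sketch shows which labels such a pattern would pick up; the label vector is made up.

```r
positive <- "bad|1"

# Hypothetical response labels
y_values <- c("good", "bad", "1", "0")

# Assumption: the positive class is identified by pattern matching
is_positive <- grepl(positive, y_values)

print(is_positive)  # FALSE TRUE TRUE FALSE
```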

no_cores

Number of CPU cores for parallel computation. Default is NULL. If no_cores is NULL, it is set to 1 when there are fewer than 10 x variables, and to the total number of CPU cores when there are 10 or more.
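The default rule described above can be written as a small helper. This is a sketch of the documented behaviour, not the package's own code: the function name default_no_cores is hypothetical, and the use of parallel::detectCores() to count cores is an assumption.

```r
# Hypothetical helper mirroring the documented default for no_cores
default_no_cores <- function(x, no_cores = NULL) {
  # An explicit value always wins
  if (!is.null(no_cores)) {
    return(no_cores)
  }
  # Fewer than 10 x variables: run on a single core
  if (length(x) < 10) {
    return(1L)
  }
  # 10 or more x variables: use all cores (may be NA on some platforms)
  parallel::detectCores()
}

default_no_cores(c("credit.amount", "housing"))  # 1
default_no_cores(letters[1:12], no_cores = 2)    # 2
```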

print_step

A non-negative integer. Default is 0. If print_step>0, variable names are printed at every print_step-th iteration. If print_step=0 or no_cores>1, no message is printed.

method

Optimal binning method. It should be "tree" or "chimerge". Default is "tree".

Value

Optimal or customized binning information.

See Also

woebin_ply, woebin_plot, woebin_adj

Examples

# NOT RUN {
# load germancredit data
data(germancredit)

# Example I
# binning of two variables in germancredit dataset
# using tree method
bins2_tree = woebin(germancredit, y="creditability",
   x=c("credit.amount","housing"), method="tree")
bins2_tree


# }
# NOT RUN {
# using chimerge method
bins2_chi = woebin(germancredit, y="creditability",
   x=c("credit.amount","housing"), method="chimerge")

# Example II
# binning of the germancredit dataset
bins_germ = woebin(germancredit, y = "creditability")
# converting bins_germ into a dataframe
# bins_germ_df = data.table::rbindlist(bins_germ)

# Example III
# customizing the breakpoints of binning
library(data.table)
dat = rbind(
  germancredit,
  data.table(creditability=sample(c("good","bad"),10,replace=TRUE)),
  fill=TRUE)

breaks_list = list(
  age.in.years = c(26, 35, 37, "Inf%,%missing"),
  housing = c("own", "for free%,%rent")
)

special_values = list(
  credit.amount = c(2600, 9960, "6850%,%missing"),
  purpose = c("education", "others%,%missing")
)

bins_cus_brk = woebin(dat, y="creditability",
  x=c("age.in.years","credit.amount","housing","purpose"),
  breaks_list=breaks_list, special_values=special_values)

# }
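In Example III, several entries contain the separator "%,%", as in "for free%,%rent". A plausible reading, which this page does not spell out, is that the separator joins several values that should share one bin. Under that assumption, the base R sketch below splits such entries to show the implied grouping; it does not call woebin.

```r
# Entry copied from the breaks_list in Example III
breaks_housing <- c("own", "for free%,%rent")

# Split each entry on the literal separator "%,%"
groups <- strsplit(breaks_housing, "%,%", fixed = TRUE)

print(groups)
# [[1]]
# [1] "own"
#
# [[2]]
# [1] "for free" "rent"
```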