scorecard (version 0.1.4)

woebin: WOE Binning

Description

woebin generates optimal binning for numerical and categorical variables using tree-like segmentation. For categorical variables, the bins are ordered by factor levels for factor columns and by bad probability for character columns. woebin can also bin numerical and categorical variables at user-supplied breakpoints.
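To make the quantities behind the binning concrete, here is a minimal base R sketch that computes weight of evidence (WOE) and information value (IV) for a fixed set of bins. It does not call woebin and does not reproduce its tree-like search. The toy data, the breakpoints, and the sign convention (log of the bad share over the good share) are assumptions made for illustration.

```r
# Toy data (made up): a numeric predictor and a binary label, 1 = bad, 0 = good.
x <- c(22, 25, 31, 38, 41, 47, 52, 58, 63, 70)
y <- c(1, 1, 0, 1, 0, 0, 1, 0, 0, 0)

# Fixed breakpoints, chosen by hand for illustration only.
bin <- cut(x, breaks = c(-Inf, 35, 50, Inf), right = FALSE)

# Count bad and good observations per bin.
bad  <- as.numeric(tapply(y, bin, sum))
good <- as.numeric(tapply(1 - y, bin, sum))

# Share of all bads / all goods that falls into each bin.
bad_dist  <- bad / sum(bad)
good_dist <- good / sum(good)

# WOE per bin (assumed convention: log of bad share over good share).
woe <- log(bad_dist / good_dist)

# IV sums the distribution gap weighted by WOE; it is non-negative
# whichever sign convention is used for WOE.
iv <- sum((bad_dist - good_dist) * woe)

print(data.frame(bin = levels(bin), good = good, bad = bad, woe = round(woe, 4)))
print(iv)
```

A bin with zero good or zero bad observations would give an infinite WOE in this sketch, so real data usually needs some handling for empty classes.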

Usage

woebin(dt, y, x = NULL, breaks_list = NULL, min_perc_total = 0.02,
  stop_limit = 0.1, max_bin_num = 5, positive = "bad|1",
  print_step = 1L)

Arguments

dt

A data frame with both x (predictor/feature) and y (response/label) variables.

y

Name of y variable.

x

Names of the x variables. Default is NULL. If x is NULL, all variables except y are treated as x variables.

breaks_list

List of break points. Default is NULL. If it is not NULL, variables are binned using the provided breaks.

min_perc_total

The share of the total represented by each initial binning class. Accepted range: 0.01-0.2; default is 0.02.

stop_limit

Stop binning segmentation when the information value gain ratio is less than stop_limit. Accepted range: 0-0.5; default is 0.1.
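The stopping rule can be sketched as below. This page does not give the formula woebin uses for the gain ratio, so the relative-increase definition here is an assumption, and the helper names `iv_gain_ratio` and `should_stop` are hypothetical.

```r
# Assumed definition: relative increase in information value (IV)
# from adding one more split. Not taken from the woebin source.
iv_gain_ratio <- function(iv_before, iv_after) {
  (iv_after - iv_before) / iv_before
}

# Stop splitting once the gain ratio falls below stop_limit.
should_stop <- function(iv_before, iv_after, stop_limit = 0.1) {
  iv_gain_ratio(iv_before, iv_after) < stop_limit
}

# A split that lifts IV from 0.20 to 0.21 is a gain of about 5%.
print(should_stop(0.20, 0.21))  # TRUE: below the default 0.1

# A split that lifts IV from 0.20 to 0.26 is a gain of about 30%.
print(should_stop(0.20, 0.26))  # FALSE: segmentation would continue
```

Under this reading, a smaller stop_limit allows more splits and a larger one stops earlier.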

max_bin_num

Integer. The maximum number of bins. Default is 5.

positive

Value of positive class, default "bad|1".
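The `|` in the default suggests a pattern with two alternatives, "bad" or "1". The sketch below shows how such a pattern would flag labels if it were applied as a regular expression; that woebin matches labels this way is an assumption, not something this page states.

```r
positive <- "bad|1"

# Character labels: "bad" matches the first alternative.
labels_chr <- c("good", "bad", "bad", "good")
is_pos_chr <- grepl(positive, labels_chr)
print(is_pos_chr)  # FALSE TRUE TRUE FALSE

# Numeric labels are coerced to character, so 1 matches the second alternative.
labels_num <- c(0, 1, 1, 0)
is_pos_num <- grepl(positive, labels_num)
print(is_pos_num)  # FALSE TRUE TRUE FALSE
```

A plain regular expression match is a substring match, so under this assumption a label such as "10" would also count as positive.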

print_step

A non-negative integer. Default is 1. Print variable names by print_step when print_step>0. If print_step=0, no message is printed.

Value

Optimal or customized binning information

See Also

woebin_ply, woebin_plot, woebin_adj

Examples

# load germancredit data
data(germancredit)

# Example I
# binning for two variables in germancredit dataset
bins_2var = woebin(germancredit, y = "creditability", x = c("credit.amount", "purpose"))

# Example II
# binning for germancredit dataset
bins_germ = woebin(germancredit, y = "creditability")
# converting bins_germ into a dataframe
# bins_germ_df = data.table::rbindlist(bins_germ)

# Example III
# customizing stop_limit (info-value gain ratio) for each variable
bins_cus_sl = woebin(germancredit, y="creditability",
  x=c("age.in.years", "credit.amount", "housing", "purpose"),
  stop_limit=c(0.05,0.1,0.01,0.1))

# Example IV
# customizing the breakpoints of binning
breaks_list = list(
  age.in.years = c(25, 35, 40, 60),
  credit.amount = NULL,
  housing = c("own", "for free%,%rent"),
  purpose = NULL
)

bins_cus_brk = woebin(germancredit, y="creditability",
  x=c("age.in.years", "credit.amount", "housing", "purpose"),
  breaks_list=breaks_list)
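In Example IV the housing breaks contain the string "for free%,%rent", which reads as two categories merged into one bin with "%,%" as the separator. The base R sketch below shows that reading: it splits each break on "%,%" and maps raw values to their bin label. It does not call woebin; the sample `housing` values and the `lookup` helper are made up for illustration.

```r
# Breaks as written in Example IV: one bin per element.
housing_breaks <- c("own", "for free%,%rent")

# Split each break into the categories it covers.
groups <- strsplit(housing_breaks, "%,%", fixed = TRUE)

# Lookup table: category -> label of the bin that contains it.
lookup <- unlist(lapply(seq_along(groups), function(i) {
  setNames(rep(housing_breaks[i], length(groups[[i]])), groups[[i]])
}))

# Made-up sample values.
housing <- c("rent", "own", "for free", "own")
binned <- unname(lookup[housing])
print(binned)
```

Here "rent" and "for free" land in the same bin, while "own" keeps a bin of its own.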
