scorecard (version 0.1.5)

woebin: WOE Binning

Description

woebin generates optimal binning for numerical, factor and categorical variables using tree-like segmentation. It can also apply customized breakpoints when breaks_list is provided.
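WOE binning scores each bin by its weight of evidence (WOE) and the whole variable by its information value (IV). The base-R sketch below computes both for a variable that is already binned. woe_iv_table() is a hypothetical helper, not part of scorecard, and the bad-over-good sign convention for WOE is an assumption. IV is the same under either convention.

```r
# Hypothetical helper (not part of scorecard): WOE and IV for an already-binned variable.
# bin: bin labels; y: 0/1 vector where 1 marks the positive ("bad") class.
woe_iv_table <- function(bin, y) {
  tab  <- table(bin, factor(y, levels = c(0, 1)))
  good <- as.numeric(tab[, "0"])
  bad  <- as.numeric(tab[, "1"])
  good_distr <- good / sum(good)
  bad_distr  <- bad / sum(bad)
  # assumed convention: WOE = ln(bad share / good share); a bin with no goods
  # or no bads yields an infinite WOE
  woe    <- log(bad_distr / good_distr)
  bin_iv <- (bad_distr - good_distr) * woe  # per-bin IV contribution, never negative
  data.frame(bin = rownames(tab), good = good, bad = bad,
             woe = woe, bin_iv = bin_iv, stringsAsFactors = FALSE)
}

# usage: two bins with opposite bad rates
res <- woe_iv_table(c("A", "A", "A", "B", "B", "B"), c(1, 0, 0, 1, 1, 0))
sum(res$bin_iv)  # total IV, (2/3) * log(2), about 0.462
```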

Usage

woebin(dt, y, x = NULL, breaks_list = NULL, min_perc_total = 0.02,
  stop_limit = 0.1, max_bin_num = 5, positive = "bad|1",
  print_step = 1L)

Arguments

dt

A data frame with both x (predictor/feature) and y (response/label) variables.

y

Name of y variable.

x

Name(s) of x variables. Defaults to NULL. If x is NULL, all variables except y are treated as x variables.

breaks_list

List of breakpoints. Defaults to NULL. If it is not NULL, variables are binned using the provided breaks.

min_perc_total

The share of total observations in each initial (fine) binning class. Accepted range: 0.01-0.2; default 0.02.

stop_limit

Stop binning segmentation when the information value (IV) gain ratio is less than stop_limit. Accepted range: 0-0.5; default 0.1.

max_bin_num

Integer. The maximum number of bins; default 5.

positive

Value of positive class, default "bad|1".

print_step

A non-negative integer; default 1. When print_step > 0, the name of every print_step-th variable is printed as binning progresses. If print_step = 0, no message is printed.
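The roles of min_perc_total, stop_limit, max_bin_num and positive can be illustrated with a small greedy, top-down (tree-like) binning sketch for one numeric variable. This is not scorecard's internal algorithm. The helpers encode_positive(), total_iv() and tree_bin() are hypothetical. The sketch also assumes three things: fine classes are quantiles spaced min_perc_total apart, positive is a regular expression, and the "IV gain ratio" is the relative gain (IV_new - IV_old) / IV_old.

```r
# Hypothetical sketch, NOT scorecard's implementation.
# assumption: positive is a regex; matching labels become 1, others 0
encode_positive <- function(label, positive = "bad|1") {
  as.integer(grepl(positive, label))
}

# IV of x cut at sorted interior breaks; bins are left-closed [a, b)
total_iv <- function(x, y, breaks) {
  bin  <- findInterval(x, breaks)
  good <- tapply(y == 0, bin, sum)
  bad  <- tapply(y == 1, bin, sum)
  gd <- good / sum(good)
  bd <- bad / sum(bad)
  woe <- log((bd + 1e-6) / (gd + 1e-6))  # small offset avoids log(0)
  sum((bd - gd) * woe)
}

tree_bin <- function(x, y, min_perc_total = 0.02, stop_limit = 0.1,
                     max_bin_num = 5) {
  # candidate cut points: interior quantiles, about 1/min_perc_total fine classes
  probs <- seq(0, 1, by = min_perc_total)
  cand  <- unique(quantile(x, probs = probs[-c(1, length(probs))], names = FALSE))
  breaks <- numeric(0)
  iv_now <- 0
  while (length(breaks) + 1 < max_bin_num) {
    remaining <- setdiff(cand, breaks)
    if (length(remaining) == 0) break
    # try every remaining cut point and keep the one with the highest IV
    ivs  <- vapply(remaining, function(b) total_iv(x, y, sort(c(breaks, b))),
                   numeric(1))
    best <- which.max(ivs)
    # relative IV gain; the first split is always accepted
    gain <- if (iv_now > 0) (ivs[best] - iv_now) / iv_now else Inf
    if (gain < stop_limit) break
    breaks <- sort(c(breaks, remaining[best]))
    iv_now <- ivs[best]
  }
  list(breaks = breaks, iv = iv_now)
}

# usage on simulated data whose bad rate falls as x grows
set.seed(1)
x <- runif(2000, 0, 100)
p <- ifelse(x < 30, 0.5, ifelse(x < 70, 0.2, 0.05))
label <- ifelse(runif(2000) < p, "bad", "good")
y <- encode_positive(label)
fit <- tree_bin(x, y, min_perc_total = 0.02, stop_limit = 0.1, max_bin_num = 5)
fit$breaks
```

Treating positive as a pattern is convenient for labels like "bad"/"good" or 1/0. However, "bad|1" would also match a label such as "10", so check the label values before relying on it.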

Value

Optimal or customized binning information for each x variable.

See Also

woebin_ply, woebin_plot, woebin_adj

Examples

# NOT RUN {
# load germancredit data
data(germancredit)

# Example I
# binning for two variables in germancredit dataset
bins_2var = woebin(germancredit, y = "creditability", x = c("credit.amount", "purpose"))

# Example II
# binning for germancredit dataset
bins_germ = woebin(germancredit, y = "creditability")
# convert bins_germ into a single data frame
# bins_germ_df = data.table::rbindlist(bins_germ)

# Example III
# customizing stop_limit (information value gain ratio) for each variable
bins_cus_sl = woebin(germancredit, y="creditability",
  x=c("age.in.years", "credit.amount", "housing", "purpose"),
  stop_limit=c(0.05,0.1,0.01,0.1))

# Example IV
# customizing the breakpoints of binning
breaks_list = list(
  age.in.years = c(25, 35, 40, 60),
  credit.amount = NULL,
  housing = c("own", "for free%,%rent"),
  purpose = NULL
)

bins_cus_brk = woebin(germancredit, y="creditability",
  x=c("age.in.years", "credit.amount", "housing", "purpose"),
  breaks_list=breaks_list)
# }
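Example IV suggests two conventions for breaks_list. Numeric vectors give cut points, and "%,%" joins categories that share a single bin (here "for free" and "rent"). The base-R sketch below mimics both with cut() and strsplit(). It is only an illustration with made-up values, not scorecard's internal code. The left-closed, right-open [a, b) interval convention is an assumption not stated on this page.

```r
# Illustration only: applying breaks_list-style entries by hand.
age <- c(22, 25, 30, 35, 38, 45, 61)
# assumption: numeric breaks c(25, 35, 40, 60) define left-closed bins [a, b)
age_bin <- cut(age, breaks = c(-Inf, 25, 35, 40, 60, Inf), right = FALSE)

# "%,%" separates categories that are merged into one bin
housing_bins   <- c("own", "for free%,%rent")
housing_groups <- strsplit(housing_bins, "%,%", fixed = TRUE)
# map each original category to the label of the bin that contains it
lookup <- setNames(rep(housing_bins, lengths(housing_groups)),
                   unlist(housing_groups))
housing     <- c("own", "rent", "for free", "own")
housing_bin <- unname(lookup[housing])
```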