
scorecard (version 0.1.0)

woebin: WOE Binning

Description

woebin generates optimal binning for both numerical and categorical variables using tree-like segmentation. woebin can also bin both numerical and categorical variables using customized breakpoints.
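The quantities that WOE binning works with, weight of evidence (WOE) per bin and information value (IV), can be computed by hand for an already-binned variable. The base-R sketch below is an illustration, not scorecard's implementation. `woe_iv_table` is a hypothetical helper. It assumes that y is coded 1 = bad (positive) and 0 = good, that WOE follows the common convention ln(bad share / good share), and that every bin contains both classes (otherwise the log is infinite).

```r
# Hypothetical helper (not part of scorecard): WOE and IV for an already-binned variable.
# Assumes y is coded 1 = bad (positive), 0 = good, and every bin has both classes.
woe_iv_table <- function(bin, y) {
  bad  <- tapply(y == 1, bin, sum)          # bad count per bin
  good <- tapply(y == 0, bin, sum)          # good count per bin
  bad_distr  <- bad / sum(bad)              # share of all bads falling in each bin
  good_distr <- good / sum(good)            # share of all goods falling in each bin
  woe    <- log(bad_distr / good_distr)     # assumed convention: ln(bad share / good share)
  bin_iv <- (bad_distr - good_distr) * woe  # each bin's contribution to IV
  data.frame(bin = names(bad),
             good = as.vector(good), bad = as.vector(bad),
             woe = as.vector(woe), bin_iv = as.vector(bin_iv),
             row.names = NULL)
}

bin <- c("a", "a", "a", "b", "b", "b")
y   <- c(1, 0, 0, 1, 1, 0)
tab <- woe_iv_table(bin, y)
total_iv <- sum(tab$bin_iv)   # IV of the variable = sum of per-bin IV
```

Bin "a" holds one third of the bads but two thirds of the goods, so its WOE is ln(0.5). Bin "b" mirrors it with ln(2). The total IV is (2/3) ln 2.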

Usage

woebin(dt, y, x = NA, breaks_list = NA, min_perc_total = 0.02,
  stop_limit = 0.1, positive = "bad|1", print_step = FALSE)

Arguments

dt

A data frame with both x (predictor/feature) and y (response/label) variables.

y

Name of y variable.

x

A character vector of x variable names. Default NA. If x is NA, all variables except y are treated as x variables.

breaks_list

A list of breakpoints; default NA. If it is not NA, variables are binned using the provided breaks.
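The breaks_list in the Examples uses numeric cut points for age.in.years and the string "for free%,%rent" for housing. The sketch below shows how such entries can map raw values to bins. It is a hypothetical illustration, not scorecard's internal code. It assumes that numeric bins are left-closed, right-open intervals with -Inf and Inf added at the ends, and that categories joined by "%,%" share a single bin.

```r
# Hypothetical illustration of applying breaks_list-style entries (not scorecard internals).

# Numeric breaks: assumed left-closed, right-open intervals, with -Inf/Inf added at the ends.
age <- c(22, 25, 34, 35, 61)
age_bins <- cut(age, breaks = c(-Inf, 25, 35, 40, 60, Inf), right = FALSE)

# Categorical breaks: categories joined by "%,%" are assumed to form one bin.
housing_breaks <- c("own", "for free%,%rent")
housing <- c("own", "rent", "for free", "own")

# Build a lookup from each original category to the bin label it belongs to.
lookup <- unlist(lapply(housing_breaks, function(b) {
  members <- strsplit(b, "%,%", fixed = TRUE)[[1]]
  setNames(rep(b, length(members)), members)
}))
housing_bins <- unname(lookup[housing])
```

Here 25 falls into [25,35) rather than [-Inf,25), and both "rent" and "for free" map to the merged bin "for free%,%rent".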

min_perc_total

The minimum share of total observations in each initial (fine) bin. Accepted range: 0.01-0.2; default 0.02, which corresponds to about 50 initial bins.
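To see how a share such as 0.02 relates to a number of fine bins, the sketch below cuts a numeric variable so that each initial bin holds roughly min_perc_total of the rows. It uses equal-frequency (quantile) cutting. That is an assumption chosen for illustration and not necessarily how scorecard builds its initial bins.

```r
# Hypothetical sketch (not scorecard's internal algorithm): initial breakpoints so that
# each fine bin holds about min_perc_total of the rows, using equal-frequency cuts.
x <- as.numeric(1:1000)
min_perc_total <- 0.02

probs <- seq(0, 1, by = min_perc_total)           # 0, 0.02, ..., 1
inner <- probs[-c(1, length(probs))]              # drop 0 and 1: interior cut points only
fine_breaks <- unique(quantile(x, probs = inner, names = FALSE))
n_fine_bins <- length(fine_breaks) + 1            # bins = interior breaks + 1

fine_bins <- cut(x, breaks = c(-Inf, fine_breaks, Inf), right = FALSE)
bin_share <- as.vector(table(fine_bins)) / length(x)   # share of rows per fine bin
```

With 1,000 distinct values this yields 50 fine bins of 20 rows each. On real data with many tied values, unique() can drop duplicate quantiles, leaving fewer fine bins.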

stop_limit

Stop splitting bins when the information value gain ratio is less than stop_limit. Accepted range: 0-0.5; default 0.1.
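The stopping rule can be sketched as follows: compute IV before and after a candidate split, and accept the split only if the relative gain reaches stop_limit. This is a minimal illustration under assumptions. `total_iv` is a hypothetical helper, the gain ratio is assumed to be iv_after / iv_before - 1, y is coded 1 = bad and 0 = good, and every bin must contain both classes.

```r
# Hypothetical helper: total information value of a binned variable
# (y coded 1 = bad, 0 = good; every bin assumed to contain both classes).
total_iv <- function(bin, y) {
  bad_distr  <- tapply(y == 1, bin, sum) / sum(y == 1)
  good_distr <- tapply(y == 0, bin, sum) / sum(y == 0)
  sum((bad_distr - good_distr) * log(bad_distr / good_distr))
}

x <- 1:12
y <- c(1, 1, 0, 1,  0, 1, 0, 0,  0, 0, 1, 0)

# Current binning: [-Inf,5) and [5,Inf); the candidate adds a split at 9.
iv_before <- total_iv(cut(x, c(-Inf, 5, Inf), right = FALSE), y)
iv_after  <- total_iv(cut(x, c(-Inf, 5, 9, Inf), right = FALSE), y)

gain_ratio <- iv_after / iv_before - 1   # assumed definition of the IV gain ratio
stop_limit <- 0.1
accept_split <- gain_ratio >= stop_limit # split kept only if the relative IV gain is large enough
```

Both halves of [5,Inf) have the same bad rate (1 bad, 3 goods each). The extra split therefore adds no information value, the gain ratio is 0, and the split is rejected.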

positive

Value of positive class, default "bad|1".
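The default "bad|1" reads naturally as a pattern that matches either label. The sketch below recodes a response column to 1 (positive) and 0 under the assumption that positive is applied as a regular expression via grepl. That assumption is not confirmed by this page.

```r
# Hypothetical sketch: recode a response column to 1 (positive) / 0, assuming
# `positive` is used as a regular-expression pattern, so "bad|1" matches "bad" or "1".
positive <- "bad|1"

y_chr <- c("good", "bad", "good", "bad")
y_num <- c(0, 1, 1, 0)

y_chr01 <- as.integer(grepl(positive, y_chr))
y_num01 <- as.integer(grepl(positive, y_num))   # numbers are coerced to "0"/"1" first
```

An unanchored pattern also matches values such as "10" or "badly". A stricter pattern like "^(bad|1)$" avoids that if labels can contain those substrings.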

print_step

Logical. If TRUE, print each variable name as its binning is generated.

Value

Information on the optimal or customized binning.

See Also

woebin_ply, woebin_plot

Examples

# NOT RUN {
library(scorecard)

# load germancredit data
data(germancredit)

dt <- germancredit[, c("creditability", "credit.amount", "purpose")]

bins <- woebin(dt, y = "creditability")

# binning for germancredit dataset
bins_germ <- woebin(germancredit, y = "creditability")

# subset dataset
dt2 <- germancredit[, c("creditability", "age.in.years",
      "credit.amount", "housing", "purpose")]

# customizing stop_limit (information value gain ratio) for each x variable
bins_cus_sl <- woebin(dt2, y="creditability", stop_limit=c(0.05,0.1,0.01,0.1))


# customizing the breakpoints of binning
breaks_list <- list(
  age.in.years = c(25, 35, 40, 60),
  credit.amount = NULL,
  housing = c("own", "for free%,%rent"),
  purpose = NULL
)

bins_cus_brk <- woebin(dt2, y="creditability", breaks_list=breaks_list)
# }