binning_by: Optimal Binning for Scoring Modeling

Description

The binning_by() finding class intervals for numerical variable using optical binning. Optimal binning categorizes a numeric characteristic into bins for ulterior usage in scoring modeling.

Usage

binning_by(df, y, x, p = 0.05, ordered = TRUE, labels = NULL)

Arguments

a data frame.

binary response variable (0,1). Integer(int) is required. Name of y must not have a dot. Name "default" is not allowed.

continuous characteristic. At least 5 different values. Value Inf is not allowed. Name of x must not have a dot.

percentage of records per bin. Default 5% (0.05). This parameter only accepts values greater that 0.00 (0%) and lower than 0.50 (50%).

ordered

whether to build an ordered factor or not.

labels

the label names to use for each of the bins.

Value

an object of optimal_bins class. Attributes of optimal_bins class is as follows.

type : binning type, "optimal".
breaks : the number of intervals into which x is to be cut.
levels : levels of binned value.
raw : raw data, x argument value.
ivtable : information value table
iv : information value
flag : information value

"optimal_bins" class attributes information

Attributes of the "optimal_bins" classs that is as follows.

class : "optimal_bins".
levels : factor or ordered factor levels
type : binning method
breaks : breaks for binning
raw : before the binned the raw data
ivtable : information value table
iv : information value
target : binary response variable

See vignette("transformation") for an introduction to these concepts.

Details

This function is useful when used with the mutate/transmute function of the dplyr package. And this function is implemented using smbinning() function of smbinning package.

Examples

Run this code

# NOT RUN {
# Generate data for the example
carseats <- ISLR::Carseats
carseats[sample(seq(NROW(carseats)), 20), "Income"] <- NA
carseats[sample(seq(NROW(carseats)), 5), "Urban"] <- NA

# optimal binning
bin <- binning_by(carseats, "US", "Advertising")
bin
# summary optimal_bins class
summary(bin)
# visualize optimal_bins class
plot(bin, sub = "bins of Advertising variable")
# }

Run the code above in your browser using DataLab