Learn R Programming

dlookr (version 0.3.9)

binning_by: Optimal Binning for Scoring Modeling

Description

The binning_by() finding class intervals for numerical variable using optical binning. Optimal binning categorizes a numeric characteristic into bins for ulterior usage in scoring modeling.

Usage

binning_by(df, y, x, p = 0.05, ordered = TRUE, labels = NULL)

Arguments

df

a data frame.

y

binary response variable (0,1). Integer(int) is required. Name of y must not have a dot. Name "default" is not allowed.

x

continuous characteristic. At least 5 different values. Value Inf is not allowed. Name of x must not have a dot.

p

percentage of records per bin. Default 5% (0.05). This parameter only accepts values greater that 0.00 (0%) and lower than 0.50 (50%).

ordered

whether to build an ordered factor or not.

labels

the label names to use for each of the bins.

Value

an object of optimal_bins class. Attributes of optimal_bins class is as follows.

  • type : binning type, "optimal".

  • breaks : the number of intervals into which x is to be cut.

  • levels : levels of binned value.

  • raw : raw data, x argument value.

  • ivtable : information value table

  • iv : information value

  • flag : information value

"optimal_bins" class attributes information

Attributes of the "optimal_bins" classs that is as follows.

  • class : "optimal_bins".

  • levels : factor or ordered factor levels

  • type : binning method

  • breaks : breaks for binning

  • raw : before the binned the raw data

  • ivtable : information value table

  • iv : information value

  • target : binary response variable

See vignette("transformation") for an introduction to these concepts.

Details

This function is useful when used with the mutate/transmute function of the dplyr package. And this function is implemented using smbinning() function of smbinning package.

See Also

binning, smbinning.

Examples

Run this code
# NOT RUN {
# Generate data for the example
carseats <- ISLR::Carseats
carseats[sample(seq(NROW(carseats)), 20), "Income"] <- NA
carseats[sample(seq(NROW(carseats)), 5), "Urban"] <- NA

# optimal binning
bin <- binning_by(carseats, "US", "Advertising")
bin
# summary optimal_bins class
summary(bin)
# visualize optimal_bins class
plot(bin, sub = "bins of Advertising variable")
# }

Run the code above in your browser using DataLab