woe_binning: Weight of Evidence based segmentation of a variable

Description

Create heterogeneous segments of a numeric variable, based on a binary dependent variable, using the Weight of Evidence approach.

Usage

woe_binning(df, variable, dv, min_perc = 0.02, initial_bins = 50,
  woe_cutoff = 0.1)

Arguments

df

A data frame containing the columns named by the arguments variable and dv

variable

Character string specifying the column name of the variable you want to bin. Currently, only the numeric and integer classes are supported

dv

Character string specifying the column name of the binary dependent variable (0, 1). NAs are ignored. The dependent variable should be of either integer or numeric class

min_perc

Minimum share of records in each segment, expressed as a proportion (0.02 = 2 percent). If the share of records in a segment falls below this threshold, the segment is merged with other segments. Acceptable values are in the range 0.01-0.2

initial_bins

Number of segments of the variable to be created in the first iteration. Default value = 50 (2 percent of records each) for sample size > 1500. Acceptable values are in the range 5-100

woe_cutoff

Threshold for the absolute difference in WoE values between consecutive segments. If the difference is less than this threshold, the segments are merged. Acceptable values are in the range 0-0.2
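The two merge thresholds can be illustrated with a small base R sketch. This is not the function's internal code: the seg data frame and its share and woe columns are hypothetical, and the documentation does not say in which order or direction flagged segments are merged.

```r
# Hypothetical per-segment summary; the values are illustrative only
seg <- data.frame(
  share = c(0.30, 0.01, 0.25, 0.44),   # proportion of records in each segment
  woe   = c(0.80, 0.75, 0.20, -0.40)   # WoE of each segment
)
min_perc   <- 0.02
woe_cutoff <- 0.1

# Segments holding a smaller share of records than min_perc
too_small <- seg$share < min_perc

# Consecutive pairs whose WoE values differ by less than woe_cutoff
too_close <- abs(diff(seg$woe)) < woe_cutoff

print(too_small)
print(too_close)
```

Here the second segment would be flagged by min_perc, and the first pair of segments by woe_cutoff.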

Value

Output is a list containing the following elements:
a) variable - value of the input argument 'variable'
b) dv - value of the input argument 'dv'
c) breaks - vector specifying cut-off values for each segment. Pass it to the 'breaks' argument of the cut function to create segments of the variable
d) woe - WoE table for the final iteration
e) IV - Information Value for the final iteration
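As a rough illustration of how a breaks vector is used with cut, the sketch below relies on base R only. The vectors x and breaks are hypothetical stand-ins for the binned column and for the breaks element of the returned list. The real breaks may not include infinite end points, so arguments such as right and include.lowest may need adjusting.

```r
# Hypothetical stand-ins: 'x' for the binned column,
# 'breaks' for the breaks element returned by woe_binning
x      <- c(3, 12, 18, 25, 40, 7)
breaks <- c(-Inf, 10, 20, Inf)

# cut() assigns each value to the interval it falls in
# (intervals are closed on the right by default)
segments <- cut(x, breaks = breaks)

# Number of records per segment
print(table(segments))
```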

Details

Weight of Evidence (WoE) is the natural log of the ratio of the percentage of 0s in the segment to the percentage of 1s in the segment. It is a proxy for how far the dv rate for a segment is from the sample dv rate (number of 1s / number of observations).
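The calculation can be sketched in base R on toy data. The sketch assumes the common convention in which the percentages are the shares of all 0s and of all 1s that fall in each segment, and the usual Information Value formula. The package's own implementation may differ in detail, and the segment and dv vectors are hypothetical.

```r
# Toy data: a hypothetical segment label and a binary dv (not from the package)
segment <- c("A", "A", "A", "A", "B", "B", "B", "B", "B", "B")
dv      <- c(0,   0,   0,   1,   0,   1,   1,   1,   0,   1)

# Count the 0s and 1s in each segment
n0 <- tapply(dv == 0, segment, sum)
n1 <- tapply(dv == 1, segment, sum)

# Share of all 0s and of all 1s that fall in each segment (assumed convention)
pct0 <- n0 / sum(n0)
pct1 <- n1 / sum(n1)

# WoE per segment, and Information Value summed over segments
woe <- log(pct0 / pct1)
iv  <- sum((pct0 - pct1) * woe)

print(woe)
print(iv)
```

Segment A has a dv rate (0.25) below the sample dv rate (0.5), so its WoE is positive under this convention. Segment B has a dv rate above the sample rate, so its WoE is negative.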

Examples

# NOT RUN {
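# smbsimdf1 is a sample data set shipped with the smbinning package
library(smbinning)
# Bin the numeric column "cbs1" against the binary target "fgood"
woe_binning(smbsimdf1, "cbs1", "fgood", initial_bins = 10)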

# }