Create heterogeneous segments of a numeric variable based on a binary dependent variable using the Weight of Evidence (WoE) approach
woe_binning(df, variable, dv, min_perc = 0.02, initial_bins = 50,
woe_cutoff = 0.1)
df: A data frame containing the columns named by variable and dv.
variable: Character string giving the name of the column to bin. Currently only numeric and integer columns are supported.
dv: Character string giving the name of the binary (0/1) dependent variable column; NAs are ignored. The column must be of integer or numeric class.
min_perc: Minimum fraction of records in each segment. If a segment's share of records falls below this threshold, it is merged with other segments. Acceptable values are in the range 0.01 to 0.2.
initial_bins: Number of segments of the variable created in the first iteration. The default is 50 (2 percent of records per segment) for sample sizes above 1500. Acceptable values are in the range 5 to 100.
woe_cutoff: Threshold on the absolute difference in WoE values between consecutive segments. If the difference is below this threshold, the segments are merged. Acceptable values are in the range 0 to 0.2.
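Read together, min_perc, initial_bins and woe_cutoff suggest a three-step procedure: create initial segments, merge segments that are too small, then merge neighbouring segments whose WoE values are too close. The base-R function below is a minimal sketch of that reading, not the package's implementation. The name sketch_woe_binning, the equal-frequency (quantile) initial segments, the -Inf/Inf outer limits, merging a small segment into its right-hand neighbour, and also merging segments that contain no 0s or no 1s (so every WoE stays finite) are all assumptions.

```r
# Hypothetical sketch of WoE-based binning; NOT the package's actual code.
sketch_woe_binning <- function(x, dv, min_perc = 0.02, initial_bins = 50,
                               woe_cutoff = 0.1) {
  keep <- !is.na(x) & !is.na(dv)
  x <- x[keep]
  dv <- dv[keep]

  # Step 1 (assumed): equal-frequency initial segments with open outer limits
  q <- quantile(x, probs = seq(0, 1, length.out = initial_bins + 1),
                names = FALSE)
  breaks <- unique(c(-Inf, q[-c(1, length(q))], Inf))

  # Counts of 0s and 1s and the WoE of each segment for a given break vector
  bin_stats <- function(breaks) {
    seg <- cut(x, breaks = breaks)
    n0 <- tabulate(seg[dv == 0], nbins = nlevels(seg))
    n1 <- tabulate(seg[dv == 1], nbins = nlevels(seg))
    list(n = n0 + n1, n0 = n0, n1 = n1,
         woe = log((n0 / sum(n0)) / (n1 / sum(n1))))
  }

  # Step 2: merge segments below min_perc, or lacking a 0 or a 1
  repeat {
    s <- bin_stats(breaks)
    bad <- which(s$n / length(x) < min_perc | s$n0 == 0 | s$n1 == 0)
    if (length(bad) == 0 || length(breaks) <= 2) break
    i <- bad[1]
    # Dropping breaks[i + 1] merges segment i with its right neighbour;
    # for the last segment, dropping breaks[i] merges it with its left one
    drop <- if (i < length(s$n)) i + 1 else i
    breaks <- breaks[-drop]
  }

  # Step 3: merge the consecutive pair with the smallest WoE gap while that
  # gap is below woe_cutoff
  repeat {
    s <- bin_stats(breaks)
    if (length(s$n) < 2) break
    d <- abs(diff(s$woe))
    if (min(d) >= woe_cutoff) break
    breaks <- breaks[-(which.min(d) + 1)]
  }

  s <- bin_stats(breaks)
  list(breaks = breaks, woe = s$woe,
       IV = sum((s$n0 / sum(s$n0) - s$n1 / sum(s$n1)) * s$woe))
}
```

Merging the pair with the smallest WoE gap first is one reasonable greedy choice; the package may order its merges differently.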
Output is a list containing the following elements:
a) variable: value of the input argument variable
b) dv: value of the input argument dv
c) breaks: vector of cut-off values for the segments; pass it to the breaks argument of cut() to create the segments of the variable
d) woe: WoE table for the final iteration
e) IV: Information Value for the final iteration
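To illustrate how breaks feeds cut(), the snippet below uses a made-up break vector in place of real woe_binning() output; whether the returned breaks include -Inf and Inf as outer limits is an assumption here.

```r
# Hypothetical break points standing in for the breaks element returned by
# woe_binning(); the -Inf and Inf outer limits are an assumption
breaks <- c(-Inf, 10, 25, Inf)
x <- c(3, 10, 12, 30, 25, 7)

# cut() puts each value into the interval (lower, upper] that contains it
segments <- cut(x, breaks = breaks)
table(segments)  # 3, 2 and 1 values in the three segments
```

With real output, assuming res holds the returned list and df is the input data frame, the equivalent call would be cut(df[[res$variable]], breaks = res$breaks).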
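Weight of Evidence (WoE) for a segment is the natural log of the ratio of the share of all 0s that fall in the segment to the share of all 1s that fall in the segment. Equivalently, it is the log of the segment's odds of 0 versus 1 divided by the sample's odds, so it is a proxy for how far the segment's dv rate is from the sample dv rate (# of 1s / # of observations): WoE is 0 when the two rates are equal and positive when the segment's dv rate is below the sample rate.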
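The helper below is a minimal sketch that computes the WoE of each segment and the Information Value for an existing segmentation. woe_iv is a hypothetical name, and the IV formula (sum over segments of the share of 0s minus the share of 1s, times WoE) is the standard definition, assumed rather than confirmed by this page to be what woe_binning reports.

```r
# Hypothetical helper, not part of the package:
# WoE = ln(share of all 0s in segment / share of all 1s in segment)
# IV  = sum over segments of (share of 0s - share of 1s) * WoE
woe_iv <- function(segment, dv) {
  keep <- !is.na(dv)                                # NAs in dv are ignored
  n0 <- tapply(dv[keep] == 0, segment[keep], sum)   # 0s per segment
  n1 <- tapply(dv[keep] == 1, segment[keep], sum)   # 1s per segment
  pct0 <- n0 / sum(n0)                              # share of all 0s
  pct1 <- n1 / sum(n1)                              # share of all 1s
  woe <- log(pct0 / pct1)
  list(woe = setNames(as.vector(woe), names(woe)),
       IV = sum((pct0 - pct1) * woe))
}

seg <- c("A", "A", "A", "A", "B", "B", "B", "B")
y   <- c(0, 0, 0, 1, 0, 1, 1, 1)
out <- woe_iv(seg, y)
out$woe  # A = log(3), B = -log(3)
out$IV   # log(3), about 1.099

# The same WoE written as log(segment odds of 0 vs 1 / sample odds of 0 vs 1)
n0 <- c(A = 3, B = 1)
n1 <- c(A = 1, B = 3)
all.equal(out$woe, log((n0 / n1) / (sum(n0) / sum(n1))))
```

A segment with no 0s or no 1s gets an infinite WoE, which is one practical reason for merging very small segments.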
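library(smbinning)  # provides the simulated data set smbsimdf1
res <- woe_binning(smbsimdf1, "cbs1", "fgood", initial_bins = 10)
res$breaks  # cut-off values of the final segments
res$IV      # Information Value of the final segmentation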