This function calculates the information value (IV) for multiple x variables. It treats each unique value of an x variable as a separate group. If a group has a zero count for either y class, the zero is replaced by 0.99 so that WOE and IV remain calculable.
Usage
iv(dt, y, x = NULL, positive = "bad|1", order = TRUE)
Arguments
dt
A data frame with both x (predictor/feature) and y (response/label) variables.
y
Name of y variable.
x
Names of the x variables. Default is NULL, in which case all columns except y are treated as x variables.
positive
Value of positive class, default is "bad|1".
order
Logical, default is TRUE. If TRUE, the output is sorted by IV in descending order.
Value
A data frame with two columns: variable and info_value.
Details
IV is widely used for variable selection when developing credit scorecards. It is computed by summing over the groups i of a variable: $$IV = \sum_{i}(DistributionBad_{i} - DistributionGood_{i})\ln\left(\frac{DistributionBad_{i}}{DistributionGood_{i}}\right),$$ where DistributionBad and DistributionGood are the shares of all bad and all good observations that fall in group i. The log component of the information value is the weight of evidence (WOE) of group i: $$WeightofEvidence_{i} = \ln\left(\frac{DistributionBad_{i}}{DistributionGood_{i}}\right).$$
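The formulas above can be sketched in base R for a single predictor. This is a minimal illustration, not the package's implementation: the helper `manual_iv` and the toy data frame are hypothetical, and applying the 0.99 replacement before the distributions are computed is an assumption.

```r
# Hypothetical toy data: one categorical predictor and a binary response
# (1 = bad, 0 = good).
dt <- data.frame(
  grade = c("A", "A", "A", "B", "B", "B", "C", "C"),
  y     = c(0, 0, 0, 0, 1, 1, 1, 1)
)

# manual_iv is an illustrative helper, not a function of the package.
manual_iv <- function(x, y, positive = 1) {
  is_bad <- y == positive
  bad  <- tapply(is_bad, x, sum)   # bad count per unique value of x
  good <- tapply(!is_bad, x, sum)  # good count per unique value of x

  # Assumption: zero counts become 0.99 before the distributions are
  # computed, so that the logarithm stays finite.
  bad[bad == 0]   <- 0.99
  good[good == 0] <- 0.99

  dist_bad  <- bad / sum(bad)      # DistributionBad_i
  dist_good <- good / sum(good)    # DistributionGood_i
  woe <- log(dist_bad / dist_good) # WeightofEvidence_i

  sum((dist_bad - dist_good) * woe)
}

iv_grade <- manual_iv(dt$grade, dt$y)
print(iv_grade)
```

Here grade "A" has no bad observations and grade "C" has no good ones, so both groups go through the 0.99 replacement. Without it, the WOE of those groups would be infinite.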
The relationship between information value and predictive power is as follows: