Implements a greedy binning algorithm with monotonicity enforcement and majority-vote direction detection. Important Note: Despite the "Likelihood Ratio Pre-binning" designation in the name, the current implementation uses equal-frequency pre-binning without likelihood ratio statistics. The algorithm is functionally a variant of Monotonic Optimal Binning (MOB) with minor differences in merge strategies.
This method is suitable for credit scoring applications requiring monotonic WoE patterns, but users should be aware that it does not employ the statistical rigor implied by "Likelihood Ratio" in the name.
ob_numerical_mrblp(
feature,
target,
min_bins = 3,
max_bins = 5,
bin_cutoff = 0.05,
max_n_prebins = 20,
convergence_threshold = 1e-06,
max_iterations = 1000,
laplace_smoothing = 0.5
)A list containing:
Integer vector of bin identifiers (1-based indexing).
Character vector of bin intervals in the format "[lower;upper)".
Numeric vector of Weight of Evidence values. Guaranteed to be monotonic.
Numeric vector of Information Value contributions per bin.
Integer vector of total observations per bin.
Integer vector of positive class counts per bin.
Integer vector of negative class counts per bin.
Numeric vector of event rates per bin.
Numeric vector of bin boundaries (excluding -Inf and +Inf).
Total Information Value (sum of bin IVs).
Logical flag indicating convergence within max_iterations.
Integer count of iterations performed.
Numeric vector of feature values to be binned. Missing values (NA) and infinite values are not permitted and will trigger an error (unlike other binning methods that issue warnings).
Integer vector of binary target values (must contain only 0 and 1).
Must have the same length as feature.
Minimum number of bins to generate (default: 3). Must be at least 1. Acts as a hard constraint during monotonicity enforcement.
Maximum number of bins to generate (default: 5). Must be greater
than or equal to min_bins.
Minimum fraction of total observations required in each bin (default: 0.05). Bins with frequency below this threshold are merged. Must be in the range (0, 1).
Maximum number of pre-bins before optimization (default: 20).
Must be at least equal to min_bins.
Convergence threshold (default: 1e-6). Currently used
to check if WoE range is below threshold; primary stopping criterion is
max_iterations.
Maximum number of iterations for bin merging and monotonicity enforcement (default: 1000). Prevents infinite loops.
Laplace smoothing parameter for WoE calculation (default: 0.5). Must be non-negative.
Lopes, J. E.
Algorithm Overview
The MRBLP algorithm executes in five phases:
Phase 1: Equal-Frequency Pre-binning
Initial bins are created by dividing sorted data into approximately equal-sized groups:
$$n_{\text{bin}} = \max\left(1, \left\lfloor \frac{N}{\text{max\_n\_prebins}} \right\rfloor\right)$$
Note: Despite "Likelihood Ratio Pre-binning" in the name, no likelihood ratio statistics are computed. A true likelihood ratio approach would compute:
$$\text{LR}(c) = \prod_{x \le c} \frac{P(x|y=1)}{P(x|y=0)} \times \prod_{x > c} \frac{P(x|y=1)}{P(x|y=0)}$$
and select cutpoints \(c\) that maximize \(|\log \text{LR}(c)|\). This is not implemented in the current version.
Phase 2: Rare Bin Merging
Bins with total count below bin_cutoff \(\times N\) are merged. The
merge direction (left or right) is chosen to minimize IV loss:
$$\text{direction} = \arg\min_{d \in \{\text{left}, \text{right}\}} \left( \text{IV}_i + \text{IV}_{i+d} - \text{IV}_{\text{merged}} \right)$$
Phase 3: Initial WoE/IV Calculation
Weight of Evidence for bin \(i\):
$$\text{WoE}_i = \ln\left(\frac{n_i^{+} + \alpha}{n^{+} + k\alpha} \bigg/ \frac{n_i^{-} + \alpha}{n^{-} + k\alpha}\right)$$
where \(\alpha = \text{laplace\_smoothing}\) and \(k\) is the number of bins.
Phase 4: Monotonicity Enforcement
The algorithm determines the desired monotonicity direction via majority vote:
$$\text{increasing} = \begin{cases} \text{TRUE} & \text{if } \#\{\text{WoE}_i > \text{WoE}_{i-1}\} \ge \#\{\text{WoE}_i < \text{WoE}_{i-1}\} \\ \text{FALSE} & \text{otherwise} \end{cases}$$
This differs from:
MOB: Uses first two bins only (WoE[1] >= WoE[0])
MBLP: Uses Pearson correlation between bin indices and WoE
Violations are detected as: $$\text{violation} = \begin{cases} \text{WoE}_i < \text{WoE}_{i-1} & \text{if increasing} \\ \text{WoE}_i > \text{WoE}_{i-1} & \text{if decreasing} \end{cases}$$
Violating bins are merged iteratively until monotonicity is achieved or
min_bins is reached.
Phase 5: Bin Count Reduction
If the number of bins exceeds max_bins, the algorithm merges bins with
the smallest absolute IV difference:
$$\text{merge\_idx} = \arg\min_{i=0}^{k-2} |\text{IV}_i - \text{IV}_{i+1}|$$
Critique: This criterion assumes bins with similar IVs are redundant, which is not theoretically justified. A more rigorous approach (used in MBLP) minimizes IV loss after merge:
$$\Delta \text{IV} = \text{IV}_i + \text{IV}_{i+1} - \text{IV}_{\text{merged}}$$
Theoretical Foundations
Monotonicity Enforcement: Based on Zeng (2014), ensuring stability under data distribution shifts.
Likelihood Ratio (Theoretical): Neyman-Pearson lemma establishes likelihood ratio as the optimal test statistic for hypothesis testing. For binning, cutpoints maximizing LR would theoretically yield optimal class separation. However, this is not implemented.
Practical Equivalence: The algorithm is functionally equivalent to MOB with minor differences in direction detection and merge strategies.
Comparison with Related Methods
| Method | Pre-binning | Direction Detection | Merge Criterion |
| MRBLP | Equal-frequency | Majority vote | Min IV difference |
| MOB | Equal-frequency | First two bins | Min IV loss |
| MBLP | Quantile-based | Pearson correlation | Min IV loss |
| MDLP | Equal-frequency | N/A (optional) | MDL cost |
Computational Complexity
Identical to MOB: \(O(N \log N + k^2 \times \text{max\_iterations})\)
When to Use MRBLP vs Alternatives
Use MRBLP: If you specifically need majority-vote direction detection and can tolerate the non-standard merge criterion.
Use MOB: For simplicity and slightly faster direction detection.
Use MBLP: For more robust direction detection via correlation.
Use MDLP: For information-theoretic optimality without mandatory monotonicity.
Neyman, J., & Pearson, E. S. (1933). "On the Problem of the Most Efficient Tests of Statistical Hypotheses". Philosophical Transactions of the Royal Society A, 231(694-706), 289-337. [Theoretical foundation for likelihood ratio, not implemented in code]
Mironchyk, P., & Tchistiakov, V. (2017). "Monotone optimal binning algorithm for credit risk modeling". Frontiers in Applied Mathematics and Statistics, 3, 2.
Zeng, G. (2014). "A Necessary Condition for a Good Binning Algorithm in Credit Scoring". Applied Mathematical Sciences, 8(65), 3229-3242.
Siddiqi, N. (2006). Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring. Wiley.
Anderson, R. (2007). The Credit Scoring Toolkit: Theory and Practice for Retail Credit Risk Management and Decision Automation. Oxford University Press.
Hosmer, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied Logistic Regression (3rd ed.). Wiley.
ob_numerical_mob for the base monotonic binning algorithm,
ob_numerical_mblp for correlation-based direction detection,
ob_numerical_mdlp for information-theoretic binning.
# \donttest{
# Simulate credit scoring data
set.seed(2024)
n <- 10000
feature <- c(
rnorm(4000, mean = 620, sd = 50),
rnorm(4000, mean = 690, sd = 45),
rnorm(2000, mean = 740, sd = 35)
)
target <- c(
rbinom(4000, 1, 0.20),
rbinom(4000, 1, 0.10),
rbinom(2000, 1, 0.04)
)
# Apply MRBLP
result <- ob_numerical_mrblp(
feature = feature,
target = target,
min_bins = 3,
max_bins = 5
)
# Compare with MOB (should be very similar)
result_mob <- ob_numerical_mob(
feature = feature,
target = target,
min_bins = 3,
max_bins = 5
)
# Compare results
data.frame(
Method = c("MRBLP", "MOB"),
N_Bins = c(length(result$woe), length(result_mob$woe)),
Total_IV = c(result$total_iv, result_mob$total_iv),
Iterations = c(result$iterations, result_mob$iterations)
)
# }
Run the code above in your browser using DataLab