Algorithm Overview
The Monotonic Binning via Linear Programming (MBLP) algorithm operates in four
sequential phases designed to balance predictive power (IV maximization) and
interpretability (monotonic WoE):
Phase 1: Quantile-Based Pre-binning
Initial bin boundaries are determined using empirical quantiles of the feature
distribution. For \(k\) pre-bins, cutpoints are computed as:
$$q_i = x_{(\lceil p_i \times (N - 1) \rceil)}, \quad p_i = \frac{i}{k}, \quad i = 1, 2, \ldots, k-1$$
where \(x_{(j)}\) denotes the \(j\)-th order statistic. This approach ensures
equal-frequency bins under the assumption of continuous data, though ties may
cause deviations in practice. The first and last boundaries are set to
\(-\infty\) and \(+\infty\), respectively.
Phase 2: Frequency-Based Bin Merging
Bins with total count below bin_cutoff \(\times N\) are iteratively
merged with adjacent bins to ensure statistical reliability. The merge strategy
selects the neighbor with the smallest count (greedy heuristic), continuing
until all bins meet the frequency threshold or min_bins is reached.
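The greedy merge can be sketched as below, operating on per-bin total counts; `bin_cutoff`, `n_total`, and `min_bins` are passed as plain arguments, and the names are illustrative:

```python
def merge_low_frequency(counts, bin_cutoff, n_total, min_bins):
    """Phase 2 sketch: merge under-populated bins into an adjacent bin."""
    counts = list(counts)
    threshold = bin_cutoff * n_total
    while len(counts) > min_bins and min(counts) < threshold:
        i = counts.index(min(counts))  # most under-populated bin
        if i == 0:
            j = 1
        elif i == len(counts) - 1:
            j = i - 1
        else:
            # Greedy heuristic: prefer the neighbor with the smaller count.
            j = i - 1 if counts[i - 1] <= counts[i + 1] else i + 1
        lo, hi = min(i, j), max(i, j)
        counts[lo:hi + 1] = [counts[lo] + counts[hi]]
    return counts
```

Because merging always reduces the bin count by one, the loop terminates after at most \(k - \text{min\_bins}\) merges.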
Phase 3: Monotonicity Direction Determination
If force_monotonic_direction = 0, the algorithm computes the Pearson
correlation between bin indices and WoE values:
$$\rho = \frac{\sum_{i=1}^{k} (i - \bar{i})(\text{WoE}_i - \overline{\text{WoE}})}{\sqrt{\sum_{i=1}^{k} (i - \bar{i})^2 \sum_{i=1}^{k} (\text{WoE}_i - \overline{\text{WoE}})^2}}$$
The monotonicity direction is set as:
$$\text{direction} = \begin{cases} 1 & \text{if } \rho \ge 0 \text{ (increasing)} \\ -1 & \text{if } \rho < 0 \text{ (decreasing)} \end{cases}$$
If force_monotonic_direction is explicitly set to 1 or -1, that value
overrides the correlation-based determination.
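A minimal sketch of this direction rule, using NumPy's `corrcoef` for the Pearson correlation (the function name is illustrative):

```python
import numpy as np

def choose_direction(woe, force=0):
    """Phase 3 sketch: +1 for increasing, -1 for decreasing monotonicity."""
    if force in (1, -1):
        return force  # explicit override wins
    k = len(woe)
    # Pearson correlation between bin index and WoE value.
    rho = np.corrcoef(np.arange(k), woe)[0, 1]
    return 1 if rho >= 0 else -1
```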
Phase 4: Iterative Optimization Loop
The core optimization alternates between two enforcement steps until convergence:
Cardinality Constraint: If the number of bins \(k\) exceeds
max_bins, the algorithm identifies the pair of adjacent bins
\((i, i+1)\) that minimizes the IV loss when merged:
$$\Delta \text{IV}_{i,i+1} = \text{IV}_i + \text{IV}_{i+1} - \text{IV}_{\text{merged}}$$
where \(\text{IV}_{\text{merged}}\) is recalculated using combined counts.
The merge is performed only if it preserves monotonicity (checked via WoE
comparison with neighboring bins).
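The \(\Delta\text{IV}\) search can be sketched as follows, using the smoothed WoE/IV definition given later in this document. The monotonicity-preservation check is omitted for brevity, and for simplicity the smoothing term keeps the pre-merge bin count \(k\) when evaluating candidates (a full implementation would recompute with \(k - 1\) bins); names are illustrative:

```python
import math

def iv_contrib(g, b, G, B, k, alpha=0.5):
    """IV contribution of one bin with Laplace smoothing (alpha = 0.5)."""
    dist_good = (g + alpha) / (G + k * alpha)
    dist_bad = (b + alpha) / (B + k * alpha)
    return (dist_good - dist_bad) * math.log(dist_good / dist_bad)

def best_merge_pair(good, bad, alpha=0.5):
    """Return (i, loss): adjacent pair (i, i+1) minimizing the IV loss."""
    k, G, B = len(good), sum(good), sum(bad)
    best_i, best_loss = None, float('inf')
    for i in range(k - 1):
        merged = iv_contrib(good[i] + good[i + 1], bad[i] + bad[i + 1],
                            G, B, k, alpha)
        loss = (iv_contrib(good[i], bad[i], G, B, k, alpha)
                + iv_contrib(good[i + 1], bad[i + 1], G, B, k, alpha)
                - merged)
        if loss < best_loss:
            best_i, best_loss = i, loss
    return best_i, best_loss
```

Intuitively, merging two bins with nearly identical good/bad mix costs almost no IV, while merging bins with opposite mixes destroys most of their combined contribution.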
Monotonicity Enforcement: For each pair of consecutive bins
\((i, i+1)\), violations are detected as:
$$\text{direction} \times (\text{WoE}_{i+1} - \text{WoE}_i) < -\epsilon$$
where \(\epsilon = 10^{-10}\) (numerical tolerance). Violating bins are
immediately merged.
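The violation check is a direction-aware comparison with a small numerical tolerance; a sketch (names illustrative):

```python
EPS = 1e-10  # numerical tolerance for WoE comparisons

def find_violation(woe, direction):
    """Return the index i of the first pair (i, i+1) violating monotonicity
    in `direction` (+1 increasing, -1 decreasing), or None if monotonic."""
    for i in range(len(woe) - 1):
        if direction * (woe[i + 1] - woe[i]) < -EPS:
            return i
    return None
```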
Convergence Test: After each iteration, the total IV is compared
to the previous iteration. If \(|\text{IV}^{(t)} - \text{IV}^{(t-1)}| < \text{convergence\_threshold}\)
or monotonicity is achieved, the loop terminates.
Weight of Evidence Computation
WoE for bin \(i\) uses Laplace smoothing (\(\alpha = 0.5\)) to handle zero counts:
$$\text{WoE}_i = \ln\left(\frac{\text{DistGood}_i}{\text{DistBad}_i}\right)$$
where:
$$\text{DistGood}_i = \frac{n_i^{+} + \alpha}{n^{+} + k\alpha}, \quad \text{DistBad}_i = \frac{n_i^{-} + \alpha}{n^{-} + k\alpha}$$
and \(k\) is the current number of bins. The Information Value contribution is:
$$\text{IV}_i = (\text{DistGood}_i - \text{DistBad}_i) \times \text{WoE}_i$$
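The three formulas above translate directly to code; this sketch (function name illustrative) takes per-bin good and bad counts and returns the smoothed WoE values and IV contributions:

```python
import math

def woe_iv(good, bad, alpha=0.5):
    """Smoothed WoE and IV contributions per bin (Laplace alpha = 0.5)."""
    k, G, B = len(good), sum(good), sum(bad)
    woe, iv = [], []
    for g, b in zip(good, bad):
        dist_good = (g + alpha) / (G + k * alpha)
        dist_bad = (b + alpha) / (B + k * alpha)
        w = math.log(dist_good / dist_bad)
        woe.append(w)
        iv.append((dist_good - dist_bad) * w)
    return woe, iv
```

Note that each \(\text{IV}_i\) is non-negative, since \(\text{DistGood}_i - \text{DistBad}_i\) and \(\ln(\text{DistGood}_i / \text{DistBad}_i)\) always share the same sign, and the smoothing keeps WoE finite even for bins with zero good or zero bad counts.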
Theoretical Foundations
Monotonicity Requirement: Zeng (2014) proves that monotonic WoE
is a necessary condition for stable scorecards under data drift. Non-monotonic
patterns often indicate overfitting to noise.
Greedy Optimization: Unlike global optimizers (MILP), greedy
heuristics provide no optimality guarantees but achieve O(k²) complexity
per iteration versus exponential for exact methods.
Quantile Binning: Ensures initial bins have approximately equal
sample sizes, reducing variance in WoE estimates (especially critical for
minority classes).
Comparison with True Linear Programming
Formal LP formulations for binning (Belotti et al., 2016) express the problem as:
$$\max_{\mathbf{z}, \mathbf{b}} \sum_{i=1}^{k} \text{IV}_i(\mathbf{b})$$
subject to:
$$\text{WoE}_i \le \text{WoE}_{i+1} \quad \forall i \quad \text{(monotonicity)}$$
$$\sum_{i=1}^{k} z_{ij} = 1 \quad \forall j \quad \text{(assignment)}$$
$$z_{ij} \in \{0, 1\}, \quad b_i \in \mathbb{R}$$
where \(z_{ij}\) indicates if observation \(j\) is in bin \(i\), and
\(b_i\) are bin boundaries. Such formulations require MILP solvers (CPLEX,
Gurobi) and scale poorly beyond \(N > 10^4\). MBLP sacrifices global optimality
for scalability and determinism.
Computational Complexity
Initial sorting: \(O(N \log N)\)
Quantile computation: \(O(k)\)
Per-iteration operations: \(O(k^2)\) (pairwise comparisons for merging)
Total: \(O(N \log N + k^2 \times \text{max\_iterations})\)
For typical credit scoring datasets (\(N \sim 10^5\), \(k \sim 5\)),
runtime is dominated by sorting. Pathological cases (highly non-monotonic data)
may require many iterations to enforce monotonicity.