Implements a greedy binning algorithm with quantile-based pre-binning and
monotonicity enforcement. Important Note: Despite "Optimal Supervised
Learning Partitioning" and "LP" in the name, the algorithm uses greedy
heuristics without formal Linear Programming or convex optimization. The method
is functionally equivalent to ob_numerical_mrblp with minor
differences in pre-binning strategy and bin reduction criteria.
Users seeking true optimization-based binning should consider Mixed-Integer
Programming (MIP) implementations (e.g., via ompr or lpSolve
packages), though these scale poorly beyond N > 10,000 observations.
ob_numerical_oslp(
  feature,
  target,
  min_bins = 3,
  max_bins = 5,
  bin_cutoff = 0.05,
  max_n_prebins = 20,
  convergence_threshold = 1e-06,
  max_iterations = 1000,
  laplace_smoothing = 0.5
)

Value

A list containing:
- Integer bin identifiers (1-based).
- Character bin intervals "[lower;upper)".
- Numeric WoE values (guaranteed monotonic).
- Numeric IV contributions per bin.
- Integer total observations per bin.
- Integer positive class counts.
- Integer negative class counts.
- Numeric event rates.
- Numeric bin boundaries (excluding ±Inf).
- Total Information Value.
- Logical convergence flag.
- Integer iteration count.
Arguments

feature: Numeric vector of feature values. Missing values (NA) and infinite values are not permitted and will trigger an error.

target: Integer or numeric vector of binary target values (must contain only 0 and 1). Must have the same length as feature. Unlike other binning methods, OSLP internally uses double for target, allowing implicit conversion from integer.

min_bins: Minimum number of bins (default: 3). Must be at least 2.

max_bins: Maximum number of bins (default: 5). Must be greater than or equal to min_bins.

bin_cutoff: Minimum fraction of total observations per bin (default: 0.05). Must be in (0, 1).

max_n_prebins: Maximum number of pre-bins (default: 20). Must be at least equal to min_bins.

convergence_threshold: Convergence threshold for IV change (default: 1e-6).

max_iterations: Maximum number of iterations (default: 1000).

laplace_smoothing: Laplace smoothing parameter (default: 0.5). Must be non-negative.
Author

Lopes, J. E.
Algorithm Overview
OSLP executes in five phases:
Phase 1: Quantile-Based Pre-binning
Unlike equal-frequency methods that ensure balanced bin sizes, OSLP places cutpoints at quantiles of unique feature values:
$$\text{edge}_i = \text{unique\_vals}\left[\left\lfloor p_i \times (n_{\text{unique}} - 1) \right\rfloor\right]$$
where \(p_i = i / \text{max\_n\_prebins}\).
Critique: If unique values are clustered (e.g., many observations at specific values), bins may have vastly different sizes, violating the equal-frequency principle that ensures statistical stability.
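The pre-binning step can be sketched in a few lines of R. Here `prebin_edges` is a hypothetical helper written for illustration, not the package's internal function; it follows the edge formula above, with `+ 1` compensating for R's 1-based indexing.

```r
# Sketch of Phase 1 (illustrative, not the package internals): cutpoints
# are taken at quantile positions of the *unique* feature values, so
# clustered data can yield very uneven observed bin sizes.
prebin_edges <- function(x, max_n_prebins = 20) {
  u <- sort(unique(x))
  n_unique <- length(u)
  # p_i = i / max_n_prebins for the interior edges i = 1, ..., max_n_prebins - 1
  p <- seq_len(max_n_prebins - 1) / max_n_prebins
  idx <- floor(p * (n_unique - 1)) + 1  # +1 for 1-based indexing
  unique(u[idx])
}

x <- c(rep(1, 900), 2:101)  # 90% of observations share the value 1
edges <- prebin_edges(x, max_n_prebins = 5)
# edges split the 101 *unique* values evenly, but the bin containing the
# value 1 holds ~90% of the data: the critique above in action
```

The example makes the critique concrete: the quantiles of the unique-value grid ignore how many observations sit at each value.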
Phase 2: Rare Bin Merging
Bins with \(n_i / N < \text{bin\_cutoff}\) are merged. The merge direction minimizes IV loss:
$$\Delta \text{IV} = \text{IV}_i + \text{IV}_{i+d} - \text{IV}_{\text{merged}}$$
where \(d \in \{-1, +1\}\) (left or right neighbor).
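The merge-direction choice can be sketched as follows; `iv_of` and `merge_direction` are hypothetical helpers, written only to mirror the \(\Delta \text{IV}\) criterion above (zero-count bins would need the Laplace smoothing from Phase 3, omitted here for brevity).

```r
# Sketch of the Phase 2 merge choice for a rare bin i: evaluate the IV
# lost by merging with the left vs. right neighbour, keep the smaller loss.
iv_of <- function(pos, neg, tot_pos, tot_neg) {
  dist_pos <- pos / tot_pos
  dist_neg <- neg / tot_neg
  (dist_pos - dist_neg) * log(dist_pos / dist_neg)
}

merge_direction <- function(pos, neg, i) {
  tp <- sum(pos); tn <- sum(neg)
  loss <- function(j) {
    iv_of(pos[i], neg[i], tp, tn) + iv_of(pos[j], neg[j], tp, tn) -
      iv_of(pos[i] + pos[j], neg[i] + neg[j], tp, tn)
  }
  dirs <- c(left = i - 1, right = i + 1)
  dirs <- dirs[dirs >= 1 & dirs <= length(pos)]      # drop out-of-range side
  names(which.min(vapply(dirs, loss, numeric(1))))
}

pos <- c(40, 2, 30, 13)    # positives per bin; bin 2 is rare
neg <- c(60, 20, 170, 250)
merge_direction(pos, neg, 2)
</imports>
```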
Phase 3: Initial WoE/IV Calculation
Standard WoE with Laplace smoothing:
$$\text{WoE}_i = \ln\left(\frac{n_i^{+} + \alpha}{n^{+} + k\alpha} \bigg/ \frac{n_i^{-} + \alpha}{n^{-} + k\alpha}\right)$$
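The smoothed WoE formula translates directly into vectorized R; `woe_laplace` below is an illustrative sketch (with \(\alpha\) = `laplace_smoothing` and \(k\) the number of bins), not the package's internal function.

```r
# Sketch of Phase 3: Laplace-smoothed Weight of Evidence per bin.
woe_laplace <- function(pos, neg, alpha = 0.5) {
  k <- length(pos)  # number of bins
  log(((pos + alpha) / (sum(pos) + k * alpha)) /
      ((neg + alpha) / (sum(neg) + k * alpha)))
}

pos <- c(50, 30, 5)    # positive class counts per bin
neg <- c(100, 200, 300)
woe <- woe_laplace(pos, neg)
# smoothing keeps WoE finite even when a bin has zero positives or negatives
```

Note the role of the smoothing: with `alpha = 0`, a bin with zero positives would produce `-Inf`.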
Phase 4: Monotonicity Enforcement
Direction determined via majority vote (identical to MRBLP):
$$\text{increasing} = \begin{cases} \text{TRUE} & \text{if } \sum_i \mathbb{1}_{\{\text{WoE}_i > \text{WoE}_{i-1}\}} \ge \sum_i \mathbb{1}_{\{\text{WoE}_i < \text{WoE}_{i-1}\}} \\ \text{FALSE} & \text{otherwise} \end{cases}$$
Violations are merged iteratively.
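The majority vote reduces to comparing the counts of increasing and decreasing adjacent WoE steps; `woe_direction` is an illustrative sketch of this rule, with ties defaulting to increasing as in the formula above.

```r
# Sketch of the Phase 4 direction vote: TRUE means enforce an
# increasing WoE trend, FALSE a decreasing one.
woe_direction <- function(woe) {
  d <- diff(woe)
  sum(d > 0) >= sum(d < 0)
}

woe_direction(c(-1.2, -0.3, 0.1, 0.8))  # all steps increase
woe_direction(c(1.3, 0.6, 0.9, -0.4))   # decreasing steps win 2 to 1
```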
Phase 5: Bin Count Reduction
If \(k > \text{max\_bins}\), merge bins with the smallest combined IV:
$$\text{merge\_idx} = \arg\min_{i=1}^{k-1} \left( \text{IV}_i + \text{IV}_{i+1} \right)$$
Rationale: Assumes bins with low total IV contribute least to predictive power. However, this ignores the interaction between bins; a low-IV bin may be essential for monotonicity or preventing gaps.
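The reduction criterion is a one-liner over adjacent pairs; `merge_index` is a hypothetical helper mirroring the arg-min above (1-based, returning the left bin of the pair to merge).

```r
# Sketch of the Phase 5 merge choice: among adjacent bin pairs, pick the
# one with the smallest combined IV.
merge_index <- function(iv) {
  pair_iv <- iv[-length(iv)] + iv[-1]  # IV_i + IV_{i+1} for i = 1..k-1
  which.min(pair_iv)
}

iv <- c(0.12, 0.01, 0.02, 0.30)
merge_index(iv)  # bins 2 and 3 have the lowest combined IV
```

This illustrates the critique above: bins 2 and 3 are merged purely on their own IV, with no check of what the merge does to monotonicity or to neighbouring bins.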
Theoretical Foundations
Despite the name "Optimal Supervised Learning Partitioning", the algorithm lacks:
Global optimality guarantees: Greedy merging is myopic
Formal loss function: No explicit objective being minimized
LP formulation: No constraint matrix, simplex solver, or dual variables
A true optimal partitioning approach would formulate the problem as:
$$\min_{\mathbf{z}, \mathbf{b}} \left\{ -\sum_{i=1}^{k} \text{IV}_i(\mathbf{b}) + \lambda k \right\}$$
subject to: $$\sum_{j=1}^{k} z_{ij} = 1 \quad \forall i \in \{1, \ldots, N\}$$ $$\text{WoE}_j \le \text{WoE}_{j+1} \quad \forall j$$ $$z_{ij} \in \{0, 1\}, \quad b_j \in \mathbb{R}$$
where \(z_{ij}\) indicates observation \(i\) assigned to bin \(j\), and \(\lambda\) is a complexity penalty. This requires MILP solvers (CPLEX, Gurobi) and is intractable for \(N > 10^4\).
Comparison with Related Methods
| Method | Pre-binning | Direction | Merge (max_bins) | Target Type |
|--------|-------------|-----------|------------------|-------------|
| OSLP | Quantile (unique vals) | Majority vote | Min (IV(i) + IV(i+1)) | double |
| MRBLP | Equal-frequency | Majority vote | Min \|IV(i) - IV(i+1)\| | int |
| MOB | Equal-frequency | First two bins | Min IV loss | int |
| MBLP | Quantile (data) | Correlation | Min IV loss | int |
When to Use OSLP
Use OSLP: Never. Use MBLP or MOB instead for better pre-binning and merge strategies.
Use MBLP: For robust direction detection via correlation.
Use MDLP: For information-theoretic stopping criteria.
Use True LP: For small datasets (N < 1000) where global optimality is critical and computational cost is acceptable.
Mironchyk, P., & Tchistiakov, V. (2017). "Monotone optimal binning algorithm for credit risk modeling". Frontiers in Applied Mathematics and Statistics, 3, 2.
Zeng, G. (2014). "A Necessary Condition for a Good Binning Algorithm in Credit Scoring". Applied Mathematical Sciences, 8(65), 3229-3242.
Fayyad, U. M., & Irani, K. B. (1993). "Multi-Interval Discretization of Continuous-Valued Attributes". IJCAI, pp. 1022-1027.
Good, I. J. (1952). "Rational Decisions". Journal of the Royal Statistical Society B, 14(1), 107-114.
Siddiqi, N. (2006). Credit Risk Scorecards. Wiley.
See Also

ob_numerical_mrblp for a nearly identical algorithm with better pre-binning,
ob_numerical_mblp for correlation-based direction detection,
ob_numerical_mdlp for information-theoretic optimality.
Examples

# \donttest{
set.seed(123)
n <- 5000
feature <- c(
  rnorm(2000, 600, 50),
  rnorm(2000, 680, 40),
  rnorm(1000, 740, 30)
)
target <- c(
  rbinom(2000, 1, 0.25),
  rbinom(2000, 1, 0.10),
  rbinom(1000, 1, 0.03)
)

result <- ob_numerical_oslp(
  feature = feature,
  target = target,
  min_bins = 3,
  max_bins = 5
)
print(result$woe)
print(result$total_iv)

# Compare with MRBLP (should be nearly identical)
result_mrblp <- ob_numerical_mrblp(
  feature = feature,
  target = target,
  min_bins = 3,
  max_bins = 5
)
data.frame(
  Method = c("OSLP", "MRBLP"),
  Total_IV = c(result$total_iv, result_mrblp$total_iv),
  N_Bins = c(length(result$woe), length(result_mrblp$woe))
)
# }