obwoe_apply: Apply Weight of Evidence Transformations to New Data

Description

Applies the binning and Weight of Evidence (WoE) transformations learned by obwoe to new data. This is the scoring function for deploying WoE-based models in production. For each feature, the function assigns observations to bins and maps them to their corresponding WoE values.

Usage

obwoe_apply(
  data,
  obj,
  suffix_bin = "_bin",
  suffix_woe = "_woe",
  keep_original = TRUE,
  na_woe = 0
)

Value

A data.frame containing:

target: The target column (if present in data)
<feature>: Original feature values (if keep_original = TRUE)
<feature>_bin: Assigned bin label for each observation
<feature>_woe: Weight of Evidence value for the assigned bin

Arguments

data: A data.frame containing the features to transform. Must include all features present in the obj results. The target column is optional; if present, it will be included in the output.
obj: An object of class "obwoe" returned by obwoe.
suffix_bin: Character string suffix for bin columns. Default is "_bin".
suffix_woe: Character string suffix for WoE columns. Default is "_woe".
keep_original: Logical. If TRUE (default), include the original feature columns in the output. If FALSE, only bin and WoE columns are returned.
na_woe: Numeric value to assign when an observation cannot be mapped to a bin (e.g., new categories not seen during training). Default is 0.

Details

Bin Assignment Logic

Numerical Features: Observations are assigned to bins based on cutpoints stored in the obwoe object. The cut() function is used with intervals \((a_i, a_{i+1}]\) where \(a_0 = -\infty\) and \(a_k = +\infty\).

Categorical Features: Categories are matched directly to bin labels. Categories not seen during training are assigned NA for bin and na_woe for WoE.

Production Deployment

For production scoring, it is recommended to:

Train the binning model using obwoe() on the training set
Save the fitted object with saveRDS()
Load and apply using obwoe_apply() on new data

The WoE-transformed features can be used directly as inputs to logistic regression or other linear models, enabling interpretable credit scorecards.

References

Siddiqi, N. (2006). Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring. John Wiley & Sons. tools:::Rd_expr_doi("10.1002/9781119201731")

Examples

Run this code

# \donttest{
# =============================================================================
# Example 1: Basic Usage - Train and Apply
# =============================================================================
set.seed(42)
n <- 1000

# Training data
train_df <- data.frame(
  age = rnorm(n, 40, 15),
  income = exp(rnorm(n, 10, 0.8)),
  education = sample(c("HS", "BA", "MA", "PhD"), n, replace = TRUE),
  target = rbinom(n, 1, 0.15)
)

# Fit binning model
model <- obwoe(train_df, target = "target")

# New data for scoring (could be validation/test set)
new_df <- data.frame(
  age = c(25, 45, 65),
  income = c(20000, 50000, 80000),
  education = c("HS", "MA", "PhD")
)

# Apply transformations
scored <- obwoe_apply(new_df, model)
print(scored)

# Use WoE features for downstream modeling
woe_cols <- grep("_woe$", names(scored), value = TRUE)
print(woe_cols)

# =============================================================================
# Example 2: Without Original Features
# =============================================================================

scored_compact <- obwoe_apply(new_df, model, keep_original = FALSE)
print(scored_compact)
# }

Run the code above in your browser using DataLab