perf_psi: PSI

Description

perf_psi calculates the population stability index (PSI) from the provided credit scores and plots the credit score distribution.

Usage

perf_psi(score, label = NULL, title = "", x_limits = c(100, 800),
  x_tick_break = 50, show_plot = TRUE, seed = 186,
  return_distr_dat = FALSE)

Arguments

score

A list of credit scores for the actual and expected data samples. For example, score = list(train = score_A, test = score_E), where score_A and score_E are dataframes with the same column names.

label

A list of label values for the actual and expected data samples. For example, label = list(train = label_A, test = label_E), where label_A and label_E are vectors or dataframes. Label values should be 0 or 1, where 0 represents good and 1 represents bad.

title

Title of plot, default "".

x_limits

x-axis limits, default c(100, 800).

x_tick_break

x-axis tick break, default 50.

show_plot

Logical, default TRUE. Whether to show the plot.

seed

An integer. The specified seed is used for randomly sorting the data, default 186.

return_distr_dat

Logical, default FALSE.

Value

A list containing a dataframe of PSI values (psi) and plots of the credit score distribution (pic).

Details

The population stability index (PSI) is defined as $PSI = \sum \left( (Actual\% - Expected\%) \times \ln\left(\frac{Actual\%}{Expected\%}\right) \right)$, where the sum runs over the score bins and Actual% and Expected% are the shares of each sample falling in a bin. The rule of thumb for interpreting the PSI is as follows: less than 0.1 indicates an insignificant change, no action required; 0.1 to 0.25 indicates some minor change, check other scorecard monitoring metrics; greater than 0.25 indicates a major shift in population, need to delve deeper.
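
The formula can be checked by hand on a small example. The sketch below is not the internal implementation of perf_psi; it only applies the PSI formula to bin proportions in base R. The helper names psi_manual and psi_band and the bin proportions are hypothetical, chosen for illustration, and the helper assumes every bin has a non-zero share in both samples.

```r
# Hypothetical bin shares for five score bins (illustrative values only).
actual_pct   <- c(0.10, 0.20, 0.40, 0.20, 0.10)
expected_pct <- c(0.12, 0.18, 0.38, 0.22, 0.10)

# psi_manual is a hypothetical helper, not part of the scorecard package.
# It sums (Actual% - Expected%) * ln(Actual% / Expected%) over the bins.
psi_manual <- function(actual_pct, expected_pct) {
  stopifnot(length(actual_pct) == length(expected_pct))
  # Zero shares would make the logarithm undefined, so reject them here.
  stopifnot(all(actual_pct > 0), all(expected_pct > 0))
  sum((actual_pct - expected_pct) * log(actual_pct / expected_pct))
}

# psi_band maps a PSI value onto the rule-of-thumb bands.
# Assumption: boundary values 0.1 and 0.25 fall into the higher band.
psi_band <- function(psi) {
  cut(psi,
      breaks = c(-Inf, 0.1, 0.25, Inf),
      labels = c("insignificant change", "minor change", "major shift"),
      right = FALSE)
}

psi_value <- psi_manual(actual_pct, expected_pct)
print(psi_value)
print(as.character(psi_band(psi_value)))
```

Every term in the sum is non-negative, because the difference and the log ratio always share the same sign, so identical distributions give a PSI of exactly 0 and any shift increases it.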

Examples

# NOT RUN {
library(data.table)
library(scorecard)

# load germancredit data
data("germancredit")

# rename creditability as y
dt = data.table(germancredit)[, `:=`(
  y = ifelse(creditability == "bad", 1, 0),
  creditability = NULL
)]

# breaking dt into train and test ------
dt_list = split_df(dt, "y", ratio = 0.6, seed=21)
dt_train = dt_list$train; dt_test = dt_list$test

# woe binning ------
bins = woebin(dt_train, "y")

# converting train and test into woe values
train = woebin_ply(dt_train, bins)
test = woebin_ply(dt_test, bins)

# glm ------
m1 = glm( y ~ ., family = "binomial", data = train)
# summary(m1)

# Select a formula-based model by AIC
m_step = step(m1, direction="both", trace=FALSE)
m2 = eval(m_step$call)
# summary(m2)

# predicted probability
train_pred = predict(m2, type='response', train)
test_pred = predict(m2, type='response', test)

# # ks & roc plot
# perf_eva(train$y, train_pred, title = "train")
# perf_eva(test$y, test_pred, title = "test")

# scorecard
card = scorecard(bins, m2)

# credit score, only_total_score = TRUE
train_score = scorecard_ply(dt_train, card)
test_score = scorecard_ply(dt_test, card)

# Example I # psi
psi = perf_psi(
  score = list(train = train_score, test = test_score),
  label = list(train = train$y, test = test$y)
)
# psi$psi  # psi dataframe
# psi$pic  # pic of score distribution

# Example II # specifying score range
psi_s = perf_psi(
  score = list(train = train_score, test = test_score),
  label = list(train = train$y, test = test$y),
  x_limits = c(200, 750),
  x_tick_break = 50
  )

# Example III # credit score, only_total_score = FALSE
train_score2 = scorecard_ply(dt_train, card, only_total_score=FALSE)
test_score2 = scorecard_ply(dt_test, card, only_total_score=FALSE)

# psi
psi2 = perf_psi(
  score = list(train = train_score2, test = test_score2),
  label = list(train = train$y, test = test$y)
)
# psi2$psi  # psi dataframe
# psi2$pic  # pic of score distribution
# }
