
perf_psi
Calculates the population stability index (PSI) from the provided credit scores and plots the credit score distributions.
perf_psi(score, label = NULL, title = "", x_limits = c(100, 800),
x_tick_break = 50, show_plot = TRUE, seed = 186,
return_distr_dat = FALSE)
score: A list of credit scores for the actual and expected data samples. For example, score = list(train = score_A, test = score_E), where score_A and score_E are dataframes with the same column names.
label: A list of label values for the actual and expected data samples. For example, label = list(train = label_A, test = label_E), where label_A and label_E are vectors or dataframes. Label values should be 0 or 1, where 0 represents good and 1 represents bad.
title: Title of the plot, default "".
x_limits: x-axis limits, default c(100, 800).
x_tick_break: x-axis tick break, default 50.
show_plot: Logical, default TRUE. Whether to show the plot.
seed: An integer, default 186. The seed used for randomly sorting the data.
return_distr_dat: Logical, default FALSE.
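The shape that score and label are expected to take can be checked before calling perf_psi. The sketch below is hypothetical (check_psi_inputs is not a scorecard function) and only encodes the constraints stated above: score is a list of two dataframes sharing the same column names, and each label element contains only 0s and 1s.

```r
# Hypothetical validator, NOT part of scorecard: checks the documented
# input contract of perf_psi() before it is called.
check_psi_inputs <- function(score, label = NULL) {
  # score holds the actual and expected samples
  stopifnot(is.list(score), length(score) == 2)
  # Each element of score must be a dataframe (a data.table also qualifies)
  stopifnot(all(vapply(score, is.data.frame, logical(1))))
  # Both dataframes must share the same column names
  stopifnot(identical(names(score[[1]]), names(score[[2]])))
  if (!is.null(label)) {
    stopifnot(is.list(label), length(label) == length(score))
    # Labels may be vectors or dataframes; unlist() flattens both cases
    vals <- unlist(label, use.names = FALSE)
    # 0 = good, 1 = bad; anything else is rejected
    stopifnot(all(vals %in% c(0, 1)))
  }
  invisible(TRUE)
}

# Illustrative inputs in the documented shape (toy values only)
score <- list(train = data.frame(score = c(520, 610)),
              test  = data.frame(score = c(480, 700)))
label <- list(train = c(0, 1), test = c(1, 0))
check_psi_inputs(score, label)
```

The check stops with an error on the first violated constraint, so a malformed argument is caught before any binning or plotting starts.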
A list containing a dataframe of PSI values (accessed as $psi) and plots of the credit score distributions (accessed as $pic).
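The population stability index (PSI) is computed as:
$$\mathrm{PSI} = \sum_{i} \left( A_i - E_i \right) \ln\left( \frac{A_i}{E_i} \right)$$
where $A_i$ and $E_i$ are the percentages (Actual% and Expected%) of the actual and expected samples that fall into score bin $i$.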
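A minimal base-R sketch of the PSI formula, the sum over score bins of (Actual% - Expected%) * ln(Actual% / Expected%), is shown here for illustration only; it is not the package's implementation. The function name psi_sketch, the user-supplied equal-width breaks, and the small constant eps that replaces empty-bin proportions (where the logarithm would be undefined) are all assumptions.

```r
# Hypothetical helper, NOT part of scorecard: PSI between two score vectors.
# Assumption: both samples are binned with the same, user-supplied breaks.
psi_sketch <- function(actual, expected, breaks, eps = 1e-6) {
  # Bin both samples on identical cut points
  a_bin <- cut(actual, breaks = breaks, include.lowest = TRUE)
  e_bin <- cut(expected, breaks = breaks, include.lowest = TRUE)
  # Share of each sample per bin (Actual% and Expected%); table() keeps empty bins
  a_pct <- as.numeric(table(a_bin)) / length(actual)
  e_pct <- as.numeric(table(e_bin)) / length(expected)
  # Floor empty bins at eps so log() stays finite (an assumed convention)
  a_pct <- pmax(a_pct, eps)
  e_pct <- pmax(e_pct, eps)
  # Sum over bins of (A_i - E_i) * ln(A_i / E_i)
  sum((a_pct - e_pct) * log(a_pct / e_pct))
}

# Illustrative use on simulated scores; -Inf/Inf catch scores outside 100-800
set.seed(186)
score_train <- rnorm(1000, mean = 500, sd = 80)
score_test  <- rnorm(1000, mean = 520, sd = 80)
psi_sketch(score_train, score_test, breaks = c(-Inf, seq(100, 800, by = 50), Inf))
```

Because each term (A_i - E_i) ln(A_i / E_i) is unchanged when A and E are swapped, the result does not depend on which sample is treated as actual and which as expected, and identical distributions give a PSI of 0.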
# NOT RUN {
library(data.table)
library(scorecard)
# load germancredit data
data("germancredit")
# convert creditability into a binary label y (1 = bad, 0 = good)
dt = data.table(germancredit)[, `:=`(
y = ifelse(creditability == "bad", 1, 0),
creditability = NULL
)]
# split dt into train and test ------
dt_list = split_df(dt, "y", ratio = 0.6, seed=21)
dt_train = dt_list$train; dt_test = dt_list$test
# woe binning ------
bins = woebin(dt_train, "y")
# converting train and test into woe values
train = woebin_ply(dt_train, bins)
test = woebin_ply(dt_test, bins)
# glm ------
m1 = glm( y ~ ., family = "binomial", data = train)
# summary(m1)
# Select a formula-based model by AIC
m_step = step(m1, direction="both", trace=FALSE)
m2 = eval(m_step$call)
# summary(m2)
# predicted probability
train_pred = predict(m2, newdata = train, type = 'response')
test_pred = predict(m2, newdata = test, type = 'response')
# ks & roc plot
# perf_eva(train$y, train_pred, title = "train")
# perf_eva(test$y, test_pred, title = "test")
# scorecard
card = scorecard(bins, m2)
# credit score, only_total_score = TRUE
train_score = scorecard_ply(dt_train, card)
test_score = scorecard_ply(dt_test, card)
# Example I # psi
psi = perf_psi(
score = list(train = train_score, test = test_score),
label = list(train = train$y, test = test$y)
)
# psi$psi # psi dataframe
# psi$pic # pic of score distribution
# Example II # specifying score range
psi_s = perf_psi(
score = list(train = train_score, test = test_score),
label = list(train = train$y, test = test$y),
x_limits = c(200, 750),
x_tick_break = 50
)
# Example III # credit score, only_total_score = FALSE
train_score2 = scorecard_ply(dt_train, card, only_total_score=FALSE)
test_score2 = scorecard_ply(dt_test, card, only_total_score=FALSE)
# psi
psi2 = perf_psi(
score = list(train = train_score2, test = test_score2),
label = list(train = train$y, test = test$y)
)
# psi2$psi # psi dataframe
# psi2$pic # pic of score distribution
# }