Note: a newer version (1.3.1) of this package is available.

creditmodel

creditmodel is a free and open source automated modeling R package. It is designed to help model developers work more efficiently and to enable people with no background in data science to complete modeling work in a short time, so that they can focus more on the problem itself and allocate more time to decision-making.

creditmodel covers tools for data preprocessing, variable processing/derivation, variable screening/dimensionality reduction, modeling, data analysis, data visualization, model evaluation, strategy analysis, and more. It is a customized "core" toolkit for model developers.

creditmodel is suited to automated machine learning modeling of classification targets, and particularly to risk and marketing data from financial credit, e-commerce, and insurance, which tend to have relatively high noise and low information content.

Installation

# install.packages("creditmodel")

Example

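 # Automated model development process

 # Work in a dedicated directory; training_model below uses getwd() as model_path.
 if (!dir.exists("c:/test_model")) dir.create("c:/test_model")
 setwd("c:/test_model")
 library(creditmodel)
 # Use one of three stratified folds of the bundled UCICreditCard data to keep the example small.
 sub = cv_split(UCICreditCard, k = 3)[[1]]
 dat = UCICreditCard[sub,]
 # Rename the response column to "target".
 dat = re_name(dat, "default.payment.next.month", "target")
 # Clean the data, treating "", -1 and -2 as missing values.
 dat = data_cleansing(dat, target = "target", obs_id = "ID", occur_time = "apply_date", miss_values = list("", -1, -2))
 # Out-of-time (OOT) split on apply_date with prop = 0.7.
 train_test = train_test_split(dat, split_type = "OOT", prop = 0.7, occur_time = "apply_date")
 dat_train = train_test$train
 dat_test = train_test$test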
 
 B_model = training_model(dat = dat_train,
                         model_name = "UCICreditCard", target = "target", x_list = NULL,
                         occur_time = "apply_date", obs_id = "ID", dat_test = dat_test,
                         preproc = FALSE,
                         feature_filter = NULL,
                         algorithm = list("RF","LR","XGB","GBM"),
                         LR.params = lr_params(lasso = TRUE,
                                               step_wise = FALSE, vars_plot = FALSE),
                         XGB.params = xgb_params(),
                         breaks_list = NULL,
                         parallel = FALSE, cores_num = NULL,
                         save_pmml = FALSE, plot_show = FALSE,
                         model_path = getwd(),
                         seed = 46)
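
The example above splits the data with split_type = "OOT". As a rough illustration of what an out-of-time split means, here is a minimal base R sketch. It assumes that "OOT" stands for out-of-time and that the oldest share of observations (given by prop) goes to the training set. The helper oot_split and the toy data are hypothetical and are not part of creditmodel; the package's own train_test_split may differ in detail.

```r
# Hypothetical helper, not part of creditmodel: a plain out-of-time split.
oot_split <- function(dat, occur_time, prop = 0.7) {
  stopifnot(occur_time %in% names(dat), prop > 0, prop < 1)
  # Order observations from oldest to newest.
  ord <- order(dat[[occur_time]])
  n_train <- floor(prop * nrow(dat))
  # The first n_train positions (oldest rows) form the training set.
  is_train <- seq_len(nrow(dat)) <= n_train
  list(
    train = dat[ord[is_train], , drop = FALSE],
    test  = dat[ord[!is_train], , drop = FALSE]
  )
}

# Toy data (made up for illustration): 10 applications on distinct days.
toy <- data.frame(
  ID = 1:10,
  apply_date = as.Date("2020-01-01") + c(9, 0, 4, 2, 7, 1, 8, 3, 6, 5),
  target = c(0, 1, 0, 0, 1, 0, 0, 1, 0, 0)
)
parts <- oot_split(toy, occur_time = "apply_date", prop = 0.7)

# Every training date precedes every test date.
max(parts$train$apply_date) < min(parts$test$apply_date)
```

Splitting by time rather than at random means the test set only contains applications made after those in the training set, which is closer to how a credit model is used after deployment.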

Monthly Downloads: 543

Version: 1.3.0

License: AGPL-3

Maintainer: Dongping Fan

Last Published: January 25th, 2021

Functions in creditmodel (1.3.0)

add_variable_process

add_variable_process
PCA_reduce

PCA Dimension Reduction
char_to_num

character to number
auc_value

auc_value is used to get the best lambda required in lasso_filter. This function is required by lasso_filter.
analysis_nas

Missing Analysis
char_cor_vars

Cramer's V matrix between categorical variables.
address_varieble

address_varieble
as_percent

Percent Format
UCICreditCard

UCI Credit Card data
cos_sim

cos_sim
analysis_outliers

Outliers Analysis
de_one_hot_encoding

Recovery One-Hot Encoding
creditmodel-package

creditmodel: toolkit for credit modeling and data analysis
date_cut

Date Time Cut Point
cross_table

cross_table
cor_plot

Correlation Plot
digits_num

Number of digits
customer_segmentation

Customer Segmentation
cor_heat_plot

Correlation Heat Plot
derived_ts_vars

Derivation of Behavioral Variables
get_correlation_group

get_correlation_group
cohort_analysis

cohort_analysis is for cohort (vintage) analysis.
cohort_table_plot

cohort_table_plot is for plotting the cohort (vintage) analysis table.
get_ctree_rules

Parse decision tree rules
check_rules

check rules
city_varieble

city_varieble
city_varieble_process

Processing of Address Variables
%alike%

Fuzzy String matching
derived_pct

derived_pct
derived_partial_acf

derived_partial_acf
get_tree_breaks

Getting the breaks for terminal nodes from decision tree
get_names

Get Variable Names
get_median

get central value.
checking_data

Checking Data
get_iv_all

Calculate Information Value (IV). get_iv is used to calculate the Information Value (IV) of an independent variable. get_iv_all can loop through all specified independent variables and calculate the IV of each.
gather_data

gather or aggregate data
cut_equal

Generating Initial Equal Size Sample Bins
get_nas_random

get_nas_random
gbm_filter

Select Features using GBM
get_x_list

Get X List.
low_variance_filter

Filtering Low Variance Variables
lr_params

Logistic Regression & Scorecard Parameters
get_plots

Plot Independent Variables Distribution
p_to_score

prob to score
%islike%

Fuzzy String matching
null_blank_na

Encode NAs
get_logistic_coef

get logistic coef
one_hot_encoding

One-Hot Encoding
log_trans

Logarithmic transformation
ks_table

ks_table & plot
min_max_norm

Min Max Normalization
local_outlier_factor

local_outlier_factor is a function for calculating the LOF factor for a data set using KNN. This function is not intended to be used by the end user.
merge_category

Merge Category
entropy_weight

Entropy Weight Method
feature_selector

Feature Selection Wrapper
plot_relative_freq_histogram

Plot Relative Frequency Histogram
cv_split

Stratified Folds
entry_rate_na

Max Percent of missing Value
pred_score

pred_score
partial_dependence_plot

partial_dependence_plot
data_cleansing

Data Cleaning
plot_bar

Plot Bar
plot_oot_perf

plot_oot_perf is for plotting the performance of cross-time samples in the future.
data_exploration

Data Exploration
derived_interval

derived_interval
euclid_dist

euclid_dist
de_percent

Recovery Percent Format
get_shadow_nas

get_shadow_nas
fuzzy_cluster_means

Fuzzy Cluster means.
rowAny

Functions for vector operation.
knn_nas_imp

Impute NAs using KNN
str_match

String match. str_match searches for matches to the argument pattern within each element of a character vector.
pred_xgb

pred_xgb
get_psi_all

Calculate Population Stability Index (PSI). get_psi is used to calculate the Population Stability Index (PSI) of an independent variable. get_psi_all can loop through all specified independent variables and calculate the PSI of each.
get_bins_table_all

Table of Binning
rf_params

Random Forest Parameters
tnr_value

tnr_value
split_bins_all

Split bins all
sql_hive_text_parse

Automatic production of hive SQL
re_name

Rename
eval_auc

Functions of xgboost feval
lasso_filter

Variable selection by LASSO
ks_value

ks_value
psi_iv_filter

Variable reduction based on Information Value & Population Stability Index filter
sum_table

Summary table
get_psi_iv_all

Calculate IV & PSI
plot_box

Plot Box
select_best_class

Generates Best Binning Breaks
multi_left_join

multi_left_join
n_char

The length of a string.
rule_value_replace

rule_value_replace
fast_high_cor_filter

high_cor_filter
ewm_data

Entropy Weight Method Data
require_packages

Packages required and installation
replace_value

Replace Value
process_nas

Missing Treatment
get_breaks_all

Generates Best Breaks for Binning
xgb_data

XGBoost data
score_transfer

Score Transformation
xgb_params

XGBoost Parameters
train_lr

Training LR model
get_sim_sign_lambda

get_sim_sign_lambda is used to get the best lambda required in lasso_filter. This function is required by lasso_filter.
lift_value

lift_value
lendingclub

Lending Club data
p_ij

Entropy
outliers_detection

Outliers Detection. outliers_detection detects outliers using Kmeans and Local Outlier Factor (LOF).
loop_function

Loop Function. loop_function is an iterator to loop through
lr_vif

Variance-Inflation Factors
gbm_params

GBM Parameters
start_parallel_computing

Parallel computing and export variables to global Env.
read_data

Read data
stop_parallel_computing

Stop parallel computing
xgb_filter

Select Features using XGB
high_cor_selector

Compare the two highly correlated variables
model_result_plot

Model result plots. model_result_plot is a wrapper of the following: perf_table generates a model performance table; ks_plot is for K-S; roc_plot is for ROC; lift_plot is for the Lift Chart; score_distribution_plot is for plotting the score distribution.
get_psi_plots

Plot PSI(Population Stability Index)
get_auc_ks_lambda

get_auc_ks_lambda is used to get the best lambda required in lasso_filter. This function is required by lasso_filter.
get_score_card

Score Card
is_date

is_date
max_min_norm

Max Min Normalization
swap_analysis

Swap Out/Swap In Analysis
multi_grid

Arrange list of plots into a grid
process_outliers

Outliers Treatment
sim_str

sim_str
plot_density

Plot Density
plot_colors

Plot Colors
quick_as_df

List as data.frame quickly
variable_process

variable_process
rules_filter

rules_filter
woe_trans_all

WOE Transformation
plot_distribution

Plot Distribution
love_color

love_color
plot_line

Plot Line
reduce_high_cor_filter

Filtering highly correlated variables with reduce method
split_bins

split_bins
term_tfidf

TF-IDF
remove_duplicated

Remove Duplicated Observations
time_series_proc

Process time series data
train_test_split

Train-Test-Split
time_transfer

Time Format Transferring
plot_table

plot_table
plot_theme

plot_theme
ranking_percent_proc

Ranking Percent Process
re_code

re_code searches for matches to the argument pattern within each element of a character vector.
save_data

Save data
rules_result

rules_result
train_xgb

Training XGBoost
time_variable

time_variable
training_model

Training model
time_vars_process

Processing of Time or Date Variables
var_group_proc

Process group numeric variables
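
Several entries in the list above (get_iv_all, get_psi_all, psi_iv_filter) refer to Information Value (IV) and the Population Stability Index (PSI). The following base R sketch shows the textbook definitions of both measures on binned data. The helpers iv_sketch and psi_sketch, the eps smoothing constant, the sign convention for WOE, and the simulated data are all assumptions made for illustration; they are not creditmodel functions and may not match the package's internal calculations.

```r
# Hypothetical helpers written in base R; they are not creditmodel functions.

# Information Value of a binned variable against a binary target coded 0/1.
iv_sketch <- function(bins, target, eps = 1e-6) {
  tab  <- table(bins, target)
  # Share of all 0s and of all 1s that fall into each bin.
  good <- tab[, "0"] / sum(tab[, "0"])
  bad  <- tab[, "1"] / sum(tab[, "1"])
  # eps guards against log(0) when a bin has no 0s or no 1s.
  woe  <- log((good + eps) / (bad + eps))
  sum((good - bad) * woe)
}

# Population Stability Index between an expected and an actual sample.
psi_sketch <- function(expected_bins, actual_bins, eps = 1e-6) {
  lv <- union(as.character(expected_bins), as.character(actual_bins))
  # Share of each sample that falls into each bin, over a common set of bins.
  e  <- as.numeric(table(factor(expected_bins, levels = lv))) / length(expected_bins)
  a  <- as.numeric(table(factor(actual_bins, levels = lv))) / length(actual_bins)
  sum((a - e) * log((a + eps) / (e + eps)))
}

# Simulated data: one numeric predictor and a binary target that depends on it.
set.seed(46)
x      <- rnorm(1000)
target <- rbinom(1000, 1, plogis(-1 + x))
# Five equal-frequency bins.
bins   <- cut(x, breaks = quantile(x, probs = seq(0, 1, 0.2)), include.lowest = TRUE)

iv_sketch(bins, target)                   # IV of the binned predictor
psi_sketch(bins[1:500], bins[501:1000])   # PSI between two halves of the sample
psi_sketch(bins, bins)                    # identical samples give 0
```

Both measures are sums of non-negative terms, so larger values indicate a stronger relationship with the target (IV) or a larger shift between the two samples (PSI).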