training_model: Training model

Description

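training_model is a model builder: it optionally preprocesses the data, filters features, and trains one or more of the selected algorithms.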

Usage

training_model(model_name = "mymodel", dat, dat_test = NULL,
  target = NULL, occur_time = NULL, obs_id = NULL, x_list = NULL,
  ex_cols = NULL, pos_flag = NULL, prop = 0.7, preproc = TRUE,
  low_var = TRUE, merge_cat = TRUE, one_hot = FALSE,
  trans_log = FALSE, outlier_proc = TRUE, missing_proc = TRUE,
  miss_values = NULL, feature_filter = list(filter = c("IV", "PSI",
  "COR", "XGB"), iv_cp = 0.02, psi_cp = 0.1, xgb_cp = 0, cv_folds = 1,
  hopper = FALSE), algorithm = list("LR", "XGB", "GBM", "RF"),
  LR.params = lr_params(), XGB.params = xgb_params(),
  GBM.params = gbm_params(), RF.params = rf_params(),
  breaks_list = NULL, parallel = FALSE, cores_num = NULL,
  save_pmml = FALSE, plot_show = FALSE, vars_plot = TRUE,
  model_path = tempdir(), seed = 46, ...)

Arguments

model_name

A string, the name of the project. Default is "mymodel".

dat

A data.frame with independent variables and target variable.

dat_test

A data.frame of test data. Default is NULL.

target

The name of target variable.

occur_time

The name of the variable that represents the time at which each observation takes place. Default is NULL.

obs_id

The name of the observation ID or key variable of the data. Default is NULL.

x_list

Names of independent variables. Default is NULL.

ex_cols

Names of excluded variables. Regular expressions can also be used to match variable names. Default is NULL.

pos_flag

The value of the positive class of the target variable. Default: "1".

prop

Proportion of the data assigned to the training set after partitioning. Default: 0.7.
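
As an illustration of what a proportion of 0.7 means, the following base R sketch performs a seeded random 70/30 split. It is only a sketch: how training_model partitions the data internally is not described on this page, so the simple random split and the toy data frame here are assumptions.

```r
# Illustrative only: a seeded random split in base R.
# The toy data frame and the random-split strategy are assumptions,
# not the package's internal implementation.
set.seed(46)                                   # same value as the seed default in Usage
dat <- data.frame(id = 1:100, x = rnorm(100))  # hypothetical data

prop <- 0.7
train_idx <- sample(nrow(dat), size = round(prop * nrow(dat)))

dat_train <- dat[train_idx, ]   # 70% of the rows
dat_test  <- dat[-train_idx, ]  # the remaining 30%
```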

preproc

Logical. If TRUE, preprocess the data. Default is TRUE.

low_var

Logical. If TRUE, delete low-variance variables. Default is TRUE.

merge_cat

Logical. If TRUE, merge categories of character variables that have more than m categories. Default is TRUE.

one_hot

Logical. If TRUE, apply one-hot encoding to categorical variables. Default is FALSE.
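
To show what one-hot encoding produces, here is a minimal base R sketch using model.matrix. The column name and values are hypothetical, and the package may encode and name columns differently.

```r
# Illustrative only: one-hot encoding of a categorical column in base R.
# "education" and its values are hypothetical.
dat <- data.frame(
  education = c("school", "university", "graduate", "school"),
  stringsAsFactors = TRUE
)

# "- 1" drops the intercept so that every category gets its own 0/1 column.
one_hot <- model.matrix(~ education - 1, data = dat)
colnames(one_hot)
```

Each row of the result has exactly one 1, in the column of its category.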

trans_log

Logical. If TRUE, apply a logarithmic transformation. Default is FALSE.

outlier_proc

Logical. If TRUE, process outliers using k-means and Local Outlier Factor. Default is TRUE.

missing_proc

Logical. If TRUE, analyze missing values and impute them by KNN imputation, central imputation, or random imputation. Default is TRUE.

miss_values

Other extreme values that may be used to represent missing values, e.g. -9999, -9998. These miss_values will be encoded as -1 or "Missing". Default is NULL.
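
The recoding described above can be sketched in base R as follows. This is an illustration under assumptions, not the package's code: the helper recode_missing, the toy data, and the rule "numeric columns get -1, other columns get "Missing"" are all hypothetical.

```r
# Illustrative only: recode sentinel values that stand for "missing".
# recode_missing() is a hypothetical helper, not part of the package.
dat <- data.frame(
  income = c(5000, -9999, 7200, -9998),
  city   = c("A", "-9999", "B", "C"),
  stringsAsFactors = FALSE
)
miss_values <- c(-9999, -9998)

recode_missing <- function(x, miss_values) {
  if (is.numeric(x)) {
    x[x %in% miss_values] <- -1                          # numeric sentinel
  } else {
    x[x %in% as.character(miss_values)] <- "Missing"     # character sentinel
  }
  x
}

# dat[] keeps the data.frame structure while replacing every column.
dat[] <- lapply(dat, recode_missing, miss_values = miss_values)
```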

feature_filter

Parameters for selecting important and stable features. See details at: feature_selector.
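
The default value shown in Usage is a plain list, so a customized version can be built in base R before being passed in. In this sketch the element names and default values are copied from Usage; the overrides (a subset of filters and iv_cp = 0.05) are hypothetical, and whether a subset of filters is accepted should be checked against feature_selector.

```r
# Default feature_filter, copied from the Usage section.
default_filter <- list(
  filter   = c("IV", "PSI", "COR", "XGB"),
  iv_cp    = 0.02,
  psi_cp   = 0.1,
  xgb_cp   = 0,
  cv_folds = 1,
  hopper   = FALSE
)

# Hypothetical overrides: keep only two filters and change one cutoff.
my_filter <- modifyList(
  default_filter,
  list(filter = c("IV", "PSI"), iv_cp = 0.05)
)

# my_filter could then be supplied as feature_filter = my_filter.
str(my_filter)
```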

algorithm

Algorithms for training a model. Any of list("LR", "XGB", "GBM", "RF") are available.

LR.params

Parameters of logistic regression and scorecard. See details at: lr_params.

XGB.params

Parameters of xgboost. See details at: xgb_params.

GBM.params

Parameters of GBM. See details at: gbm_params.

RF.params

Parameters of Random Forest. See details at: rf_params.

breaks_list

A table containing a list of splitting points for each independent variable. Default is NULL.

parallel

Logical, whether to run in parallel. Default is FALSE.

cores_num

The number of CPU cores to use. Default is NULL.

save_pmml

Logical. If TRUE, save the model in PMML format. Default is FALSE.

plot_show

Logical. If TRUE, show model performance in the current graphics device. Default is FALSE.

vars_plot

Logical. If TRUE, plot the distribution, correlation, or partial dependence of the model input variables. Default is TRUE.

model_path

The path where data files are saved periodically. Default is tempdir().

seed

Random number seed. Default is 46.

...

Other parameters.

Value

A list containing model objects.

Examples

# NOT RUN {
# Use the first of 30 folds as a small sample to keep the example fast
sub = cv_split(UCICreditCard, k = 30)[[1]]
dat = UCICreditCard[sub, ]
# Train on a single independent variable
x_list = c("LIMIT_BAL")
# Fit a plain logistic regression with preprocessing and filtering switched off
B_model = training_model(dat = dat,
                         model_name = "UCICreditCard",
                         target = "default.payment.next.month",
                         x_list = x_list,
                         occur_time = NULL,
                         obs_id = NULL,
                         dat_test = NULL,
                         preproc = FALSE,
                         outlier_proc = FALSE,
                         missing_proc = FALSE,
                         feature_filter = NULL,
                         algorithm = list("LR"),
                         LR.params = lr_params(lasso = FALSE,
                                               step_wise = FALSE,
                                               score_card = FALSE),
                         breaks_list = NULL,
                         parallel = FALSE,
                         cores_num = NULL,
                         save_pmml = FALSE,
                         plot_show = FALSE,
                         vars_plot = FALSE,
                         model_path = tempdir(),
                         seed = 46)

# }
