feature_select_wrapper
This function uses four different methods (IV, PSI, correlation, xgboost) to select important features. The correlation filter must be used together with IV.
feature_select_wrapper(dat_train, dat_test = NULL, x_list = NULL,
target = NULL, pos_flag = NULL, occur_time = NULL,
ex_cols = NULL, filter = c("IV", "PSI", "XGB", "COR"),
cv_folds = 1, iv_cp = 0.01, psi_cp = 0.1, xgb_cp = 0,
cor_cp = 0.98, breaks_list = NULL, hopper = FALSE,
vars_name = TRUE, parallel = FALSE, note = FALSE, seed = 46,
save_data = FALSE, file_name = NULL, dir_path = tempdir(), ...)
dat_train: A data.frame with independent variables and target variable.
dat_test: A data.frame of test data. Default is NULL.
x_list: Names of independent variables.
target: The name of the target variable.
pos_flag: The value of the positive class of the target variable, default: "1".
occur_time: The name of the variable that represents the time at which each observation takes place.
ex_cols: A list of excluded variables. Regular expressions can also be used to match variable names. Default is NULL.
filter: The methods for selecting important and stable variables.
cv_folds: Number of cross-validations. Default is 1.
iv_cp: The minimum threshold of IV. 0 < iv_i; 0.01 to 0.1 usually works. Default is 0.01.
psi_cp: The maximum threshold of PSI. 0 <= psi_i <= 1; 0.05 to 0.2 usually works. Default is 0.1.
xgb_cp: Threshold of the XGB feature's Gain. 0 <= xgb_cp <= 1. Default is 0.
cor_cp: Threshold of correlation between features. 0 <= cor_cp <= 1; 0.7 to 0.98 usually works. Default is 0.98.
breaks_list: A table containing a list of splitting points for each independent variable. Default is NULL.
hopper: Logical. Filtering screening. Default is FALSE.
vars_name: Logical, output a list of filtered variables or a table with detailed IV and PSI values of each variable. Default is TRUE.
parallel: Logical, parallel computing. Default is FALSE.
note: Logical. Outputs info. Default is FALSE.
seed: Random number seed. Default is 46.
save_data: Logical, save results in a locally specified folder. Default is FALSE.
file_name: The name for periodically saved results files. Default is "select_vars".
dir_path: The path for periodically saved results files. Default is tempdir().
...: Other parameters.
A list of selected features.
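The iv_cp and psi_cp thresholds are easier to read next to the quantities they bound. The following is a minimal base R sketch using the standard textbook definitions of IV and PSI on a binned variable; it is not taken from the package source, and the internal binning and computation may differ. The helpers `iv_binned`, `psi_binned` and `bin4`, and the toy data, are hypothetical names introduced only for illustration.

```r
# Information value of one binned feature against a 0/1 target.
# Hypothetical helper; textbook definition, not the package's internal code.
iv_binned <- function(bins, y, eps = 1e-6) {
  tab  <- table(bins, y)                          # rows: bins, columns: "0" and "1"
  good <- pmax(tab[, "0"] / sum(tab[, "0"]), eps) # share of negatives per bin
  bad  <- pmax(tab[, "1"] / sum(tab[, "1"]), eps) # share of positives per bin
  sum((bad - good) * log(bad / good))
}

# Population stability index between two samples of bin labels.
# Hypothetical helper; textbook definition.
psi_binned <- function(expected_bins, actual_bins, eps = 1e-6) {
  expected_bins <- as.character(expected_bins)
  actual_bins   <- as.character(actual_bins)
  lv <- union(expected_bins, actual_bins)
  e  <- pmax(as.numeric(table(factor(expected_bins, levels = lv))) /
               length(expected_bins), eps)
  a  <- pmax(as.numeric(table(factor(actual_bins, levels = lv))) /
               length(actual_bins), eps)
  sum((a - e) * log(a / e))
}

# Toy data: one informative feature, one pure-noise feature, two time periods.
set.seed(46)
n        <- 1000
y        <- rbinom(n, 1, 0.2)
x_signal <- rnorm(n, mean = 2 * y)
x_noise  <- rnorm(n)
period   <- rep(c("early", "late"), each = n / 2)

# Quartile binning (an assumption; any binning scheme could be used here).
bin4 <- function(x) cut(x, quantile(x, 0:4 / 4), include.lowest = TRUE)

b_signal <- bin4(x_signal)
b_noise  <- bin4(x_noise)

iv_signal  <- iv_binned(b_signal, y)
iv_noise   <- iv_binned(b_noise, y)
psi_signal <- psi_binned(b_signal[period == "early"], b_signal[period == "late"])

# A feature passes when its IV is at least iv_cp and its PSI is at most psi_cp.
iv_cp  <- 0.01
psi_cp <- 0.1
keep_signal <- iv_signal >= iv_cp && psi_signal <= psi_cp
```

Under these definitions a higher IV means stronger separation between the two classes, and a lower PSI means the variable's distribution is more stable between the two periods, which is why iv_cp acts as a minimum and psi_cp as a maximum.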
# NOT RUN {
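library(creditmodel)  # provides feature_select_wrapper() and the UCICreditCard data
# Keep variables that pass the IV and PSI filters; return the detailed table
feature_select_wrapper(dat_train = UCICreditCard[1:1000, c(8:12, 26)],
                       dat_test = NULL, target = "default.payment.next.month",
                       occur_time = "apply_date", filter = c("IV", "PSI"),
                       cv_folds = 1, iv_cp = 0.01, psi_cp = 0.1, xgb_cp = 0, cor_cp = 0.98,
                       vars_name = FALSE, note = FALSE)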
# }
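The description says the correlation filter must be used with IV. One plausible reading, stated here as an assumption and not as the package's confirmed behavior, is that IV decides which member of a highly correlated pair survives. The sketch below illustrates that idea in base R; `cor_filter_by_iv`, the toy data, and the IV values are all made up for illustration.

```r
# Greedy correlation filter: visit features from highest to lowest IV and keep
# a feature only if its absolute correlation with every feature already kept
# is at most cor_cp. Hypothetical helper; the package may work differently.
cor_filter_by_iv <- function(dat, iv, cor_cp = 0.98) {
  vars <- names(sort(iv, decreasing = TRUE))   # strongest features first
  cm   <- abs(cor(dat[, vars, drop = FALSE]))  # absolute Pearson correlations
  keep <- character(0)
  for (v in vars) {
    if (length(keep) == 0 || all(cm[v, keep] <= cor_cp)) {
      keep <- c(keep, v)
    }
  }
  keep
}

# Toy data: a_copy is almost identical to a, b is independent of both.
set.seed(46)
a   <- rnorm(200)
dat <- data.frame(a = a, a_copy = a + rnorm(200, sd = 0.01), b = rnorm(200))

# Made-up IV values, used only to rank the features.
iv <- c(a = 0.30, a_copy = 0.25, b = 0.05)

selected <- cor_filter_by_iv(dat, iv, cor_cp = 0.98)
```

Without an IV ranking there would be no principled way to choose which of `a` and `a_copy` to drop, which is consistent with the requirement that "COR" be combined with "IV" in the filter argument.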