This function filters variables based on specified conditions, such as information value, missing rate, and identical value rate.
var_filter(dt, y, x = NULL, iv_limit = 0.02, missing_limit = 0.95, identical_limit = 0.95, var_rm = NULL, var_kp = NULL, return_rm_reason = FALSE, positive = "bad|1")
dt: A data frame with both x (predictor/feature) and y (response/label) variables.
y: Name of the y variable.
x: Names of the x variables. Default is NULL. If x is NULL, all variables except y are treated as x variables.
iv_limit: Kept variables must have an information value >= iv_limit. Default is 0.02.
missing_limit: Kept variables must have a missing rate <= missing_limit. Default is 0.95.
identical_limit: Kept variables must have an identical value rate (excluding NAs) <= identical_limit. Default is 0.95.
var_rm: Names of variables to remove regardless of the limits. Default is NULL.
var_kp: Names of variables to keep regardless of the limits. Default is NULL.
return_rm_reason: Logical. If TRUE, the reason each x variable was removed is also returned. Default is FALSE.
positive: Value of the positive class. Default is "bad|1".
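To make the three limits concrete, here is a minimal base R sketch of how such metrics could be computed on a toy data frame. The helpers `missing_rate`, `identical_rate`, and `info_value` are hypothetical names written for illustration; they are not the package's internal code. The information value here treats every distinct value (including NA) as its own bin and skips bins with a zero count, which is a simplification.

```r
# Hypothetical helpers for illustration; not the package's internal code.

# Share of NA values in a vector.
missing_rate <- function(v) mean(is.na(v))

# Share of the most frequent value among the non-NA entries.
identical_rate <- function(v) {
  v <- v[!is.na(v)]
  if (length(v) == 0) return(NA_real_)
  max(table(v)) / length(v)
}

# Information value of a categorical predictor v against a 0/1 target y
# (1 = positive class). Each distinct value of v, NA included, is one bin.
# Assumption: bins with a zero count in either class are skipped.
info_value <- function(v, y) {
  tab <- table(v, factor(y, levels = c(0, 1)), useNA = "ifany")
  neg <- tab[, "0"] / sum(tab[, "0"])  # distribution of the negative class
  pos <- tab[, "1"] / sum(tab[, "1"])  # distribution of the positive class
  keep <- neg > 0 & pos > 0
  sum((pos[keep] - neg[keep]) * log(pos[keep] / neg[keep]))
}

# Toy data: 10 positive and 30 negative cases.
toy <- data.frame(
  y        = rep(c(1, 0), times = c(10, 30)),
  useful   = rep(c("a", "b", "a", "b"), times = c(8, 2, 6, 24)),
  constant = rep("k", 40),
  sparse   = c(rep(NA, 39), "z"),
  stringsAsFactors = FALSE
)

x_vars <- setdiff(names(toy), "y")
stats <- data.frame(
  variable       = x_vars,
  info_value     = sapply(x_vars, function(n) info_value(toy[[n]], toy$y)),
  missing_rate   = sapply(x_vars, function(n) missing_rate(toy[[n]])),
  identical_rate = sapply(x_vars, function(n) identical_rate(toy[[n]])),
  stringsAsFactors = FALSE
)

# Apply the default limits from the usage signature.
kept <- with(stats, variable[info_value >= 0.02 &
                             missing_rate <= 0.95 &
                             identical_rate <= 0.95])
print(stats)
print(kept)
```

In this toy data only `useful` passes all three checks: `constant` has an identical value rate of 1, and `sparse` has a missing rate of 0.975.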
A data.table with y and the selected x variables. If return_rm_reason == TRUE, a data.table giving the reason each x variable was removed is also returned.
# NOT RUN {
library(scorecard)

# Load German credit data
data(germancredit)

# variable filter
dt_sel = var_filter(germancredit, y = "creditability")
# }