This function filters variables based on the specified conditions: a minimum information value (IV), a maximum NA percentage, and a maximum element percentage.
var_filter(dt, y, x = NA, iv_limit = 0.02, na_perc_limit = 0.95, ele_perc_limit = 0.95, var_rm = NA, var_kp = NA)
dt: A data frame with both x (predictor/feature) and y (response/label) variables.
y: Name of the y variable.
x: Name vector of x variables. Default NA. If x is NA, all variables excluding y are counted as x variables.
iv_limit: The minimum IV of each kept variable. Default 0.02.
na_perc_limit: The maximum NA percentage of each kept variable. Default 0.95.
ele_perc_limit: The maximum percentage taken by a single element (excluding NAs) in each kept variable. Default 0.95.
var_rm: Name vector of variables that are forcibly removed. Default NA.
var_kp: Name vector of variables that are forcibly kept. Default NA.
A data frame with y and the selected x variables.
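# Assumes the scorecard package, which provides var_filter() and germancredit
library(scorecard)

# Load German credit data
data(germancredit)

# Variable filter with the default limits
dt_selected <- var_filter(germancredit, y = "creditability")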
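To make the three filtering conditions concrete, the following base R sketch applies them to a small made-up data frame. It is an illustration only, not the package's implementation: the helper names (`na_share`, `top_value_share`, `info_value`, `sketch_filter`) and the `toy` data are hypothetical, and the IV calculation assumes a binary 0/1 response with each distinct value treated as its own bin. Whether the real limits are inclusive or exclusive is also an assumption here.

```r
# Hypothetical helpers; they illustrate the criteria, not var_filter() internals.

# Share of missing values in a vector.
na_share <- function(v) mean(is.na(v))

# Share taken by the most frequent non-missing value.
top_value_share <- function(v) {
  v <- v[!is.na(v)]
  if (length(v) == 0) return(1)
  max(table(v)) / length(v)
}

# Information value against a binary 0/1 response (both classes must occur in y).
# Assumption: every distinct value is its own bin; bins with a zero count are skipped.
info_value <- function(v, y) {
  tab <- table(v, y)
  n_good <- sum(tab[, "0"])
  n_bad <- sum(tab[, "1"])
  if (n_good == 0 || n_bad == 0) return(0)
  good <- tab[, "0"] / n_good
  bad <- tab[, "1"] / n_bad
  ok <- good > 0 & bad > 0
  sum((bad[ok] - good[ok]) * log(bad[ok] / good[ok]))
}

# Keep y plus every x variable that passes all three limits.
sketch_filter <- function(dt, y, iv_limit = 0.02, na_perc_limit = 0.95,
                          ele_perc_limit = 0.95) {
  xs <- setdiff(names(dt), y)
  keep <- vapply(xs, function(x) {
    na_share(dt[[x]]) <= na_perc_limit &&
      top_value_share(dt[[x]]) <= ele_perc_limit &&
      info_value(dt[[x]], dt[[y]]) >= iv_limit
  }, logical(1))
  dt[, c(y, xs[keep]), drop = FALSE]
}

# Made-up data: one useful predictor, one constant, one almost entirely missing.
toy <- data.frame(
  y = rep(c(0, 1), each = 20),
  informative = c(rep("a", 15), rep("b", 5), rep("a", 5), rep("b", 15)),
  constant = rep("k", 40),
  mostly_na = c(rep(NA, 39), "z"),
  stringsAsFactors = FALSE
)

toy_selected <- sketch_filter(toy, y = "y")
names(toy_selected)  # expected columns: y, informative
```

In this toy case `constant` is dropped because a single value makes up 100% of its non-missing entries, and `mostly_na` is dropped because 97.5% of its entries are NA; `informative` passes with an IV of log(3), about 1.10.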