scorecard (version 0.1.8)

var_filter: Variable Filter

Description

This function filters variables based on specified conditions: information value, missing rate, and identical value rate.

Usage

var_filter(dt, y, x = NULL, iv_limit = 0.02, missing_limit = 0.95,
  identical_limit = 0.95, var_rm = NULL, var_kp = NULL,
  return_rm_reason = FALSE, positive = "bad|1")

Arguments

dt

A data frame with both x (predictor/feature) and y (response/label) variables.

y

Name of y variable.

x

Names of the x variables. Default is NULL. If x is NULL, all variables except y are treated as x variables.

iv_limit

Variables are kept only if their information value is >= iv_limit. The default is 0.02.

missing_limit

Variables are kept only if their missing rate is <= missing_limit. The default is 0.95.

identical_limit

Variables are kept only if their identical value rate (excluding NAs) is <= identical_limit. The default is 0.95.

var_rm

Names of variables to remove regardless of the other conditions. Default is NULL.

var_kp

Names of variables to keep regardless of the other conditions. Default is NULL.

return_rm_reason

Logical. If TRUE, the reason each x variable was removed is also returned. Default is FALSE.

positive

Value of positive class, default is "bad|1".
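
The three thresholds (iv_limit, missing_limit, identical_limit) can be illustrated with a minimal base R sketch. The helpers missing_rate, identical_rate and info_value and the toy data are hypothetical and are not part of scorecard. The information value uses the textbook definition with every distinct value treated as one bin, which is an assumption and may differ from how var_filter computes it internally.

```r
# Hypothetical helpers (not part of scorecard) mirroring the three criteria.

# Share of NA values in a vector.
missing_rate <- function(v) mean(is.na(v))

# Share of the most frequent value among the non-NA values.
identical_rate <- function(v) {
  v <- v[!is.na(v)]
  if (length(v) == 0) return(NA_real_)
  max(table(v)) / length(v)
}

# Information value, treating every distinct non-NA value of v as one bin.
# y is coded 1 for the positive ("bad") class and 0 otherwise.
# Assumes every bin contains both classes; empty cells would need smoothing.
info_value <- function(v, y) {
  keep <- !is.na(v)
  counts <- table(v[keep], factor(y[keep], levels = c(0, 1)))
  dist_good <- counts[, "0"] / sum(counts[, "0"])
  dist_bad  <- counts[, "1"] / sum(counts[, "1"])
  sum((dist_bad - dist_good) * log(dist_bad / dist_good))
}

# Toy data: 10 positive and 30 negative cases.
toy <- data.frame(
  y        = c(rep(1, 10), rep(0, 30)),
  risk     = c(rep("high", 8), rep("low", 2), rep("high", 6), rep("low", 24)),
  grade    = rep(c("a", "b"), 20),       # unrelated to y
  constant = rep("k", 40),               # a single value throughout
  sparse   = c(rep(NA, 39), 2),          # almost entirely missing
  stringsAsFactors = FALSE
)

x_vars <- setdiff(names(toy), "y")

# Apply the missing rate and identical value rate limits first.
miss  <- sapply(toy[x_vars], missing_rate)
ident <- sapply(toy[x_vars], identical_rate)
pass_rates <- x_vars[miss <= 0.95 & ident <= 0.95]

# Then apply the information value limit to the remaining variables.
iv   <- sapply(toy[pass_rates], info_value, y = toy$y)
kept <- names(iv)[iv >= 0.02]
kept
```

In this toy data, sparse exceeds the missing rate limit, constant exceeds the identical value rate limit, and grade has an information value of zero, so only risk remains.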

Value

A data.table with y and the selected x variables. If return_rm_reason == TRUE, a data.table with the reason each x variable was removed is returned as well.

Examples

# NOT RUN {
library(scorecard)

# Load German credit data
data(germancredit)

# variable filter
dt_sel = var_filter(germancredit, y = "creditability")


# }
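
The remaining arguments can be combined in the same way. The following is a sketch only: the stricter iv_limit of 0.05 is an arbitrary choice, and the predictors passed to var_rm and var_kp are picked by position so that no column name is assumed. Since the exact shape of the result with return_rm_reason = TRUE is not spelled out above, it is inspected with str() rather than accessed by element name.

```r
library(scorecard)

# Load German credit data
data(germancredit)

# All predictor names; the first two are used purely for illustration
x_names <- setdiff(names(germancredit), "creditability")

# Stricter IV threshold, one forced removal, one forced keep,
# and a request for the removal reasons
res <- var_filter(
  germancredit, y = "creditability",
  iv_limit = 0.05,
  var_rm = x_names[1],
  var_kp = x_names[2],
  return_rm_reason = TRUE
)

# Inspect the top-level structure of the result
str(res, max.level = 1)
```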