Removes from a list of variable names all variables that contain only NA values, or a large portion of them, or that have almost zero bandwidth (numeric variables only), and returns the remaining variable names.
remove_empty_features(
all.features,
dataset,
percentage_NA_allowed = NA,
bandwidth = (.Machine$double.eps^0.5),
verbose = FALSE
)
a vector of the variable names that are not considered empty
a character vector with all column names of dataset that should be considered by the function
the dataset as a data.frame
the percentage of missing values per variable that is allowed before the feature is removed. All features whose share of NA values is higher than this level are excluded.
The length of the interval that the values of a variable must exceed for the variable not to be removed. By default, the square root of .Machine$double.eps is used.
logical; whether debug messages should be printed when a variable is removed from the list (uses flog.debug)
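The default bandwidth from the usage above can be inspected directly. The short sketch below assumes that "bandwidth" means the difference between the largest and the smallest value of a variable, which the argument description suggests but does not define; the names default_bandwidth, x and value_range are illustrative only.

```r
# Default threshold taken from the function signature
default_bandwidth <- .Machine$double.eps^0.5
default_bandwidth  # roughly 1.49e-08 on IEEE double precision

# Assumption: bandwidth = max - min of the non-missing values
x <- c(1, 1 + 1e-10, 1)
value_range <- diff(range(x, na.rm = TRUE))

# A variable like x would count as having almost zero bandwidth
is_almost_constant <- value_range <= default_bandwidth
is_almost_constant
```

A variable whose values differ only by floating-point noise falls below this threshold, while any variable with a practically relevant spread exceeds it.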
Konstantin Hopf konstantin.hopf@uni-bamberg.de
The function checks all given column names for the portion of NA values.
If the portion of NA or Inf values exceeds percentage_NA_allowed, the column name is removed from the variable set. In addition, all numeric variables that have almost zero bandwidth are removed.
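The checks described above can be sketched in a few lines of base R. This is a hypothetical re-implementation for illustration, not the package's source: the name sketch_remove_empty is invented, the threshold is assumed to be a fraction between 0 and 1, and with the default NA it is assumed that only entirely missing variables are dropped. Logging via flog.debug is left out.

```r
# Hypothetical sketch of the documented behaviour; not the package's code.
sketch_remove_empty <- function(all.features, dataset,
                                percentage_NA_allowed = NA,
                                bandwidth = .Machine$double.eps^0.5) {
  keep <- vapply(all.features, function(f) {
    x <- dataset[[f]]
    # Count NA, NaN and +/-Inf as missing for numeric columns, NA otherwise
    missing <- if (is.numeric(x)) !is.finite(x) else is.na(x)
    # Assumption: a variable with no usable value is always dropped
    if (all(missing)) return(FALSE)
    # Assumption: the threshold is a fraction between 0 and 1
    if (!is.na(percentage_NA_allowed) &&
        mean(missing) > percentage_NA_allowed) return(FALSE)
    # Numeric variables must span more than `bandwidth`
    if (is.numeric(x) && diff(range(x[!missing])) <= bandwidth) return(FALSE)
    TRUE
  }, logical(1))
  all.features[keep]
}

df <- data.frame(
  a = c(1, 2, 3, 4),        # regular numeric variable
  b = c(NA, NA, NA, NA),    # only NA values
  c = c(5, 5, 5, 5),        # constant, zero bandwidth
  d = c(1, NA, NA, 2),      # half of the values missing
  e = c("x", "y", NA, "z"), # character, one value missing
  stringsAsFactors = FALSE
)

kept_default <- sketch_remove_empty(names(df), df)
kept_strict  <- sketch_remove_empty(names(df), df, percentage_NA_allowed = 0.3)
```

Under these assumptions, kept_default contains "a", "d" and "e", and kept_strict contains "a" and "e", because half of the values in d are missing.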
naInf_omit, replaceNAsFeatures