process_nas_var performs missing-value analysis and treatment using kNN imputation, central imputation and random imputation. process_nas is a simpler wrapper for process_nas_var.
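The three strategies named above can be illustrated with a minimal base-R sketch. It is not the package's implementation. The helpers impute_central, impute_random and impute_knn_median are hypothetical. The kNN variant assumes complete numeric predictors on comparable scales and plain Euclidean distance, and k is an arbitrary choice.

```r
# Minimal base-R sketch of three imputation strategies.
# impute_central(), impute_random() and impute_knn_median() are hypothetical
# helpers for illustration, not functions exported by the package.

# Central imputation: replace NAs in a numeric vector with its median.
impute_central <- function(v) {
  v[is.na(v)] <- stats::median(v, na.rm = TRUE)
  v
}

# Random imputation: replace NAs with values drawn (with replacement)
# from the observed values of the same vector.
impute_random <- function(v, seed = 1) {
  set.seed(seed)
  obs <- v[!is.na(v)]
  n_miss <- sum(is.na(v))
  v[is.na(v)] <- obs[sample.int(length(obs), n_miss, replace = TRUE)]
  v
}

# kNN median imputation: for each row where `target_col` is NA, take the k
# rows with an observed `target_col` that are closest in Euclidean distance
# on `predictors` (assumed complete and numeric), and use the median of
# their `target_col` values. Imputed rows are never used as donors.
impute_knn_median <- function(df, target_col, predictors, k = 5) {
  miss <- is.na(df[[target_col]])
  X <- as.matrix(df[, predictors, drop = FALSE])
  donors <- which(!miss)
  for (i in which(miss)) {
    # Distance from row i to every donor row.
    d <- sqrt(colSums((t(X[donors, , drop = FALSE]) - X[i, ])^2))
    nn <- donors[order(d)[seq_len(min(k, length(donors)))]]
    df[[target_col]][i] <- stats::median(df[[target_col]][nn])
  }
  df
}

df <- data.frame(x1 = c(1, 2, 3, 10, 11),
                 x2 = c(1, 2, 3, 10, 12),
                 y  = c(5, 6, NA, 50, 60))
impute_knn_median(df, "y", c("x1", "x2"), k = 2)$y
# returns c(5, 6, 5.5, 50, 60)
```

In this example, row 3 is imputed from its two nearest neighbors (rows 2 and 1), giving median(c(6, 5)) = 5.5.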
process_nas(dat, x_list = NULL, default_miss = TRUE,
  class_var = FALSE, miss_values = NULL, parallel = FALSE,
  ex_cols = NULL, method = "median", note = FALSE,
  save_data = FALSE, file_name = NULL, dir_path = tempdir(), ...)

process_nas_var(dat = dat, x, default_miss = TRUE, nas_rate = NULL,
  mat_nas_shadow = NULL, dt_nas_random = NULL, missing_type = NULL,
  method = "median", note = FALSE, save_data = FALSE,
  file_name = NULL, dir_path = tempdir(), ...)
dat: A data.frame with independent variables.
x_list: Names of independent variables.
default_miss: Logical. If TRUE, missing values are assigned -1 or "Missing"; otherwise, they are processed according to the results of the missing-value analysis.
class_var: Logical. Whether to run NA analysis on nominal variables. Default is FALSE.
miss_values: Other extreme values that may represent missing values, e.g. -9999, -9998. These values are encoded as -1 or "Missing".
parallel: Logical. Whether to use parallel computing. Default is FALSE.
ex_cols: A list of excluded variables. Regular expressions can also be used to match variable names. Default is NULL.
method: The kNN imputation method. "median" imputes a missing value with the median of its k nearest neighbors. Default is "median".
note: Logical. Whether to print progress information. Default is FALSE.
save_data: Logical. If TRUE, save the missing-value analysis to dir_path.
file_name: The file name for the periodically saved missing-value analysis file. Default is NULL.
dir_path: The path for the periodically saved missing-value analysis file. Default is tempdir().
...: Other parameters.
x: The name of the variable to process.
nas_rate: A list containing the NA rate of each variable.
mat_nas_shadow: A shadow matrix of the variables that contain NAs.
dt_nas_random: A data.frame with random NA imputation.
missing_type: The type of missingness, generated by analysis_nas.
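As a rough illustration of what nas_rate, mat_nas_shadow, miss_values and default_miss refer to, the base-R sketch below computes a per-variable NA rate, a 0/1 shadow matrix of missingness, and a default encoding that maps NAs and sentinel values such as -9999 to -1 (numeric columns) or "Missing" (other columns). The helpers nas_rate_of, shadow_matrix_of and encode_default_miss are hypothetical, and the exact structures the package builds internally are assumptions.

```r
# Hypothetical helpers; the package's internal structures may differ.

# Share of NAs in each column, returned as a named list.
nas_rate_of <- function(dat) {
  as.list(colMeans(is.na(dat)))
}

# Shadow matrix: 1 where a value is missing, 0 otherwise, keeping only
# the columns that actually contain NAs.
shadow_matrix_of <- function(dat) {
  m <- is.na(dat) * 1L
  m[, colSums(m) > 0, drop = FALSE]
}

# Default encoding: treat sentinel values (e.g. -9999) as missing, then
# replace every missing value with -1 in numeric columns or "Missing"
# in character/factor columns.
encode_default_miss <- function(dat, miss_values = NULL) {
  for (col in names(dat)) {
    v <- dat[[col]]
    if (is.factor(v)) v <- as.character(v)
    v[v %in% miss_values] <- NA
    if (is.numeric(v)) {
      v[is.na(v)] <- -1
    } else {
      v[is.na(v)] <- "Missing"
    }
    dat[[col]] <- v
  }
  dat
}

dat <- data.frame(age = c(25, NA, -9999, 40),
                  sex = c("M", NA, "F", "F"),
                  stringsAsFactors = FALSE)
nas_rate_of(dat)        # age = 0.25, sex = 0.25
shadow_matrix_of(dat)   # 4 x 2 matrix with 1s only in row 2
encode_default_miss(dat, miss_values = -9999)
# age: 25, -1, -1, 40; sex: "M", "Missing", "F", "F"
```

Here the sentinel -9999 is not counted by nas_rate_of() because it is only recoded to missing inside encode_default_miss().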
Returns a data frame with no NAs.
# NOT RUN {
dat_na = process_nas(dat = UCICreditCard[1:1000, ], default_miss = FALSE,
  target = "default.payment.next.month",
  parallel = FALSE, ex_cols = "ID$", method = "median")
# }