process_nas_var performs missing value analysis and treatment using kNN imputation, central imputation and random imputation.
process_nas is a simpler wrapper for process_nas_var.
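The three strategies named above can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper names (impute_central, impute_random, impute_knn_median), the toy data and the choice of Euclidean distance are all assumptions made for illustration.

```r
# Hypothetical helpers for illustration only; not part of the package.

# Central imputation: replace each NA with the median of the observed values.
impute_central <- function(v) {
  v[is.na(v)] <- median(v, na.rm = TRUE)
  v
}

# Random imputation: replace each NA with a value sampled from the observed ones.
impute_random <- function(v) {
  observed <- v[!is.na(v)]
  n_missing <- sum(is.na(v))
  v[is.na(v)] <- observed[sample.int(length(observed), n_missing, replace = TRUE)]
  v
}

# kNN-median imputation: for each row with a missing target, take the median
# of the target over the k nearest rows (Euclidean distance on the other,
# fully observed numeric columns).
impute_knn_median <- function(df, target, k = 3) {
  predictors <- setdiff(names(df), target)
  missing_rows <- which(is.na(df[[target]]))
  observed_rows <- which(!is.na(df[[target]]))
  for (i in missing_rows) {
    row_i <- unlist(df[i, predictors, drop = FALSE])
    diffs <- sweep(as.matrix(df[observed_rows, predictors, drop = FALSE]), 2, row_i)
    d <- sqrt(rowSums(diffs^2))
    nearest <- observed_rows[order(d)[seq_len(min(k, length(d)))]]
    df[i, target] <- median(df[nearest, target])
  }
  df
}

# Toy data (made up for this sketch).
x <- c(2, NA, 4, 10, NA, 6)
toy <- data.frame(age = c(20, 22, 24, 40, 42),
                  income = c(10, 12, NA, 30, 32))

set.seed(1)
x_central <- impute_central(x)
x_random <- impute_random(x)
toy_knn <- impute_knn_median(toy, "income", k = 2)
```

Central imputation is deterministic but shrinks the variance of the variable. Random imputation preserves the observed distribution at the cost of reproducibility unless a seed is set. The kNN variant uses information from the other columns.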
process_nas(dat, x_list = NULL, default_miss = TRUE,
  class_var = FALSE, parallel = FALSE, ex_cols = NULL,
  method = "median", note = FALSE, save_data = FALSE,
  file_name = NULL, dir_path = tempdir(), ...)

process_nas_var(dat = dat, x, default_miss = TRUE, nas_rate = NULL,
  mat_nas_shadow = NULL, dt_nas_random = NULL, missing_type = NULL,
  method = "median", note = FALSE, save_data = FALSE,
  file_name = NULL, dir_path = tempdir(), ...)
dat: A data.frame with independent variables.
x_list: Names of independent variables.
default_miss: Logical. If TRUE, missing values are set to -1 (numeric) or "Missing" (character); otherwise, missing values are processed according to the results of the missing value analysis. Default is TRUE.
class_var: Logical, whether to run the missing value analysis on nominal variables. Default is FALSE.
parallel: Logical, parallel computing. Default is FALSE.
ex_cols: A list of excluded variables. Regular expressions can also be used to match variable names. Default is NULL.
method: The method of kNN imputation. "median" imputes with the median of the k nearest neighbors. Default is "median".
note: Logical, whether to output progress information. Default is FALSE.
save_data: Logical. If TRUE, save the missing value analysis to dir_path. Default is FALSE.
file_name: The file name for the saved missing value analysis file. Default is NULL.
dir_path: The path for the saved missing value analysis file. Default is tempdir().
...: Other parameters.
x: The name of the variable to process.
nas_rate: A list containing the missing rate of each variable.
mat_nas_shadow: A shadow matrix of the variables that contain missing values.
dt_nas_random: A data.frame with random imputation of missing values.
missing_type: Type of missingness, generated by analysis_nas.
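The documented behaviour of default_miss = TRUE combined with a regular expression in ex_cols can be sketched in base R as follows. This is an assumption-laden illustration, not the package's code: fill_default_miss is a hypothetical helper, the toy data is made up, and treating every non-numeric column as character is a simplification.

```r
# Hypothetical helper for illustration only; not part of the package.
# Numeric NAs become -1, all other NAs become "Missing"; columns whose
# names match any pattern in ex_cols are left untouched.
fill_default_miss <- function(dat, ex_cols = NULL) {
  cols <- names(dat)
  if (!is.null(ex_cols)) {
    # Treat each ex_cols entry as a regular expression on column names.
    excluded <- unique(unlist(lapply(ex_cols, grep, x = cols, value = TRUE)))
    cols <- setdiff(cols, excluded)
  }
  for (col in cols) {
    v <- dat[[col]]
    if (is.numeric(v)) {
      v[is.na(v)] <- -1
    } else {
      # Simplification: factors and logicals are converted to character.
      v <- as.character(v)
      v[is.na(v)] <- "Missing"
    }
    dat[[col]] <- v
  }
  dat
}

# Toy data (made up for this sketch).
toy <- data.frame(
  ID = c(1, 2, NA),
  LIMIT_BAL = c(20000, NA, 90000),
  EDUCATION = c("university", NA, "high school"),
  stringsAsFactors = FALSE
)

# "ID$" matches column names ending in "ID", so ID is skipped.
filled <- fill_default_miss(toy, ex_cols = "ID$")
```

Here the NA in ID survives because the column is excluded, while LIMIT_BAL and EDUCATION receive the sentinel values -1 and "Missing".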
A data.frame with no NAs.
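# NOT RUN {
# Assumes the creditmodel package, which provides the UCICreditCard data, is installed.
library(creditmodel)
# Impute missing values in the first 1000 rows, excluding columns whose names end in "ID".
dat_na = process_nas(dat = UCICreditCard[1:1000, ], default_miss = FALSE,
                     target = "default.payment.next.month",
                     parallel = FALSE, ex_cols = "ID$", method = "median")
# }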