autotune_VIM_regrImp: Perform imputation using VIM package and regressionImp function.

Description

The function uses regression models to impute missing data.

Usage

autotune_VIM_regrImp(
  df,
  col_type = NULL,
  percent_of_missing = NULL,
  col_0_1 = FALSE,
  robust = FALSE,
  mod_cat = FALSE,
  use_imputed = FALSE,
  out_file = NULL
)

Value

Returns a single data.frame with imputed values.

Arguments

df: data.frame. Data frame to impute, with column names and without the target column.
col_type: Character vector with the types of the columns.
percent_of_missing: numeric vector. Vector containing the percentage of missing data in each column, for example c(0,1,0,0,11.3,..)
col_0_1: Decides whether to add an extra column indicating where imputation was done. 0 - value was in the dataset, 1 - value was imputed. Default FALSE. (Works only when a single dataset is returned.)
robust: TRUE/FALSE. If TRUE, robust regression is used.
mod_cat: TRUE/FALSE. If TRUE, for categorical variables the level with the highest prediction probability is selected; otherwise the level is sampled according to the probabilities.
use_imputed: TRUE/FALSE. If TRUE, already imputed columns are used to impute other columns.
out_file: Output log file location. If the file already exists, the log message is appended to it. If NULL, no log is produced.
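
The col_type and percent_of_missing vectors can be derived from the data frame itself. The following is a minimal sketch in base R; describe_columns is a hypothetical helper, not part of the package, and it assumes that the first class of each column (for example "factor", "integer", "numeric") is the label col_type expects, as in the example at the end of this page.

```r
# Hypothetical helper (not part of the package): derive both vectors
# from the data frame itself using base R only.
describe_columns <- function(df) {
  list(
    # First class of each column, e.g. "factor", "integer", "numeric"
    col_type = vapply(df, function(x) class(x)[1], character(1),
                      USE.NAMES = FALSE),
    # Percentage (0-100) of NA values in each column
    percent_of_missing = unname(100 * colMeans(is.na(df)))
  )
}

toy <- data.frame(
  x = c(1.5, NA, 3.2, 4.0),
  y = factor(c("a", "b", NA, NA)),
  z = 1:4
)

info <- describe_columns(toy)
info$col_type            # "numeric" "factor" "integer"
info$percent_of_missing  # 25 50 0
```

Both vectors are in column order, so they line up with the columns of df.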

Author

Alexander Kowarik, Matthias Templ (2016) doi:10.18637/jss.v074.i07

Details

The function imputes one column per iteration to allow more control over the imputation. Each column with missing values can be imputed with a different formula. For every column to be imputed, one of four formulas is used:
1. col to impute ~ all columns without missing values
2. col to impute ~ all numeric columns without missing values
3. col to impute ~ the first column without missing values
4. col to impute ~ the first numeric column without missing values
For example, if formulas 1 and 2 cannot be used, the algorithm tries formula 3. If none of the formulas can be used, the function stops and reports the error from the attempt with formula 4 or 3. In some cases, setting use_imputed to TRUE solves this problem, but in general it lowers the quality of the imputation.
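
The fallback order described above can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: candidate_formulas and first_working_formula are hypothetical names, and stats::lm stands in for the fitting step so that the sketch runs in base R without VIM.

```r
# Hypothetical sketch: build the four candidate formulas for one target column.
candidate_formulas <- function(df, target) {
  # Columns without any missing values, excluding the target itself
  complete <- setdiff(names(df)[colSums(is.na(df)) == 0], target)
  numeric_complete <- complete[vapply(df[complete], is.numeric, logical(1))]
  rhs <- list(
    complete,                   # 1. all complete columns
    numeric_complete,           # 2. all complete numeric columns
    head(complete, 1),          # 3. first complete column
    head(numeric_complete, 1)   # 4. first complete numeric column
  )
  rhs <- Filter(length, rhs)    # drop candidates with no predictors
  lapply(rhs, function(cols) reformulate(cols, response = target))
}

# Try the candidates in order and return the first one that fits without error.
# `fit` is an assumption: lm is used here only as a stand-in model.
first_working_formula <- function(df, target, fit = stats::lm) {
  last_error <- NULL
  for (f in candidate_formulas(df, target)) {
    result <- tryCatch(fit(f, data = df), error = function(e) e)
    if (!inherits(result, "error")) return(f)
    last_error <- result
  }
  stop("No formula could be used for '", target, "': ",
       if (is.null(last_error)) "no complete predictors"
       else conditionMessage(last_error))
}

toy <- data.frame(
  y = c(1.0, NA, 3.1, 4.2, 5.0, NA),
  g = factor(c("a", "b", "a", "b", "a", "b")),
  n = c(2, 4, 6, 8, 10, 12)
)

vapply(candidate_formulas(toy, "y"), deparse, character(1))
# "y ~ g + n" "y ~ n" "y ~ g" "y ~ n"
chosen <- first_working_formula(toy, "y")
```

As in the description, the error reported when every candidate fails is the one from the last attempt.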

References

Alexander Kowarik, Matthias Templ (2016). Imputation with the R Package VIM. Journal of Statistical Software, 74(7), 1-16. doi:10.18637/jss.v074.i07

Examples

{
  raw_data <- data.frame(
    a = as.factor(sample(c("red", "yellow", "blue", NA), 1000, replace = TRUE)),
    b = as.integer(1:1000),
    c = as.factor(sample(c("YES", "NO", NA), 1000, replace = TRUE)),
    d = runif(1000, 1, 10),
    e = as.factor(sample(c("YES", "NO"), 1000, replace = TRUE)),
    f = as.factor(sample(c("male", "female", "trans", "other", NA), 1000, replace = TRUE)))

  # Preparing col_type
  col_type <- c("factor", "integer", "factor", "numeric", "factor", "factor")

  percent_of_missing <- 1:6
  for (i in percent_of_missing) {
    percent_of_missing[i] <- 100 * (sum(is.na(raw_data[, i])) / nrow(raw_data))
  }


  imp_data <- autotune_VIM_regrImp(raw_data, col_type, percent_of_missing)

  # Check that all missing values were imputed
  sum(is.na(imp_data)) == 0
  # TRUE
}