NADIA (version 0.4.2)

autotune_VIM_hotdeck: Hot-Deck imputation using VIM package.

Description

Performs hot-deck imputation using the hotdeck function from the VIM package. This algorithm has no tunable parameters.
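To make the idea of hot-deck imputation concrete, here is a minimal base-R sketch of a random hot-deck: each missing value is replaced by a value drawn from the observed entries of the same column. The helper name simple_hotdeck and the toy data are hypothetical and exist only for illustration. This is not the VIM implementation and may differ from what autotune_VIM_hotdeck does internally.

```r
# Illustrative random hot-deck, NOT VIM's implementation (assumption:
# donors are drawn uniformly at random from the same column).
simple_hotdeck <- function(df) {
  for (col in names(df)) {
    missing <- is.na(df[[col]])
    donors <- df[[col]][!missing]
    # Skip columns with nothing to impute or with no observed donors
    if (!any(missing) || length(donors) == 0) next
    # Sample donor positions so a single-donor column is handled correctly
    picks <- sample.int(length(donors), sum(missing), replace = TRUE)
    df[[col]][missing] <- donors[picks]
  }
  df
}

set.seed(1)
toy <- data.frame(
  colour = factor(c("red", NA, "blue", "red", NA)),
  size   = c(1.5, 2.0, NA, 4.2, 3.1)
)
imputed <- simple_hotdeck(toy)

# No missing values should remain
sum(is.na(imputed)) == 0
```

Because donors come from the same column, imputed values always fall within the set of values already observed, and factor levels are preserved.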

Usage

autotune_VIM_hotdeck(
  df,
  percent_of_missing = NULL,
  col_0_1 = FALSE,
  out_file = NULL
)

Value

Returns a data.frame with imputed values.

Arguments

df

data.frame. Data frame to impute, with column names and without the target column.

percent_of_missing

numeric vector. Vector containing the percentage of missing data in each column, for example c(0,1,0,0,11.3,..)

col_0_1

Whether to add an extra column indicating where imputation was done. 0 - value was present in the dataset, 1 - value was imputed. Default FALSE.

out_file

Location of the output log file. If the file already exists, log messages are appended to it. If NULL, no log is produced.
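As a rough sketch of how the percent_of_missing vector can be prepared, the base-R snippet below computes the percentage of missing values per column, in column order. It also builds a 0/1 mask that follows the same coding described for col_0_1. The data frame and the name mask are made up for illustration; how the function itself names or arranges its indicator output is not stated here.

```r
df <- data.frame(
  a = c(1, NA, 3, 4),
  b = c("x", "y", NA, NA),
  c = c(TRUE, FALSE, TRUE, TRUE)
)

# Percentage of missing values in each column, in column order
# (values here: 25, 50, 0)
percent_of_missing <- unname(colMeans(is.na(df)) * 100)

# Hypothetical 0/1 mask using the col_0_1 coding:
# 0 = value was present, 1 = value was missing and would be imputed
mask <- as.data.frame(+is.na(df))
```

The resulting vector has one entry per column of df, so it lines up with the shape shown in the argument description.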

Author

Alexander Kowarik, Matthias Templ (2016) doi:10.18637/jss.v074.i07

References

Alexander Kowarik, Matthias Templ (2016). Imputation with the R Package VIM. Journal of Statistical Software, 74(7), 1-16. doi:10.18637/jss.v074.i07

Examples

{
  raw_data <- data.frame(
    a = as.factor(sample(c("red", "yellow", "blue", NA), 1000, replace = TRUE)),
    b = as.integer(1:1000),
    c = as.factor(sample(c("YES", "NO", NA), 1000, replace = TRUE)),
    d = runif(1000, 1, 10),
    e = as.factor(sample(c("YES", "NO"), 1000, replace = TRUE)),
    f = as.factor(sample(c("male", "female", "trans", "other", NA), 1000, replace = TRUE)))

  # Preparing col_type
  col_type <- c("factor", "integer", "factor", "numeric", "factor", "factor")

  # Percentage of missing values in each column
  percent_of_missing <- unname(100 * colMeans(is.na(raw_data)))


  imp_data <- autotune_VIM_hotdeck(raw_data, percent_of_missing)

  # Check that all missing values were imputed
  sum(is.na(imp_data)) == 0
  # TRUE
}
