NADIA (version 0.4.2)

autotune_Amelia: Perform imputation using the Amelia package and the EMB algorithm.

Description

The function uses EMB (Expectation-Maximization with Bootstrapping) to impute missing data. Its performance depends heavily on the data structure and the chosen parameters.

Usage

autotune_Amelia(
  df,
  col_type = NULL,
  percent_of_missing = NULL,
  col_0_1 = FALSE,
  parallel = TRUE,
  polytime = NULL,
  splinetime = NULL,
  intercs = FALSE,
  empir = NULL,
  verbose = FALSE,
  return_one = TRUE,
  m = 3,
  out_file = NULL
)

Value

Returns a single data.frame with imputed values, or an amelia object.

Arguments

df

data.frame. Data frame to impute, with column names and without the target column.

col_type

character vector. Vector containing column type names.

percent_of_missing

numeric vector. Vector containing the percentage of missing data in each column, for example c(0,1,0,0,11.3,..)

col_0_1

Decides whether to add a bonus column indicating where imputation was performed. 0 - value was in the dataset, 1 - value was imputed. Default FALSE. (Works only when one dataset is returned.)

parallel

If TRUE, parallel calculation is used.

polytime

Parameter passed to the amelia function.

splinetime

Parameter passed to the amelia function.

intercs

Parameter passed to the amelia function.

empir

Parameter passed to the amelia function as empir, scaled by the number of rows: the value used by Amelia is empir*nrow(df). If empir is not set, nrow(df)*0.015 is used.

verbose

If TRUE, the function prints messages to the console.

return_one

Decides whether one dataset or an amelia object is returned.

m

Number of datasets generated by amelia. If return_one=TRUE, the first dataset is returned.

out_file

Output log file location. If the file already exists, log messages are appended to it. If NULL, no log is produced.
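
The scaling described for empir can be illustrated with a short sketch. The helper name resolve_empir is hypothetical and is not part of NADIA or Amelia. It only mirrors the arithmetic stated in the argument description, assuming an unset empir corresponds to a fraction of 0.015.

```r
# Hypothetical helper (not part of NADIA): reproduces the documented
# arithmetic for the value handed to Amelia as `empir`.
resolve_empir <- function(df, empir = NULL) {
  if (is.null(empir)) {
    empir <- 0.015  # documented default fraction of the row count
  }
  empir * nrow(df)
}

# Toy data frame with four rows
df <- data.frame(x = c(1, NA, 3, 4), y = c("a", "b", NA, "d"))

default_value <- resolve_empir(df)       # 0.015 * 4 rows
custom_value  <- resolve_empir(df, 0.5)  # 0.5 * 4 rows
```

Because the value grows with nrow(df), the same empir setting implies a larger prior on a larger dataset, so it may be worth revisiting when the data size changes.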

Author

James Honaker, Gary King, Matthew Blackwell (2011).

References

James Honaker, Gary King, Matthew Blackwell (2011). Amelia II: A Program for Missing Data. Journal of Statistical Software, 45(7), 1-47. URL https://www.jstatsoft.org/v45/i07/.

Examples

{
  library(NADIA)

  raw_data <- data.frame(
    a = as.factor(sample(c("red", "yellow", "blue", NA), 1000, replace = TRUE)),
    b = as.integer(1:1000),
    c = as.factor(sample(c("YES", "NO", NA), 1000, replace = TRUE)),
    d = runif(1000, 1, 10),
    e = as.factor(sample(c("YES", "NO"), 1000, replace = TRUE)),
    f = as.factor(sample(c("male", "female", "trans", "other", NA), 1000, replace = TRUE)))

  # Preparing col_type
  col_type <- c("factor", "integer", "factor", "numeric", "factor", "factor")

  # Percentage of missing values in each column
  percent_of_missing <- unname(100 * colMeans(is.na(raw_data)))


  imp_data <- autotune_Amelia(raw_data, col_type, percent_of_missing, parallel = FALSE)

  # Check that all missing values were imputed
  sum(is.na(imp_data)) == 0
  # TRUE
}
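
The col_type and percent_of_missing vectors in the example are built by hand. As a rough sketch, both could also be derived from the data frame with base R. The helper describe_columns is hypothetical and not part of NADIA. It assumes the type names expected by autotune_Amelia match the first element of class() for each column, as they do for the "factor", "integer" and "numeric" columns in the example.

```r
# Hypothetical helper (not part of NADIA): derives the two descriptive
# vectors from a data frame using base R only.
describe_columns <- function(df) {
  list(
    # First class of each column, e.g. "factor", "integer", "numeric"
    col_type = unname(vapply(df, function(x) class(x)[1], character(1))),
    # Share of NA values per column, expressed in percent
    percent_of_missing = unname(100 * colMeans(is.na(df)))
  )
}

toy <- data.frame(
  a = factor(c("red", NA, "blue", "red")),
  b = 1:4,
  d = c(1.5, NA, NA, 4.0)
)

info <- describe_columns(toy)
info$col_type            # "factor" "integer" "numeric"
info$percent_of_missing  # 25 0 50
```

Columns with more than one class (for example ordered factors, whose first class is "ordered") would need separate handling under this assumption.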
