NADIA (version 0.4.2)

autotune_mice: Automatic tuning of parameters and imputation using the mice package.

Description

Imputes missing data using functions from the mice package. The function first performs a random search using linear models (generalized linear models if only categorical variables are available). Using glm is problematic; the function allows users to skip the optimization in that case, but this can lead to errors. The function optimizes the prediction matrix and the imputation method. Other mice parameters, such as the number of imputed sets (m) and the maximum number of iterations (maxit), should be set as high as possible for best results (higher values require more time to perform the imputation). If you choose to return a single imputed dataset, m is not important. More information can be found in random_param_mice_search, formula_creating, and mice.
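To make the random search step more concrete, the base R sketch below draws a handful of candidate settings, each pairing a correlation threshold sampled between low_corr and up_corr with a method picked from methods_random. It only illustrates how candidates might be generated; it does not score them and it is not NADIA's implementation. The helper name draw_candidates and the uniform sampling are assumptions made for illustration.

```r
# Hypothetical helper: draw random-search candidates (threshold + method).
# Uniform sampling is an assumption, not taken from the NADIA source.
draw_candidates <- function(iter = 5, low_corr = 0, up_corr = 1,
                            methods_random = c("pmm"), random.seed = 123) {
  set.seed(random.seed)  # the same seed reproduces the same candidates
  data.frame(
    threshold = stats::runif(iter, min = low_corr, max = up_corr),
    method = sample(methods_random, iter, replace = TRUE),
    stringsAsFactors = FALSE
  )
}

# Five candidates with thresholds in [0.2, 0.8] and two mice methods.
candidates <- draw_candidates(iter = 5, low_corr = 0.2, up_corr = 0.8,
                              methods_random = c("pmm", "norm"))
print(candidates)
```

Calling draw_candidates again with the same random.seed returns the same table, which is the role the random.seed argument plays in a reproducible search.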

Usage

autotune_mice(
  df,
  m = 5,
  maxit = 5,
  col_miss = NULL,
  col_no_miss = NULL,
  col_type = NULL,
  set_cor = 0.5,
  set_method = "pmm",
  percent_of_missing = NULL,
  low_corr = 0,
  up_corr = 1,
  methods_random = c("pmm"),
  iter = 5,
  random.seed = 123,
  optimize = TRUE,
  correlation = TRUE,
  return_one = TRUE,
  col_0_1 = FALSE,
  verbose = FALSE,
  out_file = NULL
)

Value

Returns a single imputed dataset or a mids object containing multiple imputed datasets.

Arguments

df

data frame for imputation.

m

number of sets produced by mice.

maxit

maximum number of iterations for mice.

col_miss

names of columns with missing values.

col_no_miss

character vector. Names of columns without NA.

col_type

character vector. Vector containing column type names.

set_cor

Correlation or fraction of features used if optimize = FALSE.

set_method

Method used if optimize = FALSE. If NULL, the default method is used (see the methods_random section).

percent_of_missing

numeric vector. Vector containing the percentage of missing data in each column, for example c(0,1,0,0,11.3,..)

low_corr

double between 0 and 1, default 0. Lower boundary of the correlation set.

up_corr

double between 0 and 1, default 1. Upper boundary of the correlation set. Both of these parameters work the same way for the fraction of features.

methods_random

set of methods to choose from. Default 'pmm'. If set to NULL, these methods are used: pmm, predictive mean matching (numeric data); logreg, logistic regression imputation (binary data, factor with 2 levels); polyreg, polytomous regression imputation for unordered categorical data (factor > 2 levels); polr, proportional odds model (ordered, > 2 levels).

iter

number of iterations for the random search.

random.seed

random seed.

optimize

If TRUE, the optimization (random search) is performed. Default TRUE.

correlation

If TRUE, correlation is used; if FALSE, the fraction of features is used. Default TRUE.

return_one

Whether one or many imputed sets are returned. Default TRUE (one set).

col_0_1

Decides whether to add an extra column indicating where imputation was done: 0 - value was in the dataset, 1 - value was imputed. Default FALSE. (Works only when returning one dataset.)

verbose

If FALSE, the function does not print to the console.

out_file

Output log file location. If the file already exists, log messages will be appended. If NULL, no log will be produced.
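The correlation-related arguments (set_cor, low_corr, up_corr, correlation) are easier to picture with a small example. The base R sketch below shows one way a correlation threshold can be turned into a 0/1 prediction matrix of the kind mice accepts: a column is used as a predictor when its absolute pairwise correlation with the target reaches the threshold. This illustrates the idea under that assumption and is not NADIA's actual implementation; toy and threshold_to_predmatrix are hypothetical names.

```r
# Toy numeric data with some missing values (hypothetical, for illustration).
toy <- data.frame(
  a = c(1, 2, 3, 4, 5, 6),
  b = c(2.1, 3.9, NA, 8.2, 9.8, 12.1),
  c = c(5, 3, 6, NA, 4, 5)
)

# Hypothetical helper: entry [i, j] = 1 means
# "column j is used as a predictor when imputing column i".
threshold_to_predmatrix <- function(df, threshold) {
  # Pairwise-complete correlations tolerate NAs in the input.
  cors <- abs(stats::cor(df, use = "pairwise.complete.obs"))
  pred <- (cors >= threshold) * 1  # logical matrix to 0/1, names kept
  diag(pred) <- 0                  # a column never predicts itself
  pred
}

pred <- threshold_to_predmatrix(toy, threshold = 0.5)
print(pred)
```

In this toy data only a and b are strongly correlated, so they predict each other while c gets no predictors at this threshold. Lowering the threshold admits more predictors; raising it admits fewer.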

Author

Stef van Buuren, Karin Groothuis-Oudshoorn (2011).

Examples

{
  raw_data <- mice::nhanes2

  # Class of each column, as an unnamed character vector
  col_type <- vapply(raw_data, function(col) class(col)[1], character(1),
                     USE.NAMES = FALSE)

  # Percentage of missing values in each column
  percent_of_missing <- unname(100 * colMeans(is.na(raw_data)))

  # Split column names by whether they contain any missing values
  col_no_miss <- colnames(raw_data)[percent_of_missing == 0]
  col_miss <- colnames(raw_data)[percent_of_missing > 0]

  # Impute without the random search (optimize = FALSE)
  imp_data <- autotune_mice(raw_data, optimize = FALSE, iter = 2,
   col_type = col_type, percent_of_missing = percent_of_missing,
   col_no_miss = col_no_miss, col_miss = col_miss)

  # Check that all missing values were imputed
  sum(is.na(imp_data)) == 0
  # TRUE
}