NADIA (version 0.4.1)

autotune_missRanger: Perform imputation using missRanger from the missRanger package.

Description

This function uses the missRanger package for data imputation. It uses the OOB (out-of-bag) error (see the missForest documentation for details) as the criterion for a random search over parameters.

Usage

autotune_missRanger(
  df,
  percent_of_missing = NULL,
  maxiter = 10,
  random.seed = 123,
  mtry = NULL,
  num.trees = 500,
  verbose = FALSE,
  col_0_1 = FALSE,
  out_file = NULL,
  pmm.k = 5,
  optimize = TRUE,
  iter = 10
)

Value

Returns a data.frame with imputed values.

Arguments

df

data.frame. Data frame to impute, with column names and without the target column.

percent_of_missing

numeric vector. Vector containing the percentage of missing data in each column, for example c(0,1,0,0,11.3,..)
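The per-column percentages can be computed in one vectorized step. The following is a minimal base R sketch, assuming values on a 0 to 100 scale as in the example vector above; the data frame `toy_df` is made up for illustration.

```r
# Hypothetical toy data frame with known missingness.
toy_df <- data.frame(
  x = c(1, NA, 3, 4),
  y = c("a", "b", NA, NA),
  z = 1:4
)

# is.na() gives a logical matrix; colMeans() turns it into the share of
# missing cells per column, which is then scaled to a percentage.
percent_of_missing <- unname(100 * colMeans(is.na(toy_df)))
percent_of_missing
# 25 50 0
```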

maxiter

Maximum number of iterations for the missRanger algorithm.

random.seed

Random seed used in imputation.

mtry

Sample fraction used by missRanger. This parameter isn't optimized automatically. If NULL, the default value from the ranger package is used.

num.trees

Number of trees. If optimize == TRUE, the parameter set seq(10,num.trees,iter) will be used.
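To make the candidate set concrete, the sketch below evaluates the expression quoted above with the default values from the Usage section. It only shows what the expression yields in base R; how the function draws from this set is not described here.

```r
num.trees <- 500  # default from the Usage section
iter <- 10        # default from the Usage section

# The third positional argument of seq() is the step size `by`,
# so `iter` also sets the spacing between candidate tree counts.
candidate_trees <- seq(10, num.trees, iter)

head(candidate_trees)    # 10 20 30 40 50 60
length(candidate_trees)  # 50
```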

verbose

If FALSE, the function doesn't print to the console.

col_0_1

Decides whether to add extra columns indicating where imputation was done: 0 - value was in the dataset, 1 - value was imputed. Default FALSE.
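The idea behind such indicator columns can be illustrated in base R. This is a sketch of the concept only: the data frame `toy_df` and the `_where` suffix are hypothetical, and the column names NADIA actually produces may differ.

```r
# Hypothetical toy data frame with a few missing cells.
toy_df <- data.frame(x = c(1, NA, 3), y = c(NA, "b", "c"))

# 1 where the original value was missing (and so would be imputed),
# 0 where the value was already present in the dataset.
was_imputed <- as.data.frame(+is.na(toy_df))

# Hypothetical naming scheme, chosen for this illustration.
names(was_imputed) <- paste0(names(toy_df), "_where")
was_imputed
```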

out_file

Output log file location. If the file already exists, log messages will be appended. If NULL, no log will be produced.

pmm.k

Number of candidate non-missing values to sample from in the predictive mean matching step. Set to 0 to skip this step. If optimize == TRUE, the parameter set sample(1:pmm.k,iter) will be used. If pmm.k == 0, missRanger behaves like missForest.

optimize

If TRUE, internal optimization will be performed.

iter

Number of iterations for the random search.
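A random search of this kind can be sketched in a few lines of base R. This is a generic illustration, not the package's implementation: `toy_error` is a hypothetical stand-in for the OOB error of one imputation run, and sampling with replacement is an assumption made here.

```r
set.seed(123)

# Hypothetical objective; a real one would run the imputer with the given
# parameters and return its error estimate.
toy_error <- function(num_trees, pmm_k) {
  abs(num_trees - 200) / 500 + abs(pmm_k - 3) / 10
}

iter <- 10

# Draw `iter` random candidates. Sampling with replacement is an assumption,
# used so the draw also works when `iter` exceeds the number of choices.
candidates <- data.frame(
  num_trees = sample(seq(10, 500, 10), iter, replace = TRUE),
  pmm_k     = sample(1:5, iter, replace = TRUE)
)

# Score every candidate and keep the one with the lowest error.
candidates$error <- mapply(toy_error, candidates$num_trees, candidates$pmm_k)
best <- candidates[which.min(candidates$error), ]
best
```

Raising `iter` in this sketch evaluates more candidates, which trades run time for a better chance of finding a low-error combination.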

Author

Michael Mayer (2019).

References

Michael Mayer (2019). missRanger: Fast Imputation of Missing Values. R package version 2.1.0. https://CRAN.R-project.org/package=missRanger

Examples

# \donttest{
  raw_data <- data.frame(
    a = as.factor(sample(c("red", "yellow", "blue", NA), 1000, replace = TRUE)),
    b = as.integer(1:1000),
    c = as.factor(sample(c("YES", "NO", NA), 1000, replace = TRUE)),
    d = runif(1000, 1, 10),
    e = as.factor(sample(c("YES", "NO"), 1000, replace = TRUE)),
    f = as.factor(sample(c("male", "female", "trans", "other", NA), 1000, replace = TRUE)))

  # Preparing col_type
  col_type <- c("factor", "integer", "factor", "numeric", "factor", "factor")

  percent_of_missing <- 1:6
  for (i in percent_of_missing) {
    percent_of_missing[i] <- 100 * (sum(is.na(raw_data[, i])) / nrow(raw_data))
  }


  imp_data <- autotune_missRanger(raw_data[1:100,], percent_of_missing, optimize = FALSE)

  # Check that all missing values were imputed
  sum(is.na(imp_data)) == 0
  # TRUE
# }