NADIA (version 0.4.2)

random_param_mice_search: Random search for the best imputation method and the correlation threshold (or fraction of features) used to build the predictor matrix.

Description

This function performs a random search and returns the values corresponding to the best mean MIF (missing information fraction). It is mainly used inside autotune_mice but can also be used separately.

Usage

random_param_mice_search(
  low_corr = 0,
  up_corr = 1,
  methods_random = c("pmm"),
  df,
  formula,
  no_numeric,
  iter,
  random.seed = 123,
  correlation = TRUE
)

Value

A list with the best correlation (or fraction) in the first position, the best method in the second, and the results of every iteration in the third.
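As a minimal sketch of reading the returned list by position, the snippet below uses a hand-built stand-in (`res`) instead of a real call. The numbers, and the assumption that the per-iteration results form a data frame with columns `corr`, `method`, and `mif`, are illustrative only and are not taken from the package.

```r
# Hypothetical stand-in for the returned list; all values are made up.
res <- list(
  0.35,                              # position 1: best correlation (or fraction)
  "pmm",                             # position 2: best method
  data.frame(                        # position 3: per-iteration results (assumed layout)
    corr   = c(0.35, 0.80),
    method = c("pmm", "pmm"),
    mif    = c(0.12, 0.27),
    stringsAsFactors = FALSE
  )
)

# Use [[ ]] to extract the element itself rather than a one-element list.
best_corr   <- res[[1]]
best_method <- res[[2]]
all_iters   <- res[[3]]
```

Note that `res[1]` would return a list of length one, while `res[[1]]` returns the value stored in it.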

Arguments

low_corr

double between 0 and 1, default 0. Lower boundary of the correlation range.

up_corr

double between 0 and 1, default 1. Upper boundary of the correlation range. Both boundaries work the same way for the fraction of features.

methods_random

set of methods to choose from. Default 'pmm'.

df

data frame to input.

formula

first product of the formula_creating() function. For example formula_creating(...)[1]

no_numeric

second product of formula_creating() function.

iter

number of iterations for the random search.

random.seed

random seed.

correlation

If TRUE, correlation is used; if FALSE, the fraction of features is used. Default TRUE.

Details

The function uses the random search technique to find the best parameters for mice imputation. To evaluate each iteration, logistic regression or linear regression (depending on the available features) is used. The model is built using the formula from the formula_creating function. MIF (missing information fraction) is used as the metric, and the parameter combination with the lowest (best) MIF is chosen. Even if correlation is set to FALSE, correlation is still used to select the best features, so the main problem with calculating correlation between categorical columns still applies.
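The random search loop described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: `score_mif` is a hypothetical placeholder for the step that imputes with mice, fits the regression model, and returns the mean MIF, and `random_search_sketch` is likewise an invented name.

```r
# Hypothetical placeholder for "impute with mice, fit the model, return mean MIF".
# It only mimics a score that depends on the threshold and the method.
score_mif <- function(corr, method) {
  penalty <- if (method == "pmm") 0 else 0.05
  (corr - 0.4)^2 + penalty
}

random_search_sketch <- function(low_corr = 0, up_corr = 1,
                                 methods_random = c("pmm"),
                                 iter = 10, random.seed = 123) {
  set.seed(random.seed)
  # Draw one candidate (threshold, method) pair per iteration.
  corr   <- runif(iter, min = low_corr, max = up_corr)
  method <- sample(methods_random, iter, replace = TRUE)
  # Score every candidate; lower MIF is better.
  mif <- mapply(score_mif, corr, method, USE.NAMES = FALSE)
  results <- data.frame(corr = corr, method = method, mif = mif,
                        stringsAsFactors = FALSE)
  best <- which.min(results$mif)
  # Same positional layout as the documented return value.
  list(results$corr[best], results$method[best], results)
}

out <- random_search_sketch(0, 1, c("pmm", "cart"), iter = 20)
```

Because the candidates are drawn after `set.seed(random.seed)`, repeating the call with the same seed reproduces the same sequence of candidates and therefore the same selected pair.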