Learn R Programming

NADIA (version 0.4.1)

missMDA_MFA: Perform imputation using MFA algorithm.

Description

Function use MFA (Multiple Factor Analysis) to impute missing data.

Usage

missMDA_MFA(
  df,
  col_type = NULL,
  percent_of_missing = NULL,
  random.seed = NULL,
  ncp = 2,
  col_0_1 = FALSE,
  maxiter = 1000,
  coeff.ridge = 1,
  threshold = 1e-06,
  method = "Regularized",
  out_file = NULL,
  imp_data = FALSE
)

Value

Return one data.frame with imputed values.

Arguments

df

data.frame. Df to impute with column names and without target column.

col_type

character vector. Vector containing column type names.

percent_of_missing

numeric vector. Vector contatining percent of missing data in columns for example c(0,1,0,0,11.3,..)

random.seed

integer, by default radndom.seed = NULL implies that missing values are initially imputed by the mean of each variable. Other values leads to a random initialization

ncp

Number of dimensions used by algorithm. Default 2.

col_0_1

Decaid if add bonus column informing where imputation been done. 0 - value was in dataset, 1 - value was imputed. Default False. (Works only for returning one dataset).

maxiter

maximal number of iteration in algorithm.

coeff.ridge

Value use in Regularized method.

threshold

for convergence.

method

used in imputation algorithm.

out_file

Output log file location if file already exists log message will be added. If NULL no log will be produced.

imp_data

If True data abute imputation requaierd for missMDA.reuse its return.

Author

Julie Josse, Francois Husson (2016) tools:::Rd_expr_doi("10.18637/jss.v070.i01")

Details

Groups are created using the original column order and taking as much variable to one group as possible. MFA requires selecting group type but numeric types can only be set as 'c' - centered and 's' - scale to unit variance. It's impossible to provide these conditions so numeric type is always set as 's'. Because of that imputation can depend from column order. In this function, no param is set automatically but if selected ncp don't work function will try use ncp=1.

References

Julie Josse, Francois Husson (2016). missMDA: A Package for Handling Missing Values in Multivariate Data Analysis. Journal of Statistical Software, 70(1), 1-31. doi:10.18637/jss.v070.i01

Examples

Run this code
{
  raw_data <- data.frame(
    a = as.factor(sample(c("red", "yellow", "blue", NA), 1000, replace = TRUE)),
    b = as.integer(1:1000),
    c = as.factor(sample(c("YES", "NO", NA), 1000, replace = TRUE)),
    d = runif(1000, 1, 10),
    e = as.factor(sample(c("YES", "NO"), 1000, replace = TRUE)),
    f = as.factor(sample(c("male", "female", "trans", "other", NA), 1000, replace = TRUE)))

  # Prepering col_type
  col_type <- c("factor", "integer", "factor", "numeric", "factor", "factor")

  percent_of_missing <- 1:6
  for (i in percent_of_missing) {
    percent_of_missing[i] <- 100 * (sum(is.na(raw_data[, i])) / nrow(raw_data))
  }


  imp_data <- missMDA_MFA(raw_data, col_type, percent_of_missing)

  # Check if all missing value was imputed
  sum(is.na(imp_data)) == 0
  # TRUE
}

Run the code above in your browser using DataLab