imputate_na: Imputate Missing values

Description

Missing values are imputated with some representative values and statistical methods.

Usage

imputate_na(.data, xvar, yvar, method, seed, print_flag, no_attrs)

Arguments

.data

a data.frame or a tbl_df.

xvar

variable name to replace missing value.

yvar

target variable.

method

method of missing values imputation.

seed

integer. the random seed used in mice. only used "mice" method.

print_flag

logical. If TRUE, mice will print history on console. Use print_flag=FALSE for silent computation. Used only when method is "mice".

no_attrs

logical. If TRUE, return numerical variable or categorical variable. else If FALSE, imputation class.

Value

An object of imputation class. or numerical variable or categorical variable. if no_attrs is FALSE then return imputation class, else no_attrs is TRUE then return numerical vector or factor. Attributes of imputation class is as follows.

var_type : the data type of predictor to replace missing value.
method : method of missing value imputation.
- predictor is numerical variable
  - "mean" : arithmetic mean
  - "median" : median
  - "mode" : mode
  - "knn" : K-nearest neighbors
  - "rpart" : Recursive Partitioning and Regression Trees
  - "mice" : Multivariate Imputation by Chained Equations
- predictor is categorical variable
  - "mode" : mode
  - "rpart" : Recursive Partitioning and Regression Trees
  - "mice" : Multivariate Imputation by Chained Equations
na_pos : position of missing value in predictor.
seed : the random seed used in mice. only used "mice" method.
type : "missing values". type of imputation.
message : a message tells you if the result was successful.
success : Whether the imputation was successful.

Details

imputate_na () creates an imputation class. The `imputation` class includes missing value position, imputated value, and method of missing value imputation, etc. The `imputation` class compares the imputated value with the original value to help determine whether the imputated value is used in the analysis.

See vignette("transformation") for an introduction to these concepts.

Examples

Run this code

# NOT RUN {
# Generate data for the example
carseats <- ISLR::Carseats
carseats[sample(seq(NROW(carseats)), 20), "Income"] <- NA
carseats[sample(seq(NROW(carseats)), 5), "Urban"] <- NA

# Replace the missing value of the Income variable with median
imputate_na(carseats, Income, method = "median")

# Replace the missing value of the Income variable with rpart
# The target variable is US.
imputate_na(carseats, Income, US, method = "rpart")

# Replace the missing value of the Urban variable with median
imputate_na(carseats, Urban, method = "mode")

# Replace the missing value of the Urban variable with mice
# The target variable is US.
imputate_na(carseats, Urban, US, method = "mice")

## using dplyr -------------------------------------
library(dplyr)

# The mean before and after the imputation of the Income variable
carseats %>%
  mutate(Income_imp = imputate_na(carseats, Income, US, method = "knn", no_attrs = TRUE)) %>%
  group_by(US) %>%
  summarise(orig = mean(Income, na.rm = TRUE),
    imputation = mean(Income_imp))

# If the variable of interest is a numarical variable
income <- imputate_na(carseats, Income, US, method = "rpart")
income
summary(income)
plot(income)

# If the variable of interest is a categorical variable
urban <- imputate_na(carseats, Urban, US, method = "mice")
urban
summary(urban)
plot(urban)
# }