dlookr (version 0.3.9)

imputate_na: Imputate Missing values

Description

Missing values are imputed either with a representative value (mean, median, or mode) or with a statistical method (knn, rpart, or mice).

Usage

imputate_na(.data, xvar, yvar, method, seed, print_flag)

Arguments

.data

a data.frame or a tbl_df.

xvar

name of the variable whose missing values are to be replaced.

yvar

target variable.

method

method of missing value imputation. The available methods are listed under Value.

seed

integer. The random seed used in mice. Used only when method is "mice".

print_flag

logical. If TRUE, mice will print history on console. Use print_flag=FALSE for silent computation. Used only when method is "mice".

Value

An object of imputation class. The attributes of the imputation class are as follows.

  • var_type : the data type of the predictor whose missing values are replaced.

  • method : method of missing value imputation.

    • when the predictor is a numerical variable

      • "mean" : arithmetic mean

      • "median" : median

      • "mode" : mode

      • "knn" : K-nearest neighbors

      • "rpart" : Recursive Partitioning and Regression Trees

      • "mice" : Multivariate Imputation by Chained Equations

    • when the predictor is a categorical variable

      • "mode" : mode

      • "rpart" : Recursive Partitioning and Regression Trees

      • "mice" : Multivariate Imputation by Chained Equations

  • na_pos : positions of the missing values in the predictor.

  • seed : the random seed used in mice. Used only when method is "mice".

  • type : type of imputation, here "missing values".
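The three simplest methods listed above ("mean", "median", "mode") can be sketched in base R. This is only an illustration of the idea, not dlookr's implementation: `impute_simple()` is a hypothetical helper written for this example, and it handles numeric vectors only.

```r
# Hypothetical helper (not part of dlookr): fill the NAs of a numeric
# vector with a single representative value.
impute_simple <- function(x, method = c("mean", "median", "mode")) {
  method <- match.arg(method)
  na_pos <- which(is.na(x))        # positions of the missing values
  observed <- x[!is.na(x)]         # values actually observed
  fill <- switch(method,
    mean = mean(observed),
    median = median(observed),
    mode = {
      counts <- table(observed)
      # most frequent observed value; on ties, the smallest value wins
      as.numeric(names(counts)[which.max(counts)])
    }
  )
  x[na_pos] <- fill
  x
}

income <- c(10, 20, 20, NA, 50, NA)
impute_simple(income, "mean")    # NAs become 25
impute_simple(income, "median")  # NAs become 20
impute_simple(income, "mode")    # NAs become 20
```

The model-based methods ("knn", "rpart", "mice") instead predict each missing value from the other variables, which is why they take a target variable in yvar.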

Details

imputate_na() creates an object of imputation class. The `imputation` class records the positions of the missing values, the imputed values, the imputation method, and related information. It also lets you compare the imputed values with the original values, which helps you decide whether to use the imputed values in the analysis.

See vignette("transformation") for an introduction to these concepts.
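As a rough illustration of the kind of object described here, the base R sketch below attaches the missing-value positions and the method to an imputed vector as attributes, then compares a summary before and after imputation. The attribute names mirror those listed under Value, but the construction itself is an assumption made for illustration and is not dlookr's code.

```r
# Illustrative only: a plain numeric vector carrying imputation metadata.
x <- c(3, NA, 7, 9, NA)

na_pos <- which(is.na(x))                   # where the values were missing
imputed <- x
imputed[na_pos] <- median(x, na.rm = TRUE)  # median imputation

result <- structure(imputed,
                    method = "median",
                    na_pos = na_pos,
                    type = "missing values")

# Metadata travels with the imputed vector
attr(result, "na_pos")
attr(result, "method")

# Compare the mean of the observed values with the mean after imputation
c(original = mean(x, na.rm = TRUE), imputed = mean(result))
```

With a real imputation object, summary() and plot() serve the same comparative purpose, as the examples at the end of this page show.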

See Also

imputate_outlier.

Examples

# NOT RUN {
# Generate data for the example
carseats <- ISLR::Carseats
carseats[sample(seq(NROW(carseats)), 20), "Income"] <- NA
carseats[sample(seq(NROW(carseats)), 5), "Urban"] <- NA

# Replace the missing value of the Income variable with median
imputate_na(carseats, Income, method = "median")

# Replace the missing value of the Income variable with rpart
# The target variable is US.
imputate_na(carseats, Income, US, method = "rpart")

# Replace the missing value of the Urban variable with mode
imputate_na(carseats, Urban, method = "mode")

# Replace the missing value of the Urban variable with mice
# The target variable is US.
imputate_na(carseats, Urban, US, method = "mice")

## using dplyr -------------------------------------
library(dplyr)

# The mean before and after the imputation of the Income variable
carseats %>%
  mutate(Income_imp = imputate_na(carseats, Income, US, method = "knn")) %>%
  group_by(US) %>%
  summarise(orig = mean(Income, na.rm = TRUE),
    imputation = mean(Income_imp))

# If the variable of interest is a numerical variable
income <- imputate_na(carseats, Income, US, method = "rpart")
income
summary(income)
plot(income)

# If the variable of interest is a categorical variable
urban <- imputate_na(carseats, Urban, US, method = "mice")
urban
summary(urban)
plot(urban)
# }