ck37r (version 1.0.0)

impute_missing_values: Impute missing values in a dataframe and add missingness indicators.

Description

Impute missing values, using median imputation for numerics and mode imputation for factors by default (type = "standard"), or alternatively k-nearest neighbors (type = "knn"). Optionally add missingness indicators.

Usage

impute_missing_values(data, type = "standard", add_indicators = T,
  prefix = "miss_", skip_vars = c(), verbose = F)

Arguments

data

Dataframe or matrix.

type

Imputation method: "standard" (median for numerics, mode for factors; the default) or "knn". NOTE: "knn" will result in the data being centered and scaled!
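
To make the "standard" option concrete, here is a minimal base R sketch of median/mode imputation. It is an illustration only, not the ck37r implementation: the function name impute_standard_sketch and the toy data are hypothetical, and tie-breaking for the mode is an assumption.

```r
# Hypothetical helper, not part of ck37r: median for numerics,
# most frequent value (mode) for everything else.
impute_standard_sketch <- function(df) {
  values <- list()
  for (col in names(df)) {
    x <- df[[col]]
    if (!anyNA(x)) next
    if (is.numeric(x)) {
      fill <- median(x, na.rm = TRUE)
    } else {
      # table() drops NAs by default; ties go to the first level in table order.
      fill <- names(which.max(table(x)))
    }
    df[[col]][is.na(x)] <- fill
    values[[col]] <- fill
  }
  list(data = df, impute_values = values)
}

# Toy data, made up for this example.
toy <- data.frame(
  age = c(20, NA, 40, 60),
  group = factor(c("a", "b", NA, "b"))
)
out <- impute_standard_sketch(toy)
out$data
```

The sketch also records the fill value used for each column, since those values are what would be needed to impute new data consistently.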

add_indicators

If TRUE, add a series of missingness indicators to the data.

prefix

String to add at the beginning of the name of each missingness indicator.
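
As a rough illustration of how prefix shapes the indicator names, the base R sketch below builds 0/1 indicators by hand. It assumes indicators are created only for columns that contain missing values; the toy data and the variable names are hypothetical, and this is not the package's own code.

```r
# Hypothetical illustration of indicator naming; not the ck37r implementation.
toy <- data.frame(
  glucose = c(90, NA, 120),
  mass = c(NA, 31.2, 28.5),
  age = c(30, 41, 25)
)
prefix <- "miss_"

# Columns with at least one missing value.
has_na <- names(toy)[colSums(is.na(toy)) > 0]

# One 0/1 indicator per such column, named prefix + original column name.
indicators <- as.data.frame(lapply(toy[has_na], function(x) as.integer(is.na(x))))
names(indicators) <- paste0(prefix, has_na)

toy_with_ind <- cbind(toy, indicators)
names(toy_with_ind)
```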

skip_vars

Character vector of variable names to exclude from the imputation.

verbose

If TRUE, display extra information during execution.

Value

List with the following elements:

  • $data - imputed dataset.

  • $impute_info - if knn, the caret preProcess object, which can be used to impute test data.

  • $impute_values - if standard, list of imputation values for each variable.
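
One use for the stored imputation values is filling a held-out test set with the values learned on the training data. The sketch below assumes $impute_values is a list named by variable; the numbers and the test data are made up, and the loop is a hand-rolled illustration rather than a ck37r function.

```r
# Hypothetical: a named list shaped like the documented $impute_values element.
impute_values <- list(glucose = 117, mass = 32.3)

# Made-up test data with missing entries.
test_data <- data.frame(glucose = c(NA, 105), mass = c(27.1, NA))

# Fill each column's NAs with the value learned on the training data.
for (col in intersect(names(impute_values), names(test_data))) {
  missing_rows <- is.na(test_data[[col]])
  test_data[[col]][missing_rows] <- impute_values[[col]]
}
test_data
```

Reusing training-set values this way keeps test-set statistics out of the imputation step.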

See Also

missingness_indicators, preProcess

Examples

library(ck37r)

# Load a test dataset (requires the mlbench package).
data(PimaIndiansDiabetes2, package = "mlbench")

# Check for missing values.
colSums(is.na(PimaIndiansDiabetes2))

# Impute missing data and add missingness indicators.
# Don't impute the outcome though.
result = impute_missing_values(PimaIndiansDiabetes2, skip_vars = "diabetes")

# Confirm we have no missing data.
colSums(is.na(result$data))


#############
# K-nearest neighbors imputation

result2 = impute_missing_values(PimaIndiansDiabetes2, type = "knn", skip_vars = "diabetes")

# Confirm we have no missing data.
colSums(is.na(result2$data))