nroImpute: Impute missing values

Description

Impute missing values using nearest neighbors found by Euclidean distance.

Usage

nroImpute(data, subsample = 500, standard = TRUE)

Arguments

data

A matrix or a data frame.

subsample

Maximum number of matchings to test per imputed row.

standard

If TRUE, the scales of variables are standardized for processing.
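
The effect of standardization on a Euclidean nearest-neighbor search can be illustrated with a small base R sketch. The numbers below are made up, and z-scoring with scale() is an assumption chosen for illustration; this page does not state how nroImpute standardizes internally.

```r
# Made-up data: 'a' varies on a small scale, 'b' on a large one.
x <- cbind(a = c(1.0, 1.1, 3.0, 1.2),
           b = c(100, 130, 101, 400))

# Pairwise Euclidean distances on the raw and on the z-scored columns.
raw <- as.matrix(dist(x))
std <- as.matrix(dist(scale(x)))

# Nearest neighbor of row 1 (row 1 itself is dropped, hence the + 1).
nearest_raw <- unname(which.min(raw[1, -1])) + 1
nearest_std <- unname(which.min(std[1, -1])) + 1

cat("Nearest to row 1, raw scales:   ", nearest_raw, "\n")
cat("Nearest to row 1, standardized: ", nearest_std, "\n")
```

On the raw scales the distances are dominated by 'b', so row 3 is the closest match to row 1 even though it differs greatly in 'a'. After standardization both variables contribute comparably and row 2 becomes the nearest neighbor.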

Value

A copy of the input data in which missing values have been imputed.

Details

Non-numeric columns are excluded from processing and returned unaltered.

If subsample is less than the number of rows, the nearest neighbor is searched for among a random selection of subsample rows instead of among all rows.
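
The general idea of nearest-neighbor imputation with subsampling can be sketched in base R as follows. This is not Numero's implementation: impute_nn_sketch is a hypothetical helper, and the choices made here (a single donor row, distances averaged over the columns observed in both rows, z-scoring with scale()) are assumptions for illustration only.

```r
# Hypothetical helper, not part of Numero: one-nearest-neighbor imputation.
impute_nn_sketch <- function(x, subsample = 500, standard = TRUE) {
  x <- as.matrix(x)
  # Distances are computed on z; imputed values are copied from x.
  z <- if (standard) scale(x) else x
  out <- x
  for (i in which(rowSums(is.na(x)) > 0)) {
    # Candidate neighbors: all other rows, or a random subset of them.
    cand <- setdiff(seq_len(nrow(x)), i)
    if (subsample < length(cand)) cand <- sample(cand, subsample)
    for (j in which(is.na(x[i, ]))) {
      # Only candidates with column j observed can donate a value.
      donors <- cand[!is.na(x[cand, j])]
      if (length(donors) < 1) next
      # Root mean squared difference over columns observed in both rows.
      d <- sapply(donors, function(k) {
        sqrt(mean((z[i, ] - z[k, ])^2, na.rm = TRUE))
      })
      if (all(is.na(d))) next
      out[i, j] <- x[donors[which.min(d)], j]
    }
  }
  out
}

# Small made-up example: row 2 is missing 'b' and is closest to row 1 in 'a'.
m <- cbind(a = c(1, 2, 10, 11), b = c(1, NA, 10, 11))
filled <- impute_nn_sketch(m, standard = FALSE)
print(filled)
```

With only four rows the default subsample of 500 exceeds the number of candidates, so every other row is tested and the result is deterministic. On larger inputs the random subset bounds the cost per imputed row, which means that repeated runs may pick different donors.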

Examples

# NOT RUN {
# Import data.
fname <- system.file("extdata", "finndiane.txt", package = "Numero")
dataset <- read.delim(file = fname)

# Convert identities to strings (produces a warning later).
ds <- dataset
ds$INDEX <- paste("K", ds$INDEX, sep=".")

# Introduce missing values to cholesterol.
missing <- seq(from = 1, to = nrow(ds), length.out = 40)
missing <- unique(round(missing))
ds$CHOL[missing] <- NA

# Impute missing values with and without standardization.
ds.std <- nroImpute(data = ds, standard = TRUE)
ds.orig <- nroImpute(data = ds, standard = FALSE)

# Compare against "true" cholesterol values.
rho.std <- cor(ds.std$CHOL[missing], dataset$CHOL[missing])
rho.orig <- cor(ds.orig$CHOL[missing], dataset$CHOL[missing])
cat("Correlation, standard = TRUE:  ", rho.std, "\n", sep="")
cat("Correlation, standard = FALSE: ", rho.orig, "\n", sep="")
# }