rminer (version 1.4.6)

imputation: Missing data imputation (e.g. substitution by value or hotdeck method).

Description

Replaces missing values in a dataset, either by substituting a given value or by the hotdeck method (copying the value found in the most similar example).

Usage

imputation(imethod = "value", D, Attribute = NULL, Missing = NA, Value = 1)

Arguments

imethod

imputation method type:

  • value -- replaces missing data with Value (a single element or several elements);

  • hotdeck -- first searches the dataset for the most similar example (using a k-nearest neighbor method, knn) and then replaces the missing data with the value found in that example;

D

dataset with missing data (data.frame)

Attribute

if NULL, missing data is replaced in all attributes (data columns) that contain it. Otherwise, Attribute is the attribute number (numeric) or name (character) to process.

Missing

missing data symbol

Value

the substitution value (if imethod="value") or the number of neighbors, i.e. the k of knn (if imethod="hotdeck").

Value

A data.frame without missing data.

Details

See the references for details on the imputation methods.
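The two strategies named under imethod can be illustrated with a minimal base R sketch. This is not the rminer implementation: the toy data frame, the column names x1 to x3 and the use of unscaled Euclidean distance are assumptions made only for illustration, and the package may measure similarity differently.

```r
# Illustrative base R sketch; NOT the code used inside rminer.
# Hypothetical toy data: column x2 has a missing value in row 4.
d <- data.frame(x1 = c(5, 4, 1, 4),
                x2 = c(4, 5, 1, NA),
                x3 = c(1, 3, 1, 3))

# 1) Value substitution: replace every NA in x2 with one fixed value
#    (here the median of the observed entries).
d_value <- d
d_value$x2[is.na(d_value$x2)] <- median(d$x2, na.rm = TRUE)

# 2) Hot deck with k = 1: copy x2 from the most similar complete row.
#    Similarity is assumed to be Euclidean distance on the other columns.
d_hot <- d
donors <- which(!is.na(d$x2))        # rows that can donate a value
others <- setdiff(names(d), "x2")    # columns used to measure similarity
for (i in which(is.na(d$x2))) {
  # subtract row i from every donor row, column by column
  diffs <- sweep(as.matrix(d[donors, others]), 2, unlist(d[i, others]))
  dist2 <- rowSums(diffs^2)          # squared distance to each donor
  d_hot$x2[i] <- d$x2[donors[which.min(dist2)]]
}

print(d_value)
print(d_hot)
```

In this toy case the two strategies give different results: value substitution fills row 4 with the column median, while the hot deck copies the value from row 2, which matches row 4 on both other columns.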

References

  • M. Brown and J. Kros. Data mining and the impact of missing data. In Industrial Management & Data Systems, 103(8):611-621, 2003.

  • This tutorial shows additional code examples: P. Cortez. A tutorial on using the rminer R package for data mining tasks. Teaching Report, Department of Information Systems, ALGORITMI Research Centre, Engineering School, University of Minho, Guimaraes, Portugal, July 2015. http://hdl.handle.net/1822/36210

See Also

fit and delevels.

Examples

# NOT RUN {
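library(rminer)
# build a small 5x5 dataset with missing values (NA) in columns 2 and 3
d=matrix(ncol=5,nrow=5)
d[1,]=c(5,4,3,2,1)
d[2,]=c(4,3,4,3,4)
d[3,]=c(1,1,1,1,1)
d[4,]=c(4,NA,3,4,4)
d[5,]=c(5,NA,NA,2,1)
d=data.frame(d); d[,3]=factor(d[,3]) # column 3 becomes a factor
print(d)
# value substitution in column 3
print(imputation("value",d,3,Value="3"))
# value substitution in column 2, using the median of the observed values
print(imputation("value",d,2,Value=median(na.omit(d[,2]))))
# value substitution in column 2 with several values (column 2 has two missing entries)
print(imputation("value",d,2,Value=c(1,2)))
# hotdeck (1-nearest neighbor) on column X2 only
print(imputation("hotdeck",d,"X2",Value=1))
# hotdeck (1-nearest neighbor) on all columns with missing data
print(imputation("hotdeck",d,Value=1))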

# }
# NOT RUN {
# hotdeck 1-nearest neighbor substitution on a real dataset:
require(kknn)
d=read.table(
   file="http://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data",
   sep=",",na.strings="?")
print(summary(d))
d2=imputation("hotdeck",d,Value=1)
print(summary(d2))
# compare the distribution of column V26 before and after imputation
par(mfrow=c(2,1))
hist(d$V26)
hist(d2$V26)
par(mfrow=c(1,1)) # reset mfrow
# }