knnImp:
Fill in NA values with the values of the nearest neighbours
Description
Fills in all NA values in a data set using the k nearest neighbours
of each case that has NA values. Numeric variables are filled in with
the median of the neighbours' values, and factors with their most
frequent value.
Usage
knnImp(data, k = 10, scale = TRUE, distData = NULL)
Arguments
data
A data frame with the data set
k
The number of nearest neighbours to use (defaults to 10)
scale
Boolean indicating whether the data should be scaled before finding the
nearest neighbours (defaults to TRUE)
distData
Optionally, a data frame containing the data set that should be used to
find the neighbours. This is useful when filling in NA values on a test
set, where you should use only information from the training set.
Defaults to NULL, which means that the neighbours will be searched for
in data.
Value
A data frame without NA values
Details
This function uses the k-nearest neighbours to fill in the unknown (NA)
values in a data set. For each case with any NA value it will search for
its k most similar cases and use the values of these cases to fill in
the unknowns.
The function will use either the median (in case of numeric variables)
or the most frequent value (in case of factors), of the neighbours to
fill in the NAs.
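The procedure described above can be sketched in base R as follows. This is only an illustration of the idea, not the code used by knnImp: the helper name knn_fill_sketch is hypothetical, and the distance used here (squared differences on numeric columns scaled by their standard deviation, plus a 0/1 mismatch on factors) is an assumption that may differ from what the function does internally.

```r
# Hypothetical helper (not part of any package): fills NAs in `data`
# using the k nearest complete cases.
knn_fill_sketch <- function(data, k = 10) {
  num <- vapply(data, is.numeric, logical(1))
  complete <- data[complete.cases(data), , drop = FALSE]
  stopifnot(nrow(complete) >= k)

  # Assumption: numeric columns are scaled by the standard deviation
  # observed on the complete cases (NA for non-numeric columns).
  sdv <- vapply(complete,
                function(col) if (is.numeric(col)) sd(col) else NA_real_,
                numeric(1))

  for (i in which(!complete.cases(data))) {
    miss <- vapply(data[i, ], is.na, logical(1))

    # Squared distance to every complete case, using only the
    # variables that are known for case i.
    d2 <- numeric(nrow(complete))
    for (j in which(!miss)) {
      if (num[j]) {
        d2 <- d2 + ((complete[[j]] - data[i, j]) / sdv[j])^2
      } else {
        d2 <- d2 + (as.character(complete[[j]]) != as.character(data[i, j]))
      }
    }
    nn <- complete[order(d2)[seq_len(k)], , drop = FALSE]

    # Median for numeric variables, most frequent value for factors.
    for (j in which(miss)) {
      data[i, j] <- if (num[j]) {
        median(nn[[j]])
      } else {
        names(which.max(table(nn[[j]])))
      }
    }
  }
  data
}

# Toy data, made up for illustration: the last case has two unknowns.
df <- data.frame(
  x = c(1, 2, 3, 10, 11, 12, NA),
  y = c(1.1, 2.1, 2.9, 10.2, 11.1, 11.8, 11.0),
  g = factor(c("a", "a", "a", "b", "b", "b", NA))
)
filled <- knn_fill_sketch(df, k = 3)
```

In this toy data set the three complete cases closest to the last one (by y) are those with x equal to 10, 11 and 12, all in group "b", so the unknown x is filled in with the median 11 and the unknown g with "b".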
References
Torgo, L. (2014) An Infra-Structure for Performance
Estimation and Experimental Comparison of Predictive Models in R. arXiv:1412.0436 [cs.MS]
http://arxiv.org/abs/1412.0436
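Examples
The following sketch illustrates the distData argument: the NA values of a test set are filled in using only neighbours taken from the training set. It assumes that knnImp is available from the performanceEstimation package and uses the iris data set shipped with R. The train/test split and the unknowns introduced in the test set are made up for illustration.

```r
# Assumes the performanceEstimation package (providing knnImp) is installed.
if (requireNamespace("performanceEstimation", quietly = TRUE)) {
  set.seed(1)
  idx   <- sample(nrow(iris), 100)
  train <- iris[idx, ]
  test  <- iris[-idx, ]

  # Introduce a few unknown values in the test set (illustrative only).
  test[c(1, 5), "Sepal.Length"] <- NA
  test[3, "Species"] <- NA

  # The neighbours are searched only among the training cases.
  testFilled <- performanceEstimation::knnImp(test, k = 5, distData = train)
  print(anyNA(testFilled))
}
```

With distData left as NULL, the same call would instead look for neighbours among the complete cases of test itself.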