This function estimates missing values sequentially, starting from the units with the lowest missing rate, using the weighted mean of the k nearest neighbours.
Usage
seqKNNimp(data, k = 10)
Value
A data frame with imputed values.
Arguments
data
A data frame with the data set.
k
The number of nearest neighbours to use (defaults to 10).
Author
Ki-Yeol Kim and Gwan-Su Yi
Details
The function separates the data set into an incomplete set of units with missing values and a complete set of units without missing values.
The units in the incomplete set are imputed in increasing order of their number of missing values. A missing value is filled with the
weighted mean of the corresponding column over the nearest neighbour units in the complete set. Once all missing values for
a given unit are imputed, the unit is moved into the complete set and used for the imputation of the remaining units in the
incomplete set. In this process, all missing values for one unit are imputed simultaneously from the same selected neighbour
units in the complete set. This reduces execution time compared with the previously developed KNN method, which selects nearest neighbours
separately for each imputation.
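The procedure described above can be sketched in a few lines of R. This is only an illustrative sketch, not the code of seqKNNimp: the name seq_knn_sketch is hypothetical, and the Euclidean distance over the observed columns and the inverse-distance weights are assumptions, since this page does not state which distance or weighting the function uses. It also assumes that all columns are numeric and that at least one unit is complete.

```r
# Hypothetical sketch of sequential KNN imputation; not the package's code.
seq_knn_sketch <- function(data, k = 10) {
  x <- as.matrix(data)                      # assumes all columns are numeric
  n_miss <- rowSums(is.na(x))
  complete <- x[n_miss == 0, , drop = FALSE]
  if (nrow(complete) == 0) stop("at least one complete unit is required")
  # Visit incomplete units from fewest to most missing values.
  todo <- which(n_miss > 0)
  todo <- todo[order(n_miss[todo])]
  for (i in todo) {
    miss <- is.na(x[i, ])
    if (all(miss)) stop("unit ", i, " has no observed values")
    # Assumed distance: Euclidean, computed over the observed columns only.
    diffs <- sweep(complete[, !miss, drop = FALSE], 2, x[i, !miss])
    d <- sqrt(rowSums(diffs^2))
    nn <- order(d)[seq_len(min(k, length(d)))]
    # Assumed weighting: inverse distance, normalised to sum to one.
    w <- 1 / pmax(d[nn], 1e-12)
    w <- w / sum(w)
    # All missing columns of this unit are filled from the same neighbours.
    x[i, miss] <- colSums(complete[nn, miss, drop = FALSE] * w)
    # The imputed unit joins the complete set for the remaining units.
    complete <- rbind(complete, x[i, ])
  }
  as.data.frame(x)
}

# Toy data: units 1 and 2 are complete, unit 3 has one gap, unit 4 has two.
toy <- data.frame(a = c(1, 2, 3, NA), b = c(2, 4, 6, 8), c = c(3, 6, NA, NA))
imputed <- seq_knn_sketch(toy, k = 2)
```

In the toy call, unit 3 is imputed first because it has fewer missing values, and its imputed row is then available as a neighbour when unit 4 is processed.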
References
Ki-Yeol Kim, Byoung-Jin Kim, Gwan-Su Yi (26 October 2004) "Reuse of imputed data in microarray analysis increases imputation efficiency", BMC Bioinformatics 5:160.