Usage
kNN(data, variable = colnames(data), metric = NULL, k = 5,
  dist_var = colnames(data), weights = NULL, numFun = median,
  catFun = maxCat, makeNA = NULL, NAcond = NULL, impNA = TRUE,
  donorcond = NULL, mixed = vector(), mixed.constant = NULL,
  trace = FALSE, imp_var = TRUE, imp_suffix = "imp", addRandom = FALSE,
  useImputedDist = TRUE, weightDist = FALSE)
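A minimal sketch of a call, using a made-up toy data frame. It assumes that kNN is provided by the VIM package (the call is skipped if that package is not installed) and that, with the defaults imp_var = TRUE and imp_suffix = "imp", the indicator columns are named by appending "_imp" to the variable name.

```r
# Toy data with missing values (invented for illustration).
df <- data.frame(
  height = c(170, 165, NA, 180, 175, 160),
  weight = c(65, NA, 70, 85, 78, 55),
  sex    = factor(c("f", "f", "m", "m", NA, "f"))
)

# Assumption: kNN comes from the VIM package; skip the call otherwise.
if (requireNamespace("VIM", quietly = TRUE)) {
  # Impute only height and weight from the 3 nearest neighbours.
  # dist_var keeps its default, so all columns enter the distance.
  imputed <- VIM::kNN(df, variable = c("height", "weight"), k = 3)

  # Imputed data plus the logical imputation-status columns
  # (expected names under the stated assumption: height_imp, weight_imp).
  print(imputed)
}
```

Restricting variable to a subset leaves the remaining columns (here sex) unimputed, while they can still contribute to the distance calculation through dist_var.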
Arguments
data
data.frame or matrix
variable
variables where missing values should be imputed
metric
metric to be used for calculating the distances between observations
k
number of nearest neighbours used
dist_var
names of the variables to be used for distance calculation
weights
weights for the variables for distance calculation
numFun
function for aggregating the k nearest neighbours in the case of a numerical variable
catFun
function for aggregating the k nearest neighbours in the case of a categorical variable
makeNA
list of length equal to the number of variables, giving for each variable the values that should be converted to NA
NAcond
list of length equal to the number of variables, with a condition for imputing an NA
impNA
TRUE/FALSE, whether NA values should be imputed
donorcond
condition for the donors, e.g. ">5"
mixed
names of mixed variables
mixed.constant
vector with length equal to the number of semi-continuous variables, specifying the point of the semi-continuous distribution with non-zero probability
trace
TRUE/FALSE if additional information about the imputation process should be printed
imp_var
TRUE/FALSE if a TRUE/FALSE variable showing the imputation status should be created for each imputed variable
imp_suffix
suffix for the TRUE/FALSE variables showing the imputation status
addRandom
TRUE/FALSE if an additional random variable should be added for distance calculation
useImputedDist
TRUE/FALSE if an imputed value should be used for distance calculation when imputing another variable.
Be aware that this makes the result depend on the ordering of the variables.
weightDist
TRUE/FALSE if the distances of the k nearest neighbours should be used as weights in the aggregation step
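The aggregation step described by numFun, catFun and weightDist can be illustrated in base R. The sketch below is not the function's internal code: the donor values and distances are invented, most_frequent is a hypothetical stand-in for a catFun that picks the most frequent category, and inverse-distance weighting is only one plausible scheme, not necessarily the one applied when weightDist = TRUE.

```r
# Values observed on the k = 5 nearest donors (invented numbers).
donor_num  <- c(4.1, 3.8, 5.0, 4.4, 4.0)
donor_cat  <- factor(c("a", "b", "a", "c", "a"))
donor_dist <- c(0.10, 0.20, 0.40, 0.15, 0.30)

# Numerical variable: the default numFun is median.
num_value <- median(donor_num)

# Categorical variable: hypothetical helper returning the most
# frequent level (ties resolved by taking the first maximum).
most_frequent <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]
}
cat_value <- most_frequent(donor_cat)

# Distance-weighted alternative for a numerical variable:
# closer donors get larger weights (assumed scheme: 1 / distance).
w <- 1 / donor_dist
weighted_value <- weighted.mean(donor_num, w)

print(num_value)
print(cat_value)
print(weighted_value)
```

Any function that reduces a vector of donor values to a single value can play the role of numFun or catFun; the default median is robust against a single outlying donor, whereas a mean would not be.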