- data
a numeric matrix consisting of integers between 1 and \(n_{cat}\),
where \(n_{cat}\) is the maximum number of levels the categorical variables
can take. If mat.na is specified, data is assumed to contain only
non-missing data, and the rows of data are used to impute the missing values
in mat.na. Otherwise, data may also contain missing values,
and the missing values in the rows of data are imputed using the rows of
data that contain no missing values.
Each row of data represents one of the objects among which the
\(k\) nearest neighbors are searched. If the \(k\) nearest variables should be used to
replace the missing values, each row must represent one of the variables. If the
\(k\) nearest observations should be used to impute the missing values, each
row must correspond to one of the observations.
- mat.na
a numeric matrix containing missing values. Must have the same number of
columns as data. All non-missing values must be integers between 1 and
\(n_{cat}\). If NULL, data is assumed to also contain the
rows with missing values.
- fac
a numeric or character vector of length nrow(data) specifying the values of
a factor used to split data into subsets. If, e.g., the values of fac
are the chromosomes to which the SNPs represented by the rows of data
belong, then \(k\) nearest neighbors imputation is applied chromosome-wise to the missing values
in mat.na (or data). If NULL, no such splitting is done. Must be
specified if fac.na is specified.
- fac.na
a numeric or character vector of length nrow(mat.na) specifying the values
of a factor by which mat.na is split into subsets. Each value of fac.na
must occur at least nn times in fac. Must be specified if fac and mat.na
are specified. If both fac and fac.na are NULL, no splitting is done.
- nn
an integer specifying \(k\), i.e. the number of nearest neighbors used to impute
the missing values.
- distance
a character string naming the distance measure used in \(k\) nearest neighbors.
Must be either "smc" (default), "cohen", "snp1norm" (which denotes the
Manhattan distance for SNPs), or "pcc".
- n.num
an integer giving the number of rows of mat.na considered simultaneously
when replacing the missing values in mat.na.
- use.weights
should weighted \(k\) nearest neighbors be used to impute the missing values?
If TRUE, the votes of the nearest neighbors are weighted by the reciprocal of their
distances to the variable (or observation) whose missing values are imputed.
- verbose
should more information about the progress of the imputation be printed?
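To make the difference between two of the distance choices concrete, here is a minimal Python sketch (Python is used only for illustration; smc_distance and manhattan_distance are hypothetical helpers, not functions of the package). It assumes "smc" means one minus the simple matching coefficient (the share of mismatching positions) and treats "snp1norm" as a plain L1 distance, both computed over the columns observed in both rows. Any scaling the package applies is not reproduced, and "cohen" and "pcc" are not shown.

```python
import numpy as np

def smc_distance(x, y):
    # Hypothetical helper: 1 - simple matching coefficient, i.e. the share of
    # mismatching positions, computed over columns observed in both rows.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ok = ~(np.isnan(x) | np.isnan(y))
    return float(np.mean(x[ok] != y[ok]))

def manhattan_distance(x, y):
    # Hypothetical helper: plain L1 distance over jointly observed columns;
    # any normalization used for "snp1norm" is NOT reproduced here.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ok = ~(np.isnan(x) | np.isnan(y))
    return float(np.sum(np.abs(x[ok] - y[ok])))

# Genotypes coded 1, 2, 3; the NaN column is ignored by both measures.
a = [1, 2, 3, 1, np.nan]
b = [1, 3, 3, 3, 2]
print(smc_distance(a, b))        # 0.5
print(manhattan_distance(a, b))  # 3.0
```

The matching-based distance counts a change from 1 to 3 the same as a change from 1 to 2, whereas the L1 distance treats the codes as ordered, which is why the latter is natural for SNPs coded by the number of minor alleles.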
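The roles of nn and use.weights can be illustrated with a small Python sketch (knn_impute_row is a hypothetical helper, not the package's implementation). It assumes a mismatch-proportion distance, imputes each missing entry by a majority vote over the nn nearest complete rows, and, when weighting is on, gives each neighbor a vote equal to the reciprocal of its distance. The eps guard against zero distances is an assumption, since the documentation does not say how such ties are handled.

```python
import numpy as np
from collections import defaultdict

def knn_impute_row(row, complete, nn=3, use_weights=False, eps=1e-8):
    # Hypothetical helper: impute the NaN entries of `row` from the `nn` rows of
    # `complete` (which has no missing values) closest under a mismatch distance.
    row = np.asarray(row, dtype=float).copy()
    complete = np.asarray(complete, dtype=float)
    obs = ~np.isnan(row)
    # Share of mismatches on the columns observed in `row`.
    dists = np.mean(complete[:, obs] != row[obs], axis=1)
    nearest = np.argsort(dists, kind="stable")[:nn]
    for j in np.where(~obs)[0]:
        votes = defaultdict(float)
        for i in nearest:
            # Weighted: vote = 1 / distance (eps is an assumed guard against
            # zero distances); unweighted: every neighbor casts one vote.
            w = 1.0 / max(dists[i], eps) if use_weights else 1.0
            votes[complete[i, j]] += w
        # Category with the most votes; ties go to the category seen first.
        row[j] = max(votes, key=votes.get)
    return row

complete = [[1, 1, 1, 2, 1],   # distance 0.25 on the first four columns
            [1, 2, 2, 1, 2],   # distance 0.50
            [2, 2, 1, 2, 2]]   # distance 0.75
target = [1, 1, 1, 1, np.nan]
print(knn_impute_row(target, complete, nn=3)[-1])                    # 2.0
print(knn_impute_row(target, complete, nn=3, use_weights=True)[-1])  # 1.0
```

Without weights, two of the three neighbors vote for 2; with weights, the closest neighbor's vote (1/0.25 = 4) outweighs the other two combined (1/0.5 + 1/0.75, about 3.33), so the imputed value flips to 1.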
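The splitting controlled by fac and fac.na can be sketched in Python as a loop over the levels of fac.na, where each row of mat.na is imputed only from rows of data that share its level (e.g. the same chromosome). impute_by_group is a hypothetical helper; the mismatch distance and the unweighted majority vote are assumptions chosen for brevity.

```python
import numpy as np
from collections import Counter

def impute_by_group(data, mat_na, fac, fac_na, nn=3):
    # Hypothetical helper mirroring the roles of data, mat.na, fac and fac.na:
    # rows of mat_na with level L are imputed only from rows of data with level L.
    data = np.asarray(data, dtype=float)
    out = np.asarray(mat_na, dtype=float).copy()
    fac, fac_na = np.asarray(fac), np.asarray(fac_na)
    for level in np.unique(fac_na):
        pool = data[fac == level]
        # Mirrors the documented requirement that each level of fac.na
        # occurs at least nn times in fac.
        if len(pool) < nn:
            raise ValueError(f"level {level!r} occurs fewer than nn times in fac")
        for r in np.where(fac_na == level)[0]:
            obs = ~np.isnan(out[r])
            # Unweighted majority vote among the nn nearest rows of this subset.
            dists = np.mean(pool[:, obs] != out[r, obs], axis=1)
            nearest = pool[np.argsort(dists, kind="stable")[:nn]]
            for j in np.where(~obs)[0]:
                out[r, j] = Counter(nearest[:, j]).most_common(1)[0][0]
    return out

data = [[1, 1, 1], [1, 1, 2], [1, 1, 2],   # chr1
        [2, 2, 1], [2, 2, 1], [2, 2, 3]]   # chr2
fac = ["chr1"] * 3 + ["chr2"] * 3
mat_na = [[1, 1, np.nan], [2, 2, np.nan]]
fac_na = ["chr1", "chr2"]
print(impute_by_group(data, mat_na, fac, fac_na, nn=3))
```

Here the chr1 row is completed with 2 (votes 1, 2, 2) and the chr2 row with 1 (votes 1, 1, 3); rows from the other chromosome never enter the vote.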
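One plausible reason for a parameter like n.num is to bound memory: distances between many rows of mat.na and all rows of data can be computed in a single vectorized step, but the intermediate array grows with the number of rows handled at once. The Python sketch below illustrates that trade-off under this assumption; smc_distance_chunks is a hypothetical helper and is not claimed to be how the package uses n.num.

```python
import numpy as np

def smc_distance_chunks(mat_na, data, n_num=100):
    # Hypothetical helper: yields (start, D), where D[i, k] is the share of
    # mismatches between row start + i of mat_na and row k of data, counted
    # over the columns observed in that row of mat_na.
    mat_na = np.asarray(mat_na, dtype=float)
    data = np.asarray(data, dtype=float)
    for start in range(0, mat_na.shape[0], n_num):
        block = mat_na[start:start + n_num]             # shape (b, p)
        obs = ~np.isnan(block)                          # shape (b, p)
        # Boolean array of shape (b, n, p): its size scales with n_num.
        mismatch = (block[:, None, :] != data[None, :, :]) & obs[:, None, :]
        D = mismatch.sum(axis=2) / obs.sum(axis=1, keepdims=True)
        yield start, D

mat_na = [[1, np.nan, 2], [2, 2, 2], [1, 1, 1]]
data = [[1, 1, 2], [2, 2, 2]]
for start, D in smc_distance_chunks(mat_na, data, n_num=2):
    print(start, D.shape)  # 0 (2, 2), then 2 (1, 2)
```

A larger n_num means fewer, larger vectorized steps; a smaller one keeps the (b, n, p) intermediate array small when data has many rows and columns.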