
scrime (version 1.2.9)

knncatimpute: Missing Value Imputation with kNN

Description

Imputes missing values in a matrix composed of categorical variables using $k$ Nearest Neighbors.

Usage

knncatimpute(x, dist = NULL, nn = 3, weights = TRUE)

Arguments

x
a numeric matrix containing missing values. All non-missing values must be integers between 1 and $n_{cat}$, where $n_{cat}$ is the maximum number of levels the categorical variables in x can take. If the $k$ nearest observatio
dist
either a character string naming the distance measure or a distance matrix. If the former, dist must be either "smc", "cohen", or "pcc". If the latter, dist must be a symmetric mat
nn
an integer specifying $k$, i.e. the number of nearest neighbors used to impute the missing values.
weights
should weighted $k$NN be used to impute the missing values? If TRUE, the vote of each nearest neighbor is weighted by the reciprocal of its distance to the observation or variable when the missing values of this observation or varia
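
The effect of weights can be illustrated with a minimal base R sketch. The helper weighted_vote below is hypothetical and not part of scrime; it only shows how weighting each neighbor's vote by the reciprocal of its distance can change the winning category compared with a plain majority vote. It assumes strictly positive distances and does not try to reproduce how knncatimpute handles ties or zero distances.

```r
# Hypothetical helper (not part of scrime): choose a category by voting.
# neighbor_values: categories observed in the k nearest neighbors
# neighbor_dist:   their distances to the item being imputed (assumed > 0)
weighted_vote <- function(neighbor_values, neighbor_dist, weights = TRUE) {
  # Reciprocal-distance weights, or equal weights for a plain majority vote.
  w <- if (weights) 1 / neighbor_dist else rep(1, length(neighbor_values))
  # Sum the weights per category and return the category with the largest total.
  totals <- tapply(w, neighbor_values, sum)
  as.integer(names(totals)[which.max(totals)])
}

# Two distant neighbors vote for category 1, one close neighbor for category 2.
vals <- c(1, 1, 2)
d    <- c(0.5, 0.5, 0.1)

weighted_vote(vals, d, weights = TRUE)   # totals: category 1 = 4, category 2 = 10
weighted_vote(vals, d, weights = FALSE)  # totals: category 1 = 2, category 2 = 1
```

With weights = TRUE the single close neighbor outweighs the two distant ones; with weights = FALSE the majority category wins.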

Value

  • A matrix of the same size as x in which all the missing values have been imputed.

References

Schwender, H. (2007). Statistical Analysis of Genotype and Gene Expression Data. Dissertation, Department of Statistics, University of Dortmund.

See Also

knncatimputeLarge, gknn, smc, pcc

Examples

# Generate a data set consisting of 200 rows and 50 columns
# in which the values are integers between 1 and 3.
# Afterwards, remove 20 of the values randomly.

library(scrime)

mat <- matrix(sample(3, 10000, TRUE), 200)
mat[sample(10000, 20)] <- NA

# Replace the missing values.

mat2 <- knncatimpute(mat)

# Replace the missing values using the 5 nearest neighbors
# and Cohen's Kappa.

mat3 <- knncatimpute(mat, nn = 5, dist = "cohen")
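
As a follow-up, the property stated under Value (a matrix of the same size as x with all missing values imputed) can be checked with a small helper. This is only a sketch: check_imputed and mode_impute are hypothetical names, not scrime functions, and mode_impute is a stand-in imputer used solely so the sketch runs without scrime installed. The check that observed entries are left unchanged is an assumption about what an imputation should do, not something the help page states.

```r
# Hypothetical helper (not part of scrime): sanity-check an imputed matrix
# against the matrix it was derived from.
check_imputed <- function(original, imputed) {
  observed <- !is.na(original)
  c(
    same_dim      = identical(dim(original), dim(imputed)),
    no_missing    = !anyNA(imputed),
    observed_kept = all(original[observed] == imputed[observed])
  )
}

# Stand-in imputer, used only to make this sketch self-contained:
# fills each NA with the most frequent observed value in its column.
mode_impute <- function(x) {
  for (j in seq_len(ncol(x))) {
    miss <- is.na(x[, j])
    if (any(miss)) {
      counts <- table(x[!miss, j])
      x[miss, j] <- as.integer(names(counts)[which.max(counts)])
    }
  }
  x
}

set.seed(1)
toy <- matrix(sample(3, 200, TRUE), 20)
toy[sample(200, 10)] <- NA

check_imputed(toy, mode_impute(toy))
```

With scrime loaded, the same helper could be applied to the objects created above, for example check_imputed(mat, mat2) or check_imputed(mat, mat3).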
