robCompositions (version 2.1.0)

impKNNa: Imputation of missing values in compositional data using knn methods

Description

This function offers several k-nearest neighbor methods for the imputation of missing values in compositional data.

Usage

impKNNa(x, method = "knn", k = 3, metric = "Aitchison",
  agg = "median", primitive = FALSE, normknn = TRUE, das = FALSE,
  adj = "median")

Arguments

x

data frame or matrix

method

method (at the moment, only “knn” can be used)

k

number of nearest neighbors chosen for imputation

metric

“Aitchison” or “Euclidean”

agg

“median” or “mean”, for the aggregation of the nearest neighbors

primitive

if TRUE, a more enhanced search for the \(k\)-nearest neighbors is performed (see Details)

normknn

An adjustment of the imputed values is performed if TRUE

das

Deprecated. If TRUE, the definition of the Aitchison distance based on simple logratios of the compositional parts (Aitchison et al., 2000) is used to calculate distances between observations. If FALSE, a version using the clr transformation is used.

adj

either “median” (default) or “sum”, for the adjustment of the nearest neighbors; see Hron et al. (2010).

Value

xOrig

Original data frame or matrix

xImp

Imputed data

w

Number of imputed values

wind

Index of the missing values in the data

metric

Metric used

Details

The Aitchison metric should be chosen when dealing with compositional data; otherwise, the Euclidean metric should be used.
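To illustrate why the choice of metric matters, the following base R sketch computes the Aitchison distance in the two equivalent forms mentioned under das: via the clr transformation and via simple pairwise logratios. The helper names clr, aitchison_dist and aitchison_dist_pairs are hypothetical and written for this illustration only; they are not functions exported by robCompositions, and the package's internal implementation may differ.

```r
# Centred logratio (clr) transform of a strictly positive composition.
clr <- function(x) {
  lx <- log(x)
  lx - mean(lx)
}

# Aitchison distance as the Euclidean distance between clr coordinates.
aitchison_dist <- function(x, y) {
  sqrt(sum((clr(x) - clr(y))^2))
}

# Equivalent form based on all pairwise simple logratios log(x_i / x_j).
aitchison_dist_pairs <- function(x, y) {
  D <- length(x)
  lr <- outer(log(x), log(x), "-") - outer(log(y), log(y), "-")
  sqrt(sum(lr[upper.tri(lr)]^2) / D)
}

a <- c(10, 30, 60)
b <- c(20, 20, 60)

aitchison_dist(a, b)           # clr-based version
aitchison_dist_pairs(a, b)     # logratio-based version, same value
aitchison_dist(a, 5 * a)       # rescaling a composition gives distance 0
sqrt(sum((a - 5 * a)^2))       # the Euclidean distance is not scale invariant
```

Because only ratios between parts enter the Aitchison distance, two observations that differ merely by their total are treated as identical, which is the behaviour wanted for compositional data.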

If primitive is FALSE, a sequential search for the \(k\)-nearest neighbors is applied for every missing value, using observations in which all information corresponding to the non-missing cells, plus the information in the variable to be imputed, plus some additional information, is available. If primitive is TRUE, the search for the \(k\)-nearest neighbors is applied among observations in which, in addition to the variable to be imputed, any further cells are non-missing.

If normknn is TRUE (the preferred option), the imputed cells from a nearest neighbor method are adjusted with special adjustment factors. More details can be found online; see the references.
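The idea behind the adjustment can be sketched in base R for a single missing cell. This is a simplified illustration under stated assumptions, not the implementation used by impKNNa: it assumes that distances are computed on the parts observed in the incomplete row, that donors must have those parts and the target part observed, and that each donor value is rescaled by the ratio of the medians of the commonly observed parts (our reading of the “median” adjustment in Hron et al., 2010). The function impute_cell_knn and the matrix X are hypothetical.

```r
# Centred logratio (clr) transform of a strictly positive composition.
clr <- function(x) {
  lx <- log(x)
  lx - mean(lx)
}

# Impute cell (i, j) of a positive matrix X from its k nearest donors.
impute_cell_knn <- function(X, i, j, k = 3) {
  obs <- which(!is.na(X[i, ]))              # parts observed in row i
  # Donors: rows other than i with part j and all parts in `obs` observed.
  ok <- setdiff(which(complete.cases(X[, c(obs, j), drop = FALSE])), i)
  # Aitchison distance to each donor on the commonly observed parts.
  d <- vapply(ok, function(l) {
    sqrt(sum((clr(X[i, obs]) - clr(X[l, obs]))^2))
  }, numeric(1))
  nn <- ok[order(d)][seq_len(min(k, length(ok)))]
  # Assumed adjustment factor: ratio of medians of the observed parts.
  f <- vapply(nn, function(l) median(X[i, obs]) / median(X[l, obs]),
              numeric(1))
  median(f * X[nn, j])                      # aggregate the adjusted values
}

X <- rbind(c(1,  2,  NA, 4),
           c(2,  4,  6,  8),
           c(10, 20, 30, 40),
           c(1,  2,  3,  4.4),
           c(5,  1,  1,  1))

impute_cell_knn(X, i = 1, j = 3, k = 3)
```

In this toy matrix the three nearest donors are rows 2, 3 and 4, whose raw values in the third part are 6, 30 and 3. After rescaling, each contributes a value of about 3, so the imputed cell matches the scale of row 1 instead of the scale of the donors.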

References

Aitchison, J., Barcelo-Vidal, C., Martin-Fernandez, J.A., Pawlowsky-Glahn, V. (2000) Logratio analysis and compositional distance, Mathematical Geology, 32(3), 271-275.

Hron, K., Templ, M., Filzmoser, P. (2010) Imputation of missing values for compositional data using classical and robust methods, Computational Statistics and Data Analysis, 54(12), 3095-3107.

See Also

impCoda

Examples

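library(robCompositions)
data(expenditures)
x <- expenditures
x[1, 3]                  # original value, kept for comparison
x[1, 3] <- NA            # set the cell to missing
xi <- impKNNa(x)$xImp    # impute with the default settings
xi[1, 3]                 # imputed value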
