This function offers several methods for imputing missing values in compositional data. Missing values are first initialized with suitable values; iterative algorithms then refine the estimates for the originally missing values.
impCoda(x, maxit = 10, eps = 0.5, method = "ltsReg",
closed = FALSE, init = "KNN", k = 5, dl = rep(0.05, ncol(x)),
noise = 0.1, bruteforce = FALSE)
x: data frame or matrix
maxit: maximum number of iterations
eps: convergence criterion
method: imputation method
closed: if FALSE, imputation is carried out on transformed data (using the ilr transformation); if TRUE, in the original space
init: method for initializing missing values
k: number of nearest neighbors (if init == "KNN")
dl: detection limit(s), only relevant for the imputation of rounded zeros
noise: amount of random noise added to the predictors after convergence
bruteforce: if TRUE, imputations over dl are set to dl. If FALSE, truncated (Tobit) regression is applied.
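To make the ilr transformation mentioned for the closed argument more concrete, here is a minimal base R sketch. It uses one common choice of ilr basis; the helper name ilr is hypothetical, and the basis that impCoda uses internally may differ.

```r
# Isometric log-ratio (ilr) coordinates for a single composition.
# x: numeric vector of strictly positive parts (length D >= 2).
# Assumption: one common ilr basis; not necessarily the one used by impCoda.
ilr <- function(x) {
  D <- length(x)
  lx <- log(x)
  z <- numeric(D - 1)
  for (j in seq_len(D - 1)) {
    rest <- lx[(j + 1):D]  # logs of the parts that follow part j
    # scaled log-ratio of part j to the geometric mean of the remaining parts
    z[j] <- sqrt((D - j) / (D - j + 1)) * (lx[j] - mean(rest))
  }
  z
}

comp <- c(10, 20, 70)
ilr(comp)        # D - 1 = 2 real-valued coordinates
ilr(comp / 100)  # same coordinates: only the ratios between parts matter
```

Because the coordinates depend only on ratios between parts, rescaling a composition (for example closing it to sum 1 or 100) leaves them unchanged.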
Original data frame or matrix
Imputed data
Sum of the Aitchison distances from the present and previous iteration
Number of iterations
Maximum number of iterations
Number of imputed values
Index of the missing values in the data
eps: The algorithm stops as soon as the imputed values stabilize, i.e. when the sum of Aitchison distances between the present and previous iteration changes only marginally (controlled by eps).
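The convergence criterion can be illustrated with a small base R sketch that computes the sum of row-wise Aitchison distances between two versions of a data set. The helper names (clr, aitchison_dist, sum_aitchison) are hypothetical, and the final comparison against eps is an assumed stopping rule, not necessarily the exact check performed inside impCoda.

```r
# Centered log-ratio transform of one composition (strictly positive parts).
clr <- function(x) log(x) - mean(log(x))

# Aitchison distance: Euclidean distance between clr-transformed compositions.
aitchison_dist <- function(x, y) sqrt(sum((clr(x) - clr(y))^2))

# Sum of row-wise Aitchison distances between two data sets of equal shape.
sum_aitchison <- function(a, b) {
  a <- as.matrix(a)
  b <- as.matrix(b)
  sum(vapply(seq_len(nrow(a)),
             function(i) aitchison_dist(a[i, ], b[i, ]),
             numeric(1)))
}

# Two made-up iterations of an imputed data set (illustrative values only).
prev <- rbind(c(10, 20, 70), c(30, 30, 40))
curr <- rbind(c(10, 21, 70), c(30, 30, 40))

criteria <- sum_aitchison(curr, prev)
criteria < 0.5  # assumed stopping check against eps = 0.5
```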
method: Several methods can be chosen:
‘ltsReg’: least trimmed squares regression is used within the iterative procedure.
‘lm’: least squares regression is used within the iterative procedure.
‘classical’: principal component analysis is used within the iterative procedure.
‘ltsReg2’: least trimmed squares regression is used within the iterative procedure. The imputed values are perturbed in the direction of the predictor by values drawn from a normal distribution with mean and standard deviation related to the corresponding residuals and multiplied by noise.
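As a usage sketch, the method argument can be varied to compare the robust and the least squares variants on the same missing cell. This assumes that the robCompositions package (which provides impCoda and the expenditures data) is installed; the block is skipped otherwise.

```r
# Assumption: robCompositions is installed; nothing is run if it is not.
if (requireNamespace("robCompositions", quietly = TRUE)) {
  library(robCompositions)
  data(expenditures)

  x <- expenditures
  x[1, 3] <- NA  # introduce one missing value

  # Same data, two of the documented methods
  xi_lts <- impCoda(x, method = "ltsReg")$xImp
  xi_lm  <- impCoda(x, method = "lm")$xImp

  # Imputed values for the missing cell under each method
  c(ltsReg = xi_lts[1, 3], lm = xi_lm[1, 3])
}
```

The robust variant is usually the safer choice when the data may contain outliers, while ‘lm’ is the plain least squares counterpart.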
Hron, K., Templ, M., Filzmoser, P. (2010). Imputation of missing values for compositional data using classical and robust methods. Computational Statistics and Data Analysis, 54(12), 3095-3107.
# NOT RUN {
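library(robCompositions)  # provides impCoda() and the expenditures data
data(expenditures)
x <- expenditures
x[1,3]                    # original value, shown for comparison
x[1,3] <- NA              # set it to missing
xi <- impCoda(x)$xImp     # impute and extract the imputed data
xi[1,3]                   # imputed value
# rescale the third part using the sums of the remaining parts in row 1
s1 <- sum(x[1,-3])
impS <- sum(xi[1,-3])
xi[,3] * s1/impS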
# }