
This function implements an extended version of multiplicative simple imputation (multRepl
function) to simultaneously deal with both zeros (i.e. data below detection limit, rounded zeros) and missing data in compositional data sets.
Note: zeros and missing data must be labelled using 0 and NA
respectively to use this function.
multReplus(X, dl = NULL, frac = 0.65, closure = NULL,
z.warning = 0.8, z.delete = TRUE, delta = NULL)
A data.frame
object containing the imputed compositional data set expressed in the original scale.
Compositional data set (matrix
or data.frame
class).
Numeric vector or matrix of detection limits/thresholds. These must be given on the same scale as X
. If NULL
the column minima are used as thresholds.
Fraction of the detection limit/threshold used for imputation (default = 0.65, expressed as a proportion).
Closure value used to add a residual part for imputation (see below).
Threshold used to identify individual rows or columns including an excess of zeros/unobserved values (to be specify in proportions, default z.warning=0.8
).
Logical value. If set to TRUE
, rows/columns identified by z.warning
are omitted in the imputed data set. Otherwise, the function stops in error when rows/columns are identified by z.warning
(default z.delete=TRUE
).
This argument has been deprecated and replaced by frac
(see package's NEWS for details).
The procedure firstly replaces missing data using the estimated geometric mean based on the observed values and then zeros using frac*dl
. The observed components are applied a multiplicative adjustment to preserve the multivariate compositional properties of the samples.
Note: negative values can be generated when unobserved components are a large portion of the composition, which is more likely for missing data (e.g in major chemical elements) and non-closed compositions. A workaround is to add a residual filling the gap up to the closure/total when possible. This is done internally when a value for closure
is specified (e.g. closure=10^6
if ppm or closure=100
if percentages). The residual is discarded after imputation.
See ?multRepl
for more details.
Martin-Fernandez JA, Barcelo-Vidal C, Pawlowsky-Glahn V. Dealing with zeros and missing values in compositional data sets using nonparametric imputation. Mathematical Geology 2003; 35: 253-78.
Palarea-Albaladejo J, Martin-Fernandez JA. Values below detection limit in compositional chemical data. Analytica Chimica Acta 2013; 764: 32-43.
Palarea-Albaladejo J. and Martin-Fernandez JA. zCompositions -- R package for multivariate imputation of left-censored data under a compositional approach. Chemometrics and Intelligence Laboratory Systems 2015; 143: 85-96.
multRepl
, lrEMplus
, lrSVDplus
# Data set closed to 100 (percentages, common dl = 1%)
# (Note that zeros and missing in the same row are allowed)
X <- matrix(c(26.91,8.08,12.59,31.58,6.45,14.39,
39.73,41.42,0.00,NA,6.80,12.05,
NA,35.13,7.96,14.28,35.12,7.51,
10.85,46.40,31.89,10.86,0.00,0.00,
10.85,16.27,NA,9.16,19.57,44.15,
38.09,7.62,23.68,9.70,20.91,0.00,
NA,9.89,18.04,44.30,9.04,18.73,
44.41,15.04,7.95,0.00,10.82,21.78,
11.50,30.33,6.85,13.92,30.82,6.58,
19.04,42.59,0.00,38.37,0.00,0.00),byrow=TRUE,ncol=6)
X_multReplus <- multReplus(X,dl=rep(1,6))
# Multiple limits of detection by component
mdl <- matrix(0,ncol=6,nrow=10)
mdl[2,] <- rep(1,6)
mdl[4,] <- rep(0.75,6)
mdl[6,] <- rep(0.5,6)
mdl[8,] <- rep(0.5,6)
mdl[10,] <- c(0,0,1,0,0.8,0.7)
X_multReplus2 <- multReplus(X,dl=mdl)
# Non-closed compositional data set
data(LPdataZM) # (in ppm; 0 is nondetect and NA is missing data)
dl <- c(2,1,0,0,2,0,6,1,0.6,1,1,0,0,632,10) # limits of detection (0 for no limit)
LPdataZM2 <- subset(LPdataZM,select=-c(Cu,Ni,La)) # select a subset for illustration purposes
dl2 <- dl[-c(5,7,10)]
if (FALSE) {
LPdataZM2_multReplus <- multReplus(LPdataZM2,dl=dl2)
# Negative values generated (see e.g. K in sample #64)
}
# Workaround: use residual part to fill up the gap to 10^6 for imputation
LPdataZM2_multReplus <- multReplus(LPdataZM2,dl=dl2,closure=10^6)
Run the code above in your browser using DataLab