Learn R Programming

zCompositions (version 1.0.0)

lrEM: Logratio EM algorithm

Description

This function implements ordinary and robust Expectation-Maximisation algorithms to replace left-censored values (e.g. values below detection limit, rounded zeros) in compositional data sets.

Usage

lrEM(X, label = NULL, dl = NULL, rob = FALSE,
        ini.cov = c("complete.obs", "multRepl"), delta = 0.65,
        tolerance = 0.0001, max.iter = 50, rlm.maxit = 150, suppress.print = FALSE)

Arguments

X
Compositional data set (matrix or data.frame class).
label
Unique label (numeric or character) used to denote unobserved left-censored values in X.
dl
Numeric vector of detection limits/thresholds (one per component/column, use e.g. 0 if no threshold for a particular one). These must be given on the same scale as X.
rob
Logical value. FALSE provides maximum-likelihood estimates of model parameters (default), TRUE provides robust parameter estimates.
ini.cov
Initial estimation of the log-ratio covariance matrix. It can be based on either complete observations ("complete.obs", default) or multiplicative simple replacement ("multRepl").
delta
If ini.cov="multRepl", delta parameter for initial multiplicative replacement (multRepl) in proportions (default = 0.65).
tolerance
Convergence criterion for the EM algorithm (default = 0.0001).
max.iter
Maximum number of iterations for the EM algorithm (default = 50).
rlm.maxit
If rob=TRUE, maximum number of iterations for the embedded robust regression estimation (default = 150; see rlm in MASS package for details).
suppress.print
Suppress printing of number of iterations required to converge (suppress.print=FALSE, default).

Value

  • A data.frame object containing the replaced compositional data set. The number of iterations required for convergence is also printed (this can be suppressed by setting suppress.print=TRUE).

Details

lrEM produces a replaced data set on the same scale as the input data set. If X is not closed to a constant sum, then the results are adjusted to provide a compositionally equivalent data set, expressed in the original scale, which leaves the absolute values of the observed components unaltered.

Under maximum likelihood estimation (default, rob=FALSE), a correction factor based on the residual covariance obtained by censored regression is applied for the correct estimation of the conditional covariance matrix in the maximisation step of the EM algorithm. This is required in order to obtain the conditional expectation of the sum of cross-products between two components in the case that both involve imputed values. Note that the procedure is based on the oblique additive log-ratio (alr) transformation to simplify calculations and alleviates computational burden. Nonetheless, the same results would be obtained using an isometric log-ratio transformation (ilr).

Under robust estimation (rob=TRUE), the algorithm requires ilr transformations in order to satisfy requirements for robust estimation methods (MM-estimation by default, see rlm function for more details). In the unlikely case of censoring patterns involving samples containing only one observed component, these are imputed by multiplicative simple replacement and a warning message identifying them is printed. An initial estimation of nondetects is required to get the algorithm started. This can be based on either the subset of fully observed cases (ini.cov="complete.obs") or a multiplicative simple replacement of all nondetects in the data set (ini.cov="multRepl"). Note that the robust regression method involved includes random elements which can, occasionally, give rise to NaN values getting the routine execution halted. If this happened, we suggest to simply re-run the function once again.

References

Martin-Fernandez, J.A., Hron, K., Templ, M., Filzmoser, P., Palarea-Albaladejo, J. Model-based replacement of rounded zeros in compositional data: classical and robust approaches. Computational Statistics & Data Analysis 2012; 56: 2688-2704.

Palarea-Albaladejo J, Martin-Fernandez JA, Gomez-Garcia J. A parametric approach for dealing with compositional rounded zeros. Mathematical Geology 2007; 39: 625-45.

Palarea-Albaladejo J, Martin-Fernandez JA. A modified EM alr-algorithm for replacing rounded zeros in compositional data sets. Computers & Geosciences 2008; 34(8): 902-917.

Palarea-Albaladejo J, Martin-Fernandez JA. Values below detection limit in compositional chemical data. Analytica Chimica Acta 2013; 764: 32-43. DOI: http://dx.doi.org/10.1016/j.aca.2012.12.029.

See Also

zPatterns, multRepl, multLN, lrDA, cmultRepl

Examples

Run this code
# Data set closed to 100 (percentages, common dl = 1%)
X <- matrix(c(26.91,8.08,12.59,31.58,6.45,14.39,
              39.73,26.20,0.00,15.22,6.80,12.05,
              10.76,31.36,7.10,12.74,31.34,6.70,
              10.85,46.40,31.89,10.86,0.00,0.00,
              7.57,11.35,30.24,6.39,13.65,30.80,
              38.09,7.62,23.68,9.70,20.91,0.00,
              27.67,7.15,13.05,32.04,6.54,13.55,
              44.41,15.04,7.95,0.00,10.82,21.78,
              11.50,30.33,6.85,13.92,30.82,6.58,
              19.04,42.59,0.00,38.37,0.00,0.00),byrow=TRUE,ncol=6)
              
X_lrEM <- lrEM(X,label=0,dl=rep(1,6),ini.cov="multRepl")
X_roblrEM <- lrEM(X,label=0,dl=rep(1,6),ini.cov="multRepl",rob=TRUE,tolerance=0.001)

# Non-closed compositional data set
data(LPdata) # data (ppm/micrograms per gram)
dl <- c(2,1,0,0,2,0,6,1,0.6,1,1,0,0,632,10) # limits of detection (0 for no limit)
LPdata2 <- subset(LPdata,select=-c(Cu,Ni,La))  # select a subset for illustration purposes
dl2 <- dl[-c(5,7,10)]

LPdata_lrEM <- lrEM(LPdata2,label=0,dl=dl2)
LPdata_roblrEM <- lrEM(LPdata2,label=0,dl=dl2,rob=TRUE,tolerance=0.005)

Run the code above in your browser using DataLab