lrDA: Logratio DA algorithm

Description

This function implements a simulation-based Data Augmentation (DA) algorithm to replace left-censored values (e.g. values below the detection limit, rounded zeros) in compositional data sets. Multiple imputation estimates can also be obtained from the output.

Usage

lrDA(X, label = NULL, dl = NULL,
        ini.cov=c("lrEM", "complete.obs", "multRepl"),
        delta = 0.65, n.iters = 1000, m = 1)

Arguments

X

Compositional data set (matrix or data.frame class).

label

Unique label (numeric or character) used to denote unobserved left-censored values in X.

dl

Numeric vector of detection limits/thresholds, one per component (column) of X. Use e.g. 0 for a component that has no threshold. These must be given on the same scale as X.

ini.cov

Initial estimation of the logratio-covariance matrix. It can be based on lrEM estimation ("lrEM", default), complete observations ("complete.obs") or multiplicative simple replacement ("multRepl").

delta

If ini.cov="multRepl", delta parameter for initial multiplicative replacement (multRepl) in proportions (default = 0.65).

n.iters

Number of iterations for the DA algorithm (default = 1000).

m

Number of multiple imputations (default = 1).

Value

A data.frame object containing the replaced compositional data set.

Details

lrDA produces a replaced data set on the same scale as the input data set. If X is not closed to a constant sum, then the results are adjusted to provide a compositionally equivalent data set, expressed in the original scale, which leaves the absolute values of the observed components unaltered.
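
The rescaling described above can be illustrated with a minimal base-R sketch. It is not the package's internal code: the row of data, the imputed value 0.4 and the helper name closure are assumptions made purely for illustration. A censored part is filled by back-transforming a hypothetical log-ratio against an observed part, so the observed parts keep their absolute values while the row stays compositionally equivalent to its closed version.

```r
# Hypothetical row in ppm (not closed to a constant); part 2 is censored (0)
x  <- c(12.0, 0, 30.5, 7.5)
dl <- c(1, 1, 1, 1)            # assumed common detection limit

# Hypothetical imputed log-ratio of part 2 against the last (observed) part
z_imp <- log(0.4 / x[4])

# Back-transform to the original scale using the observed reference part
x_rep <- x
x_rep[2] <- exp(z_imp) * x[4]

observed <- x != 0
stopifnot(identical(x_rep[observed], x[observed]))  # observed parts untouched
stopifnot(x_rep[2] < dl[2])                         # imputed value below its limit

# Closing the row to 100 changes absolute values but not ratios between parts
closure <- function(x, k = 100) k * x / sum(x)
ratio_raw    <- x_rep[1] / x_rep[3]
ratio_closed <- closure(x_rep)[1] / closure(x_rep)[3]
stopifnot(isTRUE(all.equal(ratio_raw, ratio_closed)))
```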

Plausible values for the censored parts are simulated from their posterior predictive distributions once the DA iterative process has converged to its steady state. The usual conjugate normal inverted-Wishart distribution with a non-informative prior is assumed for the model parameters. Under this setting, convergence is expected to be fast (n.iters is set to 1000 by default). In addition, using EM parameter estimates as the starting point for the DA algorithm (ini.cov="lrEM") speeds up convergence by starting near the centre of the posterior distribution. Note that the procedure is based on the oblique additive log-ratio (alr) transformation, which simplifies calculations and alleviates the computational burden.
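
As a rough illustration of the alr transformation mentioned above, the following base-R sketch defines the transformation and its inverse and shows how a detection limit becomes an upper bound on the log-ratio scale. The function names alr and alr_inv, the toy data and the choice of the last column as a fully observed reference part are assumptions of this sketch, not a description of how lrDA is implemented.

```r
# Additive log-ratio transformation, taking the last part as reference
alr <- function(X) {
  X <- as.matrix(X)
  D <- ncol(X)
  log(X[, -D, drop = FALSE] / X[, D])
}

# Inverse alr, returning compositions closed to `total`
alr_inv <- function(Z, total = 100) {
  Y <- cbind(exp(Z), 1)
  total * Y / rowSums(Y)
}

# Toy data closed to 100; the third part of row 2 is censored (coded as 0)
X  <- rbind(c(26.91,  8.08, 12.59, 52.42),
            c(39.73, 26.20,  0.00, 34.07))
dl <- rep(1, 4)   # assumed common detection limit of 1%

# Round trip on the fully observed row recovers the original composition
Z1 <- alr(X[1, , drop = FALSE])
stopifnot(isTRUE(all.equal(alr_inv(Z1), X[1, , drop = FALSE],
                           check.attributes = FALSE)))

# On the alr scale, the censored value in row 2 lies below log(dl / reference)
bound <- log(dl[3] / X[2, 4])
```

Any value simulated for that censored part would then have to fall below bound on the log-ratio scale before being back-transformed.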

By setting m greater than 1, the procedure also allows for multiple imputations of the censored values, drawn at regular intervals from the single Markov chain generated by the DA iterations after convergence. In this case, n.iters serves both as the burn-in period for convergence and as the gap between successive imputations, which should be large enough to avoid correlated imputations. The total number of iterations is then n.iters*m. The replaced data set results from averaging the m imputations, in accordance with multiple imputation theory.
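
The thinning scheme described above can be sketched in base R as follows. The sketch assumes that the first imputation is taken at iteration n.iters and the remaining ones every n.iters iterations thereafter. The autoregressive series is only a toy stand-in for a chain of simulated values for one censored cell, not output of the DA algorithm.

```r
n.iters <- 150   # burn-in and gap between imputations
m       <- 5     # number of imputations

# Iterations at which imputations are retained
keep  <- seq(from = n.iters, by = n.iters, length.out = m)
# 150 300 450 600 750
total <- n.iters * m
stopifnot(max(keep) == total)

# Toy autocorrelated chain standing in for the simulated values of one cell
set.seed(1)
chain <- numeric(total)
chain[1] <- rnorm(1)
for (t in 2:total) {
  chain[t] <- 0.5 * chain[t - 1] + rnorm(1)
}

# Retain m widely spaced draws and average them into a single estimate
imputations <- chain[keep]
pooled      <- mean(imputations)
```

A larger gap makes the retained draws closer to independent, at the cost of more iterations overall.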

References

Palarea-Albaladejo J, Martin-Fernandez JA, Olea RA. A bootstrap estimation scheme for chemical compositional data with nondetects. Journal of Chemometrics 2014 (to appear).

Examples

# Data set closed to 100 (percentages, common dl = 1%)
X <- matrix(c(26.91,8.08,12.59,31.58,6.45,14.39,
              39.73,26.20,0.00,15.22,6.80,12.05,
              10.76,31.36,7.10,12.74,31.34,6.70,
              10.85,46.40,31.89,10.86,0.00,0.00,
              7.57,11.35,30.24,6.39,13.65,30.80,
              38.09,7.62,23.68,9.70,20.91,0.00,
              27.67,7.15,13.05,32.04,6.54,13.55,
              44.41,15.04,7.95,0.00,10.82,21.78,
              11.50,30.33,6.85,13.92,30.82,6.58,
              19.04,42.59,0.00,38.37,0.00,0.00),byrow=TRUE,ncol=6)

# Replacement by single simulated values
X_lrDA <- lrDA(X,label=0,dl=rep(1,6),ini.cov="multRepl",n.iters=150)

# Replacement by multiple imputation (m = 5, one imputation every 150 iterations)
X_milrDA <- lrDA(X,label=0,dl=rep(1,6),ini.cov="multRepl",m=5,n.iters=150)

# Non-closed compositional data set
data(LPdata) # data (ppm/micrograms per gram)
dl <- c(2,1,0,0,2,0,6,1,0.6,1,1,0,0,632,10) # limits of detection (0 for no limit)
LPdata2 <- subset(LPdata,select=-c(Cu,Ni,La))  # select a subset for illustration purposes
dl2 <- dl[-c(5,7,10)]

# May take a little while
LPdata_lrDA <- lrDA(LPdata2,label=0,dl=dl2)
