
zCompositions (version 1.0.0)

cmultRepl: Bayesian-Multiplicative replacement for count zeros

Description

This function implements methods for imputing zeros in compositional count data sets based on a Bayesian-multiplicative replacement.

Usage

cmultRepl(X, method = c("GBM","SQ","BL","CZM","user"), output = c("prop","counts"),
          delta = 0.65, threshold = 0.5, correct = TRUE, t = NULL, s = NULL)

Arguments

X
Count data set (matrix or data.frame class).
method
Imputation method: geometric Bayesian-multiplicative (BM) replacement (GBM, default); square root BM (SQ); Bayes-Laplace BM (BL); count zero multiplicative (CZM); user-specified hyper-parameters (user).
output
Output format: imputed proportions (prop, default) or pseudo-counts (counts).
delta
If method="CZM", fraction of the upper threshold used to impute zeros (default delta=0.65). It is also the fraction of the lowest estimated probability used to correct imputed proportions that fall above it (when correct=TRUE).
threshold
For a vector of counts with n trials (the sum of the counts), factor applied to 1/n to give the upper limit, threshold/n, used for replacing zero counts by the CZM method (default threshold=0.5).
correct
Logical value setting whether imputed proportions falling above the lowest estimated probability for a multinomial part must be corrected (default correct=TRUE).
t
If method="user", user-specified t hyper-parameter of the Dirichlet prior distribution for each count vector (row) in X. It must be a matrix of the same dimensions as X.
s
If method="user", user-specified s hyper-parameter of the Dirichlet prior distribution for each count vector (row) in X. It must be a vector of length equal to the number of rows of X.
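As a hedged illustration of the shape requirements for method="user", the sketch below builds t and s from a small made-up count matrix. It uses a uniform prior (t = 1/D in every cell) with strength s = D per row, a classical Bayes-Laplace-style choice; whether this matches what method="BL" computes internally is an assumption, not something stated on this page. The matrix X and the helper make_uniform_prior are hypothetical.

```r
# Hypothetical toy count matrix: 3 samples (rows) x 4 parts (columns)
X <- matrix(c(10, 0, 5, 85,
               0, 3, 7, 40,
              20, 1, 0, 79),
            nrow = 3, byrow = TRUE)

# Hypothetical helper: uniform Dirichlet prior with strength D for each row
# (a Bayes-Laplace-style choice, used here only for illustration)
make_uniform_prior <- function(X) {
  D <- ncol(X)
  t_prior <- matrix(1 / D, nrow = nrow(X), ncol = D)  # same dimensions as X
  s_prior <- rep(D, nrow(X))                          # one strength per row of X
  list(t = t_prior, s = s_prior)
}

prior <- make_uniform_prior(X)

# Check the shape requirements stated for method = "user"
stopifnot(identical(dim(prior$t), dim(X)),
          length(prior$s) == nrow(X))

# With zCompositions installed, the prior would then be passed as:
# cmultRepl(X, method = "user", t = prior$t, s = prior$s)
```

Each row of t sums to 1, as prior estimates of multinomial probabilities should.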

Value

  • By default (output="prop") the function returns a replaced data set (data.frame class) in proportions (estimated probabilities). Alternatively (output="counts"), these proportions are re-scaled to produce a compositionally equivalent matrix of pseudo-counts.

    When correct=TRUE, the number of times, if any, an imputed proportion was corrected to fall below the minimum estimated multinomial probability is printed.

Details

Zero counts, assumed to be a consequence of the sampling process, are replaced under a Bayesian paradigm (GBM, SQ or BL method) by posterior estimates of the multinomial probabilities generating the counts, assuming a Dirichlet prior distribution. The argument method sets the Dirichlet hyper-parameters t (a priori estimates of the multinomial probabilities) and s (strength). Users can supply their own values by setting method="user" and passing them as the t and s arguments. Note that, under certain circumstances (see references for details), these methods can generate imputed proportions falling above the lowest estimated probability of a multinomial part (c/n, where c is the count and n is the number of trials). In such cases, the replacement is corrected by using a fraction (delta) of the minimum c/n for that part. Lastly, the non-zero parts are multiplicatively adjusted according to their compositional nature.
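To make these steps concrete, here is a minimal base-R sketch for the BL and SQ cases. It assumes that the posterior estimate is the Dirichlet posterior mean, so a zero in row i, part j becomes s_i t_ij / (n_i + s_i), and it assumes the usual textbook hyper-parameters for these names (BL: s = D, t = 1/D; SQ: s = sqrt(n), t = 1/D). The function bm_replace_sketch is hypothetical and is not the zCompositions implementation; GBM is left out because its data-driven t is not described on this page.

```r
# Minimal sketch of Bayesian-multiplicative (BM) zero replacement.
# Assumptions (not taken from the zCompositions source):
#   * a zero is replaced by the Dirichlet posterior mean s*t / (n + s);
#   * "BL" uses s = D and "SQ" uses s = sqrt(n), both with t = 1/D;
#   * the correction caps an imputation at delta * min(c/n) of that part.
bm_replace_sketch <- function(X, method = c("BL", "SQ"), delta = 0.65,
                              correct = TRUE) {
  method <- match.arg(method)
  X <- as.matrix(X)
  N <- nrow(X)
  D <- ncol(X)
  stopifnot(all(colSums(X > 0) > 0))        # every part needs a non-zero count
  n <- rowSums(X)                           # number of trials per row
  P <- X / n                                # estimated probabilities c/n
  t_prior <- matrix(1 / D, N, D)            # uniform prior estimates t
  s_prior <- if (method == "BL") rep(D, N) else sqrt(n)  # prior strength s
  R <- s_prior * t_prior / (n + s_prior)    # row-wise posterior means for zeros

  # Lowest non-zero estimated probability c/n of each part (column)
  P_nz <- P
  P_nz[X == 0] <- NA
  min_p <- apply(P_nz, 2, min, na.rm = TRUE)

  Y <- P
  n_corrected <- 0
  for (i in seq_len(N)) {
    z <- X[i, ] == 0
    if (!any(z)) next
    r <- R[i, z]
    if (correct) {
      too_big <- r > min_p[z]               # imputations above the c/n minimum
      r[too_big] <- delta * min_p[z][too_big]
      n_corrected <- n_corrected + sum(too_big)
    }
    Y[i, z] <- r
    Y[i, !z] <- P[i, !z] * (1 - sum(r))     # multiplicative adjustment
  }
  attr(Y, "n_corrected") <- n_corrected
  Y
}

# Hypothetical toy counts: 2 samples x 4 parts
X <- matrix(c(10, 0, 5, 85,
               0, 3, 7, 40), nrow = 2, byrow = TRUE)
Y <- bm_replace_sketch(X, method = "BL")
rowSums(Y)  # each row sums to 1 (up to rounding)
```

The sketch stores the number of capped imputations in the attribute n_corrected, loosely mirroring the count that the function reports when correct=TRUE.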

On the other hand, method="CZM" uses multiplicative simple replacement (multRepl) on the matrix of estimated probabilities. The upper limit and the fraction delta used are specified by, respectively, the arguments threshold and delta. Suggested values are threshold=0.5 (so the upper limit for a multinomial probability turns out to be 0.5/n), and delta=0.65 (so the imputed proportion is 65% of the upper limit).
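A minimal base-R sketch of the CZM arithmetic described above: each zero in row i becomes delta * threshold / n_i, and the non-zero proportions in that row are scaled by one minus the total imputed mass, so the row still sums to 1 and the ratios between non-zero parts are preserved. The function czm_sketch is hypothetical and does not call multRepl; it only illustrates the calculation under that reading of the text.

```r
# Sketch of CZM: multiplicative simple replacement on c/n with a
# row-specific upper limit threshold / n (not the zCompositions source).
czm_sketch <- function(X, threshold = 0.5, delta = 0.65) {
  X <- as.matrix(X)
  n <- rowSums(X)                 # number of trials per row
  P <- X / n                      # estimated probabilities c/n
  imp <- delta * threshold / n    # imputed value for the zeros of each row
  Y <- P
  for (i in seq_len(nrow(X))) {
    z <- X[i, ] == 0
    if (!any(z)) next
    Y[i, z] <- imp[i]
    Y[i, !z] <- P[i, !z] * (1 - sum(z) * imp[i])  # keep the row closed to 1
  }
  Y
}

# Hypothetical toy counts: 2 samples x 4 parts
X <- matrix(c(10, 0, 5, 85,
               0, 3, 7, 40), nrow = 2, byrow = TRUE)
Y <- czm_sketch(X)
Y[1, 2]  # equals delta * threshold / n = 0.65 * 0.5 / 100
```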

References

Martin-Fernandez JA, Hron K, Templ M, Filzmoser P, Palarea-Albaladejo J. Bayesian-multiplicative treatment of count zeros in compositional data sets. Statistical Modelling 2014; to appear.

See Also

zPatterns, multRepl, multLN, lrEM, lrDA

Examples

library(zCompositions)
data(Pigs)

# GBM method and matrix of estimated probabilities
Pigs.GBM <- cmultRepl(Pigs)
