This function implements methods for imputing zeros in compositional count data sets based on a Bayesian-multiplicative replacement.
cmultRepl(X, label = 0,
method = c("GBM","SQ","BL","CZM","user"), output = c("prop","p-counts"),
delta = 0.65, threshold = 0.5, correct = TRUE, t = NULL, s = NULL,
suppress.print = FALSE)
X: Count data set (matrix or data.frame class).
method: Geometric Bayesian multiplicative (GBM, default); square root BM (SQ); Bayes-Laplace BM (BL); count zero multiplicative (CZM); user-specified hyper-parameters (user).
output: Output format: imputed proportions (prop, default) or pseudo-counts (p-counts).
delta: If method="CZM", fraction of the upper threshold used to impute zeros (default delta=0.65). Also the fraction of the lowest estimated probability used to correct imputed proportions falling above it (when correct=TRUE).
threshold: For a vector of counts, factor applied to the quotient 1/n, where n is the number of trials (sum of the counts), to produce the upper limit used for replacing zero counts by the CZM method (default threshold=0.5).
correct: Logical value setting whether imputed proportions falling above the lowest estimated probability for a multinomial part must be corrected or not (default correct=TRUE).
t: If method="user", user-specified t hyper-parameter of the Dirichlet prior distribution for each count vector (row) in X. It must be a matrix of the same dimensions as X.
s: If method="user", user-specified s hyper-parameter of the Dirichlet prior distribution for each count vector (row) in X. It must be a vector of length equal to the number of rows of X.
suppress.print: Suppress printed feedback (default suppress.print=FALSE).
By default (output="prop") the function returns an imputed data set (data.frame class) in proportions (estimated probabilities). Alternatively, these proportions are re-scaled to produce a compositionally-equivalent matrix of pseudo-counts (output="p-counts") which preserves the ratios between parts.
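To illustrate why re-scaling keeps the result compositionally equivalent, here is a minimal base R sketch. The helper `to_pseudo_counts()` and the example values are hypothetical, and the choice of anchoring the scale on the first non-zero part is an assumption made for illustration; the exact scaling used by the function is not stated on this page.

```r
# Illustrative sketch only, not the zCompositions source code.
# to_pseudo_counts() is a hypothetical helper: it multiplies the imputed
# proportions by a single constant, so all ratios between parts are unchanged.
to_pseudo_counts <- function(counts, props) {
  ref <- which(counts > 0)[1]          # first observed (non-zero) part (assumed anchor)
  props * counts[ref] / props[ref]     # one scale factor for the whole row
}

counts <- c(10, 0, 30)                 # made-up count vector with one zero
props  <- c(0.248, 0.008, 0.744)       # made-up imputed proportions (sum to 1)
pc <- to_pseudo_counts(counts, props)
pc                                     # non-zero parts return to their original counts
pc / sum(pc)                           # closing the pseudo-counts gives back props
```

Because only a common factor is applied, any log-ratio analysis gives the same result on the proportions and on the pseudo-counts.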
When correct=TRUE and suppress.print=FALSE, the number of times, if any, an imputed proportion was corrected to fall below the minimum estimated multinomial probability is printed.
Zero counts, assumed to be due to under-reporting or limited sampling, are imputed under a Bayesian paradigm (GBM, SQ or BL method) by posterior estimates of the multinomial probabilities generating the counts, assuming a Dirichlet prior distribution. The argument method sets the Dirichlet hyper-parameters t (prior estimates of the multinomial probabilities) and s (strength). Users can specify their own by setting method="user" and passing them as the t and s arguments. Note that, under certain circumstances (see references for details), these methods can generate imputed proportions falling above the lowest estimated probability of a multinomial part (c/n, where c is the count and n is the number of trials). In such cases, the imputation is corrected by using a fraction (delta) of the minimum c/n for that part. Lastly, the non-zero parts are multiplicatively adjusted according to their compositional nature, so that the ratios between them are preserved.
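The steps just described can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package implementation: `bm_replace()` is a hypothetical helper, the hyper-parameters are passed in directly (as with method="user"), the uniform prior and strength in the example are arbitrary illustrative values, and every part is assumed to have at least one non-zero count. A zero count is replaced by the Dirichlet posterior estimate s*t/(n+s), which is what (c + s*t)/(n + s) reduces to when c = 0.

```r
# Illustrative sketch only, not the zCompositions source code.
# t: N x D matrix of prior estimates; s: strength (scalar or length-N vector).
bm_replace <- function(X, t, s, delta = 0.65, correct = TRUE) {
  X <- as.matrix(X)
  n <- rowSums(X)              # number of trials per count vector (row)
  P <- X / n                   # observed proportions c/n
  zero <- X == 0
  repl <- t * (s / (n + s))    # posterior estimate for a zero count: s*t / (n + s)
  if (correct) {
    # lowest non-zero c/n for each part (column); assumes no all-zero column
    min_p <- apply(P, 2, function(p) min(p[p > 0]))
    min_mat <- matrix(min_p, nrow(X), ncol(X), byrow = TRUE)
    too_big <- repl > min_mat
    repl[too_big] <- delta * min_mat[too_big]   # fall back to delta * min(c/n)
  }
  out <- P
  out[zero] <- repl[zero]
  # multiplicative adjustment: shrink non-zero parts so each row sums to 1
  adj <- 1 - rowSums(repl * zero)
  out[!zero] <- (P * adj)[!zero]
  out
}

X <- rbind(c(10, 0, 30), c(5, 15, 0), c(0, 8, 12))   # made-up counts
D <- ncol(X)
t_unif <- matrix(1 / D, nrow(X), D)   # illustrative uniform prior estimates
res <- bm_replace(X, t = t_unif, s = D)
round(res, 4)
rowSums(res)                          # each row is a closed composition
```

Note that the ratio between the two observed parts of the first row stays at 10/30 after imputation, which is the point of the multiplicative adjustment.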
On the other hand, method="CZM" uses multiplicative simple replacement (multRepl) on the matrix of estimated probabilities. The upper limit and the fraction delta used are specified by, respectively, the arguments threshold and delta. Suggested values are threshold=0.5 (so the upper limit for a multinomial probability turns out to be 0.5/n), and delta=0.65 (so the imputed proportion is 65% of the upper limit).
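As a worked illustration of these values, the following base R sketch imputes each zero with delta * threshold / n and then adjusts the non-zero parts multiplicatively. The helper `czm_replace()` is hypothetical and written from the description above; it does not call multRepl and should not be read as the package code.

```r
# Illustrative sketch only, not the zCompositions source code.
czm_replace <- function(X, delta = 0.65, threshold = 0.5) {
  X <- as.matrix(X)
  n <- rowSums(X)                        # number of trials per row
  P <- X / n                             # estimated probabilities c/n
  zero <- X == 0
  # imputed proportion per row: delta times the upper limit threshold/n
  repl <- matrix(delta * threshold / n, nrow(X), ncol(X))
  out <- P
  out[zero] <- repl[zero]
  # multiplicative adjustment of the non-zero parts so each row sums to 1
  adj <- 1 - rowSums(repl * zero)
  out[!zero] <- (P * adj)[!zero]
  out
}

X <- rbind(c(10, 0, 30), c(5, 15, 0))    # made-up counts
res_czm <- czm_replace(X)
res_czm[1, 2]   # n = 40: upper limit 0.5/40 = 0.0125, imputed 0.65 * 0.0125 = 0.008125
rowSums(res_czm)
```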
Martin-Fernandez, J.A., Hron, K., Templ, M., Filzmoser, P., Palarea-Albaladejo, J. Bayesian-multiplicative treatment of count zeros in compositional data sets. Statistical Modelling 2015; 15 (2): 134-158.
Palarea-Albaladejo, J., Martin-Fernandez, J.A. zCompositions -- R package for multivariate imputation of left-censored data under a compositional approach. Chemometrics and Intelligent Laboratory Systems 2015; 143: 85-96.
library(zCompositions)
data(Pigs)
# GBM method (default); returns the matrix of estimated probabilities
Pigs.GBM <- cmultRepl(Pigs)