VGAM (version 1.1-6)

# dirmul.old: Fitting a Dirichlet-Multinomial Distribution

## Description

Fits a Dirichlet-multinomial distribution to a matrix of non-negative integers.

## Usage

dirmul.old(link = "loglink", ialpha = 0.01, parallel = FALSE, zero = NULL)

## Arguments

Link function applied to each of the $$M$$ (positive) shape parameters $$\alpha_j$$ for $$j=1,\ldots,M$$. See Links for more choices. Here, $$M$$ is the number of columns of the response matrix.

ialpha

Numeric vector. Initial values for the alpha vector. Must be positive. Recycled to length $$M$$.

parallel

A logical, or formula specifying which terms have equal/unequal coefficients.

zero

An integer-valued vector specifying which linear/additive predictors are modelled as intercepts only. The values must be from the set {1,2,…,$$M$$}.

## Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm, rrvglm and vgam.

## Details

The Dirichlet-multinomial distribution, which is somewhat similar to a Dirichlet distribution, has probability function $$P(Y_1=y_1,\ldots,Y_M=y_M) = {2y_{*} \choose {y_1,\ldots,y_M}} \frac{\Gamma(\alpha_{+})}{\Gamma(2y_{*}+\alpha_{+})} \prod_{j=1}^M \frac{\Gamma(y_j+\alpha_{j})}{\Gamma(\alpha_{j})}$$ for $$\alpha_j > 0$$, $$\alpha_+ = \alpha_1 + \cdots + \alpha_M$$, and $$2y_{*} = y_1 + \cdots + y_M$$. Here, $$a \choose b$$ means $$a$$ choose $$b$$'' and refers to combinations (see choose). The (posterior) mean is $$E(Y_j) = (y_j + \alpha_j) / (2y_{*} + \alpha_{+})$$ for $$j=1,\ldots,M$$, and these are returned as the fitted values as a $$M$$-column matrix.

## References

Lange, K. (2002). Mathematical and Statistical Methods for Genetic Analysis, 2nd ed. New York: Springer-Verlag.

Forbes, C., Evans, M., Hastings, N. and Peacock, B. (2011). Statistical Distributions, Hoboken, NJ, USA: John Wiley and Sons, Fourth edition.

Paul, S. R., Balasooriya, U. and Banerjee, T. (2005). Fisher information matrix of the Dirichlet-multinomial distribution. Biometrical Journal, 47, 230--236.

Tvedebrink, T. (2010). Overdispersion in allelic counts and $$\theta$$-correction in forensic genetics. Theoretical Population Biology, 78, 200--210.

dirmultinomial, dirichlet, betabinomialff, multinomial.

## Examples

Run this code
# NOT RUN {
# Data from p.50 of Lange (2002)
alleleCounts <- c(2, 84, 59, 41, 53, 131, 2, 0,
0, 50, 137, 78, 54, 51, 0, 0,
0, 80, 128, 26, 55, 95, 0, 0,
0, 16, 40, 8, 68, 14, 7, 1)
dim(alleleCounts) <- c(8, 4)
alleleCounts <- data.frame(t(alleleCounts))
dimnames(alleleCounts) <- list(c("White","Black","Chicano","Asian"),
paste("Allele", 5:12, sep = ""))

set.seed(123)  # @initialize uses random numbers
fit <- vglm(cbind(Allele5,Allele6,Allele7,Allele8,Allele9,
Allele10,Allele11,Allele12) ~ 1, dirmul.old,
trace = TRUE, crit = "c", data = alleleCounts)

(sfit <- summary(fit))
vcov(sfit)
round(eta2theta(coef(fit), fit@misc$link, fit@misc$earg), digits = 2)  # not preferred
round(Coef(fit), digits = 2)  # preferred
round(t(fitted(fit)), digits = 4)  # 2nd row of Table 3.5 of Lange (2002)
coef(fit, matrix = TRUE)

pfit <- vglm(cbind(Allele5,Allele6,Allele7,Allele8,Allele9,
Allele10,Allele11,Allele12) ~ 1,
dirmul.old(parallel = TRUE), trace = TRUE,
data = alleleCounts)
round(eta2theta(coef(pfit, matrix = TRUE), pfit@misc$link, pfit@misc$earg), digits = 2)  # 'Right' answer
round(Coef(pfit), digits = 2)  # 'Wrong' answer due to parallelism constraint
# }


Run the code above in your browser using DataLab