ddirmn
computes the log of the Dirichlet multinomial probability mass function.
rdirmn
generates Dirichlet multinomially distributed random number vectors.
rdirmn(n, size, alpha)
ddirmn(Y, alpha)
n
number of random vectors to generate. When size is a scalar and alpha is a vector, n must be specified. When size is a vector and alpha is a matrix, n is optional: it defaults to the length of size and, if given, should equal the length of size.
size
a number or vector specifying the total number of objects that are put into d categories in the Dirichlet multinomial distribution.
alpha
the parameter of the Dirichlet multinomial distribution. Can be a positive numeric vector or matrix.
For ddirmn, alpha has to match the size of Y. If alpha is a vector, it will be replicated \(n\) times to match the dimension of Y.
For rdirmn, if alpha is a vector, size must be a scalar, and all the random vectors will be drawn from the same alpha and size. If alpha is a matrix, the number of rows should match the length of size, and each random vector will be drawn from the corresponding row of alpha and the corresponding element of the size vector. See Details below.
Y
the multivariate count matrix with dimensions \(n \times d\), where \(n = 1,2, \ldots\) is the number of observations and \(d=2,3, \ldots\) is the number of categories.
For each count vector and each corresponding parameter vector \(\alpha\), the function ddirmn returns the value \(\log(P(y|\alpha))\). When Y is a matrix of \(n\) rows, ddirmn returns a vector of length \(n\).
rdirmn returns an \(n\times d\) matrix of the generated random observations.
When multivariate count data exhibit over-dispersion, the traditional multinomial model is insufficient. The Dirichlet multinomial distribution models the probabilities of the categories by a Dirichlet distribution. Given the parameter vector \(\alpha = (\alpha_1, \ldots, \alpha_d)\), \(\alpha_j>0\), the probability mass of a \(d\)-category count vector \(y=(y_1, \ldots, y_d)\), \(d \ge 2\), under the Dirichlet multinomial distribution is $$ P(y|\alpha) = C_{y_1, \ldots, y_d}^{m} \frac{\Gamma(\sum_{j'=1}^d \alpha_{j'})}{\Gamma(\sum_{j'=1}^d \alpha_{j'} + \sum_{j'=1}^d y_{j'})} \prod_{j=1}^{d} \frac{\Gamma(\alpha_j+y_j)}{\Gamma(\alpha_j)}, $$ where \(m=\sum_{j=1}^d y_j\). Here, \(C_{y_1, \ldots, y_d}^{m} = m!/(y_1! \cdots y_d!)\) is the multinomial coefficient, which generalizes the binomial coefficient \(C_k^n\), often read as "\(n\) choose \(k\)", the number of \(k\)-combinations from a set of \(n\) elements.
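As a rough illustration of the formula above, the log probability mass for a single count vector can be evaluated on the log scale with base R's lgamma. The helper name dirmn_logpmf is hypothetical and is not part of the documented interface; this is a sketch of the formula, not the code that ddirmn actually runs.

```r
# Hypothetical helper (not part of the documented API): log PMF of one count
# vector y under a Dirichlet multinomial with parameter vector alpha.
dirmn_logpmf <- function(y, alpha) {
  stopifnot(length(y) == length(alpha), all(alpha > 0), all(y >= 0))
  m <- sum(y)
  # log multinomial coefficient: log(m!) - sum_j log(y_j!)
  log_coef <- lgamma(m + 1) - sum(lgamma(y + 1))
  log_coef +
    lgamma(sum(alpha)) - lgamma(sum(alpha) + m) +
    sum(lgamma(alpha + y) - lgamma(alpha))
}

# With d = 2 and alpha = (1, 1) the distribution is uniform over the m + 1
# possible splits of m objects, so each outcome has probability 1 / (m + 1).
lp <- dirmn_logpmf(c(3, 2), c(1, 1))
all.equal(exp(lp), 1 / 6)
```

Working with lgamma rather than gamma keeps the computation stable for large counts, where the Gamma function itself would overflow.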
The parameter \(\alpha\) can be a vector of length \(d\), such as the result of a distribution fit. \(\alpha\) can also be a matrix with \(n\) rows, such as the inverse link \(\exp(X\beta)\) evaluated at the regression parameter estimate.
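The matrix case can be sketched in base R as follows. The function rdirmn_sketch, the design matrix X, and the coefficient values in beta are all made up for illustration. The sampler uses the standard compound construction (draw category probabilities from a Dirichlet via normalized gamma variates, then draw multinomial counts), which is an assumption about how such draws can be produced, not a description of rdirmn's internals.

```r
# Hypothetical sketch (not the package's implementation): draw one
# Dirichlet multinomial vector per row of a matrix alpha.
rdirmn_sketch <- function(size, alpha) {
  stopifnot(is.matrix(alpha), length(size) == nrow(alpha), all(alpha > 0))
  n <- nrow(alpha)
  d <- ncol(alpha)
  out <- matrix(0L, n, d)
  for (i in seq_len(n)) {
    # Normalized gamma variates follow a Dirichlet(alpha[i, ]) distribution
    g <- rgamma(d, shape = alpha[i, ])
    # Given the probabilities, the counts are multinomial with total size[i]
    out[i, ] <- rmultinom(1, size[i], g / sum(g))
  }
  out
}

set.seed(1)
X <- cbind(1, c(-1, 0, 1))                                   # illustrative 3 x 2 design matrix
beta <- matrix(c(0.5, 0.2, 0.1, -0.3, 0.0, 0.4), nrow = 2)   # made-up 2 x 3 coefficients
alpha <- exp(X %*% beta)                                     # 3 x 3: one alpha row per observation
size <- c(10, 20, 30)                                        # one total per row of alpha
Y <- rdirmn_sketch(size, alpha)
```

Each row of Y sums to the matching element of size, mirroring the row-by-row pairing of alpha and size described in the arguments.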
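# Assumes the MGLM package, which provides rdirmn() and ddirmn(), is installed
library(MGLM)
set.seed(123)                                    # for reproducible draws
m <- 20                                          # total count per random vector
alpha <- c(0.1, 0.2)                             # parameter vector, d = 2 categories
dm.Y <- rdirmn(n = 10, size = m, alpha = alpha)  # 10 x 2 matrix of counts
pdfln <- ddirmn(dm.Y, alpha)                     # vector of 10 log-probabilities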