An object that specifies the distribution to be fitted by the MGLMfit function, or the regression model to be fitted by the MGLMreg or MGLMsparsereg functions. It can be one of "MN" (multinomial), "DM" (Dirichlet multinomial), "NegMN" (negative multinomial), or "GDM" (generalized Dirichlet multinomial).
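For orientation, the Python sketch below records how many free parameters each dist code carries for a count vector with \(d\) categories, as implied by the definitions that follow. For "MN" and "NegMN" the probabilities must sum to one, which removes one degree of freedom. The helper n_free_params is hypothetical and is not part of the MGLM package.

```python
def n_free_params(dist: str, d: int) -> int:
    """Hypothetical helper: free distribution parameters for d categories."""
    if d < 1:
        raise ValueError("d must be a positive number of categories")
    counts = {
        "MN": d - 1,          # p_1..p_d with sum(p) = 1
        "DM": d,              # alpha_1..alpha_d
        "GDM": 2 * (d - 1),   # alpha_1..alpha_{d-1}, beta_1..beta_{d-1}
        "NegMN": d + 1,       # p_1..p_{d+1} with sum(p) = 1, plus beta
    }
    if dist not in counts:
        raise ValueError(f"dist must be one of {sorted(counts)}")
    return counts[dist]
```

For \(d = 4\) this gives 3, 4, 6, and 5 parameters for "MN", "DM", "GDM", and "NegMN".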
A multinomial distribution models the counts of \(d\) possible outcomes. Under this model, the counts of any two categories are negatively correlated. The probability mass function of a \(d\)-category count vector \(y\) with parameter \(p=(p_1, \ldots, p_d)\) is $$ P(y|p) = C_{y_1, \ldots, y_d}^{m} \prod_{j=1}^{d} p_j^{y_j}, $$ where \(m = \sum_{j=1}^d y_j\), \(0 < p_j < 1\), and \(\sum_{j=1}^d p_j = 1\). Here, \(C_k^n\), often read as "\(n\) choose \(k\)", refers to the number of \(k\) combinations from a set of \(n\) elements, and \(C_{y_1, \ldots, y_d}^{m} = m!/(y_1! \cdots y_d!)\) denotes the multinomial coefficient.
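A minimal Python sketch of the multinomial probability mass function above. It works on the log scale with math.lgamma so that the multinomial coefficient does not overflow for large counts. The function names are hypothetical; this is not MGLM's R implementation, whose density functions are listed at the end of this page.

```python
import math

def log_multinomial_coef(y):
    """Log of m! / (y_1! ... y_d!), where m = sum(y)."""
    m = sum(y)
    return math.lgamma(m + 1) - sum(math.lgamma(yj + 1) for yj in y)

def mn_logpmf(y, p):
    """Log of P(y | p) for the multinomial distribution."""
    if len(y) != len(p) or any(pj <= 0 for pj in p) or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must be positive, sum to 1, and match y in length")
    return log_multinomial_coef(y) + sum(yj * math.log(pj) for yj, pj in zip(y, p))
```

For \(y = (2, 1, 0)\) and \(p = (0.5, 0.3, 0.2)\), the coefficient is \(3!/(2!\,1!\,0!) = 3\), so \(P(y|p) = 3 \cdot 0.5^2 \cdot 0.3 = 0.225\).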
The MGLMreg function with dist="MN" calculates the MLE of the regression coefficients \(\beta_j\) of the multinomial logit model, which has link function \(p_j = \exp(X\beta_j)/(1 + \sum_{j'=1}^{d-1} \exp(X\beta_{j'}))\), \(j=1,\ldots,d-1\). The last category serves as the baseline, with \(p_d = 1/(1 + \sum_{j'=1}^{d-1} \exp(X\beta_{j'}))\). The MGLMsparsereg function with dist="MN" fits a regularized multinomial logit model.
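The multinomial logit link above can be sketched in Python for a single covariate row \(x\). The function name and argument layout (a list of \(d-1\) coefficient vectors, one per non-baseline category) are assumptions for illustration. Subtracting the largest linear predictor before exponentiating leaves the probabilities unchanged but avoids overflow.

```python
import math

def mn_logit_probs(x, betas):
    """Map one covariate row x (length q) to probabilities p_1..p_d.

    betas holds d-1 coefficient vectors beta_1..beta_{d-1}, each of length q;
    category d is the baseline, whose linear predictor is fixed at 0.
    """
    eta = [sum(xk * bk for xk, bk in zip(x, beta)) for beta in betas]
    shift = max([0.0] + eta)                # stabilizing shift, cancels out
    expo = [math.exp(e - shift) for e in eta]
    denom = math.exp(-shift) + sum(expo)    # (1 + sum exp(eta)) * exp(-shift)
    return [e / denom for e in expo] + [math.exp(-shift) / denom]
```

With all coefficients zero, every category gets probability \(1/d\).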
When multivariate count data exhibit over-dispersion, the traditional multinomial model is insufficient. The Dirichlet multinomial distribution treats the category probabilities as random, drawn from a Dirichlet distribution. The probability mass function of a \(d\)-category count vector \(y\), with parameter \(\alpha = (\alpha_1, \ldots, \alpha_d)\), \(\alpha_j > 0\), is $$ P(y|\alpha) = C_{y_1, \ldots, y_d}^{m} \frac{\Gamma(\sum_{j'=1}^d \alpha_{j'})}{\Gamma(\sum_{j'=1}^d \alpha_{j'} + m)} \prod_{j=1}^{d} \frac{\Gamma(\alpha_j+y_j)}{\Gamma(\alpha_j)}, $$ where \(m=\sum_{j=1}^d y_j\). Here, \(C_k^n\), often read as "\(n\) choose \(k\)", refers to the number of \(k\) combinations from a set of \(n\) elements.
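A Python sketch of the Dirichlet multinomial log-pmf above, again with hypothetical names. The checks below use two known properties. With \(\alpha = (1, 1)\) and \(m = 2\), all three compositions of \(y\) are equally likely. As the entries of \(\alpha\) grow with \(\alpha/\sum_j \alpha_j\) held fixed, the distribution approaches the multinomial with \(p = \alpha/\sum_j \alpha_j\).

```python
import math

def dm_logpmf(y, alpha):
    """Log of P(y | alpha) for the Dirichlet multinomial distribution."""
    if len(y) != len(alpha) or any(a <= 0 for a in alpha):
        raise ValueError("alpha must be positive and match y in length")
    m, a_sum = sum(y), sum(alpha)
    # Multinomial coefficient m! / (y_1! ... y_d!) on the log scale.
    log_coef = math.lgamma(m + 1) - sum(math.lgamma(yj + 1) for yj in y)
    return (log_coef
            + math.lgamma(a_sum) - math.lgamma(a_sum + m)
            + sum(math.lgamma(a + yj) - math.lgamma(a) for a, yj in zip(alpha, y)))
```

For example, `math.exp(dm_logpmf([1, 1], [1.0, 1.0]))` is \(1/3\).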
The MGLMfit function with dist="DM" calculates the maximum likelihood estimate (MLE) of \((\alpha_1, \ldots, \alpha_d)\). The MGLMreg function with dist="DM" calculates the MLE of the regression coefficients \(\beta_j\) of the Dirichlet multinomial regression model, which has link function \(\alpha_j = \exp(X\beta_j)\), \(j=1,\ldots,d\). The MGLMsparsereg function with dist="DM" fits a regularized Dirichlet multinomial regression model.
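To make the regression link concrete, the sketch below evaluates the Dirichlet multinomial regression log-likelihood with \(\alpha_{ij} = \exp(x_i^\top \beta_j)\) for the rows \(x_i\) of \(X\). This is the objective whose maximizer is the MLE. The block re-implements the density formula above so that it runs on its own. It is a hypothetical illustration and does not reproduce the optimizer that MGLMreg actually uses.

```python
import math

def dm_logpmf(y, alpha):
    """Log of the Dirichlet multinomial pmf P(y | alpha)."""
    m, a_sum = sum(y), sum(alpha)
    log_coef = math.lgamma(m + 1) - sum(math.lgamma(yj + 1) for yj in y)
    return (log_coef + math.lgamma(a_sum) - math.lgamma(a_sum + m)
            + sum(math.lgamma(a + yj) - math.lgamma(a) for a, yj in zip(alpha, y)))

def dm_reg_loglik(X, Y, B):
    """DM regression log-likelihood with log link alpha_ij = exp(x_i . beta_j).

    X: n covariate rows of length q; Y: n count vectors of length d;
    B: d coefficient vectors beta_1..beta_d, each of length q.
    """
    total = 0.0
    for x, y in zip(X, Y):
        # Linear predictor for each category, then the exponential link.
        alpha = [math.exp(sum(xk * bk for xk, bk in zip(x, beta))) for beta in B]
        total += dm_logpmf(y, alpha)
    return total
```

With all coefficients zero, every \(\alpha_{ij} = 1\), so each row with \(y_i = (1, 1)\) contributes \(\log(1/3)\).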
The more flexible generalized Dirichlet multinomial model can be used when the counts of categories have both positive and negative correlations. The probability mass function of a count vector \(y\) over \(m\) trials with parameter \((\alpha, \beta)=(\alpha_1, \ldots, \alpha_{d-1}, \beta_1, \ldots, \beta_{d-1})\), \(\alpha_j, \beta_j > 0\), is $$ P(y|\alpha,\beta) =C_{y_1, \ldots, y_d}^{m} \prod_{j=1}^{d-1} \frac{\Gamma(\alpha_j+y_j)}{\Gamma(\alpha_j)} \frac{\Gamma(\beta_j+z_{j+1})}{\Gamma(\beta_j)} \frac{\Gamma(\alpha_j+\beta_j)}{\Gamma(\alpha_j+\beta_j+z_j)} , $$ where \(z_j = \sum_{k=j}^d y_k\) and \(m=\sum_{j=1}^d y_j\). Here, \(C_k^n\), often read as "\(n\) choose \(k\)", refers to the number of \(k\) combinations from a set of \(n\) elements.
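A Python sketch of the generalized Dirichlet multinomial log-pmf. It uses 0-based indices, so `z[j]` in the code is \(z_{j+1}\) in the formula. The names are hypothetical. A useful sanity check: take a Dirichlet multinomial parameter \((\alpha_1, \ldots, \alpha_d)\), keep \(\alpha_1, \ldots, \alpha_{d-1}\), and set \(\beta_j = \sum_{k=j+1}^{d} \alpha_k\). The Gamma ratios then telescope and the density reduces to the Dirichlet multinomial.

```python
import math

def gdm_logpmf(y, alpha, beta):
    """Log of P(y | alpha, beta) for the generalized Dirichlet multinomial.

    y has d counts; alpha and beta each have d-1 positive entries.
    """
    d = len(y)
    if len(alpha) != d - 1 or len(beta) != d - 1:
        raise ValueError("alpha and beta must have length d - 1")
    # 0-based tail sums: z[j] = y[j] + ... + y[d-1], with z[d] = 0.
    z = [sum(y[j:]) for j in range(d + 1)]
    m = z[0]
    out = math.lgamma(m + 1) - sum(math.lgamma(yj + 1) for yj in y)
    for j in range(d - 1):
        a, b = alpha[j], beta[j]
        out += (math.lgamma(a + y[j]) - math.lgamma(a)
                + math.lgamma(b + z[j + 1]) - math.lgamma(b)
                + math.lgamma(a + b) - math.lgamma(a + b + z[j]))
    return out
```

For instance, the Dirichlet multinomial with parameter \((1, 2, 3)\) assigns \(y = (1, 0, 2)\) probability \(3/28\). The GDM form with \(\alpha = (1, 2)\) and \(\beta = (5, 3)\) gives the same value.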
The MGLMfit function with dist="GDM" calculates the MLE of \((\alpha, \beta)=(\alpha_1, \ldots, \alpha_{d-1}, \beta_1, \ldots, \beta_{d-1})\). The MGLMreg function with dist="GDM" calculates the MLE of the regression coefficients of the generalized Dirichlet multinomial regression model, which has link functions \(\alpha_j=\exp(X\alpha_j)\) and \(\beta_j=\exp(X\beta_j)\), \(j=1, \ldots, d-1\). In these link functions, \(\alpha_j\) and \(\beta_j\) on the right-hand side denote the regression coefficient vectors, while on the left-hand side they denote the distribution parameters. The MGLMsparsereg function with dist="GDM" fits a regularized generalized Dirichlet multinomial regression model.
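The two GDM log links can be sketched as follows for a single covariate row \(x\). This avoids using \(\alpha_j, \beta_j\) for both parameters and coefficients: the hypothetical arguments `A` and `B` hold the coefficient vectors. Otherwise the sketch follows the link functions stated above.

```python
import math

def gdm_link(x, A, B):
    """Map one covariate row x to GDM parameters (alpha, beta).

    A and B each hold d-1 coefficient vectors of length q;
    alpha_j = exp(x . A_j) and beta_j = exp(x . B_j).
    """
    def linpred(coef):
        # Linear predictor x . coef
        return sum(xk * ck for xk, ck in zip(x, coef))
    alpha = [math.exp(linpred(a)) for a in A]
    beta = [math.exp(linpred(b)) for b in B]
    return alpha, beta
```

The resulting `alpha` and `beta` lists have the \(d-1\) entries expected by the GDM density.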
Both the multinomial distribution and the Dirichlet multinomial distribution only accommodate negatively correlated counts. When the counts of categories are positively correlated, the negative multinomial distribution is preferred. The probability mass function of a \(d\)-category count vector \(y\) with parameter \((p_1, \ldots, p_{d+1}, \beta)\), \(\sum_{j=1}^{d+1} p_j=1\), \(p_j > 0\), \(\beta > 0\), is $$ \begin{aligned} P(y|p,\beta) &= C_{m}^{\beta+m-1} C_{y_1, \ldots, y_d}^{m} \prod_{j=1}^d p_j^{y_j} p_{d+1}^\beta \\ &= \frac{(\beta)_m}{m!} C_{y_1, \ldots, y_d}^{m} \prod_{j=1}^d p_j^{y_j} p_{d+1}^\beta, \end{aligned} $$ where \(m = \sum_{j=1}^d y_j\) and \((\beta)_m = \beta(\beta+1)\cdots(\beta+m-1) = \Gamma(\beta+m)/\Gamma(\beta)\) is the rising factorial. Here, \(C_k^n\), often read as "\(n\) choose \(k\)", refers to the number of \(k\) combinations from a set of \(n\) elements. Because \(\beta\) need not be an integer, \(C_{m}^{\beta+m-1}\) is understood as the generalized binomial coefficient \(\Gamma(\beta+m)/(m!\,\Gamma(\beta))\).
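A Python sketch of the negative multinomial log-pmf. It uses \(C_m^{\beta+m-1} = \Gamma(\beta+m)/(m!\,\Gamma(\beta)) = (\beta)_m/m!\), which also handles non-integer \(\beta\). The names are hypothetical. With \(d = 1\) the distribution reduces to the negative binomial, which gives an easy hand check. For \(y = 2\), \(p = (0.4, 0.6)\), and \(\beta = 3\), \(P = C_2^4 \cdot 0.4^2 \cdot 0.6^3 = 6 \cdot 0.16 \cdot 0.216 = 0.20736\).

```python
import math

def negmn_logpmf(y, p, beta):
    """Log of P(y | p, beta) for the negative multinomial distribution.

    y has d counts; p has d+1 positive entries summing to 1; beta > 0.
    """
    if len(p) != len(y) + 1 or beta <= 0 or any(pj <= 0 for pj in p):
        raise ValueError("need len(p) == len(y) + 1 and positive p, beta")
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must sum to 1")
    m = sum(y)
    # log C_m^{beta+m-1} = log Gamma(beta+m) - log Gamma(beta) - log m!
    log_gen_binom = math.lgamma(beta + m) - math.lgamma(beta) - math.lgamma(m + 1)
    # log of the multinomial coefficient m! / (y_1! ... y_d!)
    log_coef = math.lgamma(m + 1) - sum(math.lgamma(yj + 1) for yj in y)
    return (log_gen_binom + log_coef
            + sum(yj * math.log(pj) for yj, pj in zip(y, p))
            + beta * math.log(p[-1]))
```

Because the support is unbounded, the probabilities only sum to one in the limit. Summing over a large enough grid of counts comes arbitrarily close.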
The MGLMfit function with dist="NegMN" calculates the MLE of \((p_1, \ldots, p_{d+1}, \beta)\). The MGLMreg function with dist="NegMN" and regBeta=FALSE calculates the MLE of the regression coefficients \((\alpha_1,\ldots,\alpha_d)\) and of the scalar overdispersion parameter \(\beta\) of the negative multinomial regression model. This model has link functions \(p_{d+1} = 1/(1 + \sum_{j'=1}^d \exp(X\alpha_{j'}))\) and \(p_j = \exp(X\alpha_j) p_{d+1}\), \(j=1, \ldots, d\). When dist="NegMN" and regBeta=TRUE, the overdispersion parameter is also linked to the covariates via \(\beta=\exp(X\alpha_{d+1})\), and the MGLMreg function outputs an estimated coefficient matrix \((\alpha_1, \ldots, \alpha_{d+1})\). The MGLMsparsereg function with dist="NegMN" fits a regularized negative multinomial regression model.
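The negative multinomial links, including the effect of regBeta, can be sketched for a single covariate row \(x\). The function and its keyword arguments are hypothetical, not MGLM's interface. Passing a scalar `beta` mimics regBeta=FALSE. Passing the coefficient vector `alpha_beta` (playing the role of \(\alpha_{d+1}\)) mimics regBeta=TRUE.

```python
import math

def negmn_link(x, alphas, beta=None, alpha_beta=None):
    """Return ([p_1, ..., p_{d+1}], beta) for one covariate row x.

    alphas: d coefficient vectors alpha_1..alpha_d, each of length q.
    Give exactly one of: a scalar beta (regBeta=FALSE analogue) or the
    coefficient vector alpha_beta, with beta = exp(x . alpha_beta)
    (regBeta=TRUE analogue).
    """
    if (beta is None) == (alpha_beta is None):
        raise ValueError("give exactly one of beta or alpha_beta")
    def linpred(coef):
        # Linear predictor x . coef
        return sum(xk * ck for xk, ck in zip(x, coef))
    expo = [math.exp(linpred(a)) for a in alphas]
    p_last = 1.0 / (1.0 + sum(expo))        # p_{d+1}
    p = [e * p_last for e in expo] + [p_last]
    if alpha_beta is not None:
        beta = math.exp(linpred(alpha_beta))
    return p, beta
```

With \(q\) covariates, this parametrization has \(q \cdot d\) coefficients plus one scalar \(\beta\) when regBeta=FALSE, and \(q \cdot (d+1)\) coefficients when regBeta=TRUE.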
See also: MGLMfit, MGLMreg, MGLMsparsereg, dmn, ddirmn, dgdirmn, dnegmn