Finds the posterior mode for multinomial logistic regression parameters using cyclic coordinate descent.
This is designed for inverse regression analysis of sentiment in text, where the multinomial response can be very high dimensional. It should be generally useful for any large-scale multinomial logistic regression, but is optimized for the large-response setting: counts are treated as sparse, while covars are dense. The model is identified by fixing coefficients at zero for a specified null category. With a binomial response, the first category is assumed null. For multinomial response dimension greater than two, each response vector is augmented with a null category count of zero, so that the linear model equations can be interpreted as log odds of each response category against a very rare null event with covariate-independent probability. This specification is designed to work well for high-dimension response (e.g., text), our motivating application, but should work in a variety of settings. Fitted probabilities, and those obtained using predict.mnlm, are corrected to condition on the response not coming from this null category.
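As a concrete illustration of this correction, consider made-up linear predictors for three observed categories, with the null category fixed at zero (a minimal sketch in base R; eta, p_raw, and p_fit are illustrative names, not package objects):

## linear predictors eta_k = alpha_k + x'beta_k for three observed
## categories, plus the implicit null category with eta = 0
eta <- c(cat1 = 1.2, cat2 = -0.4, cat3 = 0.7)

## raw multinomial probabilities include the (rare) null event ...
p_raw <- exp(c(null = 0, eta)) / sum(exp(c(0, eta)))

## ... while fitted/predicted probabilities condition on the response
## not being null, renormalizing over the observed categories only
p_fit <- exp(eta) / sum(exp(eta))

round(rbind(raw = p_raw[-1], corrected = p_fit), 3)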
Coefficient penalization is based upon the precision parameters $\lambda$ of independent Laplace priors on each non-intercept regression coefficient. Here, the Laplace density is $p(z) = (\lambda/2)\exp[-\lambda|z|]$, with variance $2/\lambda^2$. Via the penalty argument, this precision is either fixed, which corresponds to the L1 penalty $\lambda|z|$, or it is assigned a $\mathrm{Gamma}(s, r)$ prior and estimated jointly with the coefficients, which corresponds to the non-convex penalty $s\log[1 + |z|/r]$. In the case of joint penalty-coefficient estimation, the prior variance $s/r^2 = \mathrm{E}\lambda/r$ controls the degree of penalty curvature.
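The shapes of these two penalties can be compared directly (a small base-R sketch; the values chosen for lambda, s, and r are arbitrary illustrations):

z <- seq(-3, 3, length.out = 200)
lambda <- 2        # fixed Laplace precision: L1 penalty
s <- 2; r <- 1     # Gamma(s, r) hyperprior: log penalty

l1   <- lambda * abs(z)          # convex, kinked only at zero
glog <- s * log(1 + abs(z) / r)  # non-convex; curvature grows with s/r^2

plot(z, l1, type = "l", ylab = "penalty", ylim = range(l1, glog))
lines(z, glog, lty = 2)
legend("top", legend = c("L1: lambda|z|", "log: s log(1 + |z|/r)"),
       lty = 1:2, bty = "n")

The log penalty flattens for large |z|, shrinking small coefficients aggressively while leaving large ones relatively unpenalized; this is the source of both its appeal and its potential non-concavity.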
When this prior variance is large relative to the amount of information in the likelihood, the posterior can become multimodal. Since this leads to unstable optimization and less meaningful MAP estimates, mnlm will warn and automatically double $r$ and $s$ until the posterior is concave; doubling both parameters leaves the prior mean $\mathrm{E}\lambda = s/r$ unchanged while halving the prior variance.
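The effect of this doubling on the hyperprior moments can be checked directly (illustrative starting values for s and r):

s <- 1; r <- 0.5
for (i in 0:3) {
  cat(sprintf("s = %4.1f, r = %4.1f:  E[lambda] = s/r = %.1f,  var = s/r^2 = %.2f\n",
              s, r, s / r, s / r^2))
  s <- 2 * s; r <- 2 * r
}
## E[lambda] stays fixed while the prior variance, and hence the
## penalty curvature, halves at each doubling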
If the resulting prior precision is higher than you would like, it may be worth the computational effort to integrate over penalty uncertainty for posterior mean, rather than MAP, estimation; the reglogit package is available for such inference in binomial regression settings.
Additional details are available in Taddy (2011).