Finds the posterior mode for multinomial logistic regression parameters using cyclic coordinate descent.
This is designed for inverse regression analysis of sentiment in text, where the multinomial response can be very high dimensional. It should be generally useful for any large-scale multinomial logistic regression, but is optimized for the large-response setting: counts are treated as sparse, while covars are dense. The model is identified by fixing coefficients at zero for a specified null category. With a binomial response, the first category is assumed null. For multinomial response dimension greater than two, each response vector is augmented with a null category count of zero, so that the linear model equations can be interpreted as log odds of each response category against a very rare null event with covariate-independent probability. This specification is designed to work well for high-dimension response (e.g., text), our motivating application, but should work in a variety of settings. Fitted probabilities, and those obtained using predict.mnlm, are corrected to condition on the response not coming from this null category.
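As a concrete illustration of this correction, consider made-up linear predictors for three observed categories, with the null category fixed at zero (a minimal sketch in base R; eta, p_raw, and p_fit are illustrative names, not package objects):

## linear predictors eta_k = alpha_k + x'beta_k for three observed
## categories, plus the implicit null category with eta = 0
eta <- c(cat1 = 1.2, cat2 = -0.4, cat3 = 0.7)

## raw multinomial probabilities include the (rare) null event ...
p_raw <- exp(c(null = 0, eta)) / sum(exp(c(0, eta)))

## ... while fitted/predicted probabilities condition on the response
## not being null, renormalizing over the observed categories only
p_fit <- exp(eta) / sum(exp(eta))

round(rbind(raw = p_raw[-1], corrected = p_fit), 3)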
Coefficient penalization is based upon the precision parameters $\lambda$ of independent Laplace priors on each non-intercept regression coefficient. Here, the Laplace density is $p(z) = (\lambda/2)\exp[-\lambda|z|]$, with variance $2/\lambda^2$. Via the penalty argument, this precision is either fixed, which corresponds to the L1 penalty $\lambda|z|$, or it is assigned a $\mathrm{Gamma}(s, r)$ prior and estimated jointly with the coefficients, which corresponds to the non-convex penalty $s\log[1 + |z|/r]$. In the case of joint penalty-coefficient estimation, the prior variance $s/r^2 = \mathrm{E}\lambda/r$ controls the degree of penalty curvature.
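The shapes of these two penalties can be compared directly (a small base-R sketch; the values chosen for lambda, s, and r are arbitrary illustrations):

z <- seq(-3, 3, length.out = 200)
lambda <- 2        # fixed Laplace precision: L1 penalty
s <- 2; r <- 1     # Gamma(s, r) hyperprior: log penalty

l1   <- lambda * abs(z)          # convex, kinked only at zero
glog <- s * log(1 + abs(z) / r)  # non-convex; curvature grows with s/r^2

plot(z, l1, type = "l", ylab = "penalty", ylim = range(l1, glog))
lines(z, glog, lty = 2)
legend("top", legend = c("L1: lambda|z|", "log: s log(1 + |z|/r)"),
       lty = 1:2, bty = "n")

The log penalty flattens for large |z|, shrinking small coefficients aggressively while leaving large ones relatively unpenalized; this is the source of both its appeal and its potential non-concavity.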
When this prior variance is large relative to the amount of information in the likelihood, the posterior can become multimodal. Since this leads to unstable optimization and less meaningful MAP estimates, mnlm will warn and automatically double $r$ and $s$ until the posterior is concave; doubling both parameters leaves the prior mean $\mathrm{E}\lambda = s/r$ unchanged while halving the prior variance.
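The effect of this doubling on the hyperprior moments can be checked directly (illustrative starting values for s and r):

s <- 1; r <- 0.5
for (i in 0:3) {
  cat(sprintf("s = %4.1f, r = %4.1f:  E[lambda] = s/r = %.1f,  var = s/r^2 = %.2f\n",
              s, r, s / r, s / r^2))
  s <- 2 * s; r <- 2 * r
}
## E[lambda] stays fixed while the prior variance, and hence the
## penalty curvature, halves at each doubling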
If the resulting prior precision is higher than you would like, it may be worth the computational effort to integrate over penalty uncertainty for posterior mean, rather than MAP, estimation; the reglogit package is available for such inference in binomial regression settings.
Additional details are available in Taddy (2011).