
textir (version 1.8-8)

mnlm: Estimation for high-dimensional Multinomial Logistic Regression

Description

MAP estimation of multinomial logistic regression models.

Usage

mnlm(counts, covars, normalize=TRUE, penalty=c(shape=1,rate=1/2), 
                     start=NULL, tol=1e-2, bins=0, verb=FALSE, quasinewton=0, ...)

Arguments

counts
A matrix of multinomial response counts in ncol(counts) categories (or nlevels(counts) categories when a factor is supplied) for nrow(counts) observations. This can be a matrix, a vector of response factors, or a simple_triplet_matrix.
covars
A matrix or simple_triplet_matrix of ncol(covars) covariate values for each of the nrow(counts) observations. This does not include the intercept, which is ALWAYS added in the design matrix.
normalize
Whether or not to normalize the covariates. Default is TRUE. If covars is a matrix, this will scale the inputs to have mean zero and standard deviation one. If covars is a simple_triplet_matrix, the inputs are scaled by standard deviation only, so that sparsity is preserved.
penalty
A vector of length 2 containing $[s, r]$ -- shape "$s$" and rate "$r$" -- parameters for the Gamma prior on the L1 (lasso) penalty $\lambda$, such that $E\lambda = s/r$. Refer to the details section for additional information.
start
An optional initial guess for the full ncol(covars)+1 by ncol(counts) matrix of regression coefficients (including the intercept). Under the default start=NULL, the intercept is a logit transform of the observed category frequencies and all other coefficients start at zero.
tol
Optimization convergence tolerance for the improvement on the un-normalized negative log posterior over a single full parameter sweep.
bins
For faster inference on large data sets (or just to collapse observations across levels for factor covariates), you can specify the number of bins for step-function approximations to the columns of covars. Counts are then collapsed across observations that share a bin.
verb
Control for print-statement output. TRUE prints some initial info and updates every iteration.
quasinewton
If greater than zero, we attempt quasi-Newton acceleration [see Lange, 2010] after the objective updates are less than quasinewton*tol. Be warned: this feature is new and experimental. It can significantly speed convergence, but can also be unstable.
...
Additional undocumented arguments to internal functions.
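To make the accepted input forms concrete, the following sketch (with hypothetical simulated data, not from the package) fits the same model twice: once with a factor response and once with an explicit count matrix built via model.matrix.

```r
library(textir)

## counts may be a factor response or a count matrix; both forms below
## describe the same data (hypothetical example for illustration).
type <- factor(sample(c("a", "b", "c"), 100, replace = TRUE))
x <- matrix(rnorm(200), ncol = 2, dimnames = list(NULL, c("x1", "x2")))

fit1 <- mnlm(counts = type, covars = x)   # factor response
Y <- model.matrix(~ type - 1)             # 100 x 3 indicator (count) matrix
fit2 <- mnlm(counts = Y, covars = x, penalty = c(shape = 1, rate = 1/2))
```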

Value

  An mnlm object list with entries:
  • intercept: The intercept estimates for each phrase ($\alpha$).
  • loadings: A simple_triplet_matrix of estimates for coefficients ($\Phi$) on the scale of the fitted (possibly normalized) covariates.
  • counts: The simple_triplet_matrix form of the counts input matrix.
  • X: If bins>0, the binned counts matrix used for analysis.
  • covars: The input covariates, possibly normalized.
  • V: If bins>0, the binned (and possibly normalized) covariate simple_triplet_matrix used for analysis.
  • penalty: The penalty specification upon convergence.
  • normalized: The input normalize indicator.
  • binned: An indicator for whether the observations were binned.
  • covarMean: If normalize=TRUE, the amount covariates were shifted (original means for matrix covars, 0 for sparse stm covars). Otherwise empty.
  • covarSD: If normalize=TRUE, the original covariate standard deviations. Otherwise empty.
  • prior: The penalty prior (gamma hyperparameters, fixed laplace scale, or normal precision).
  • fitted: Fitted count expectations. With binomial response, this is a vector of fitted probabilities. For multinomial response, it is a simple_triplet_matrix of fitted probabilities only for non-zero count observations (with empty entries for zero-count observations).
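As a quick illustration of how these entries are accessed (a sketch reusing the Bernoulli simulation from the Examples section below):

```r
library(textir)

## toy binomial fit, mirroring the simulated example in this help page
v <- rnorm(100)
y <- rbinom(100, size = 1, prob = 1/(1 + exp(-2*v)))
fit <- mnlm(counts = y, covars = v)

fit$intercept    # alpha
fit$loadings     # Phi, as a simple_triplet_matrix
fit$normalized   # TRUE under the default normalize=TRUE
fit$fitted       # vector of fitted probabilities (binomial response)
```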

Details

Finds the posterior mode for multinomial logistic regression parameters using cyclic coordinate descent. This is designed for inverse regression analysis of sentiment in text, where the multinomial response can span a very large number of categories, but it should be useful for any large-scale logistic regression.

For binomial response, the first category is assumed null. For multinomial response, the model is identified by placing a Normal(0,1) prior on the intercepts (this can be changed via the list specification for penalty).

Coefficient penalization is based upon the precision parameters $\lambda$ of independent Laplace priors on each non-intercept regression coefficient. Here, the Laplace density is $p(z) = (\lambda/2)\exp[-\lambda|z|]$, with variance $2/\lambda^2$. Via the penalty argument, this precision is either fixed, which corresponds to the L1 penalty $\lambda|z|$, or it is assigned a $Gamma(s, r)$ prior and estimated jointly with the coefficient, which corresponds to the 'gamma-lasso' non-convex penalty $s\log[1 + |z|/r]$.
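The gamma-lasso penalty arises from profiling $\lambda$ out of the joint posterior; a sketch of that step for a single coefficient $z$, dropping terms constant in $z$ and $\lambda$:

```latex
% joint log prior for coefficient z and its Laplace precision lambda
\log p(z,\lambda) = \log\tfrac{\lambda}{2} - \lambda|z|
                    + (s-1)\log\lambda - r\lambda + \mathrm{const}
                  = s\log\lambda - \lambda(|z| + r) + \mathrm{const}.

% maximizing over lambda gives \hat\lambda(z) = s/(|z|+r); substituting back,
\max_\lambda \log p(z,\lambda) = -s\log(|z| + r) + \mathrm{const}
                               = -s\log\left[1 + |z|/r\right] + \mathrm{const},
```

so the negative log prior contributes exactly the $s\log[1 + |z|/r]$ penalty stated above.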

In the case of gamma-lasso estimation, prior variance $s/r^2 = E\lambda/r$ controls the degree of penalty curvature. In the case that the variance is large relative to the amount of information in the likelihood, the posterior can become multimodal. Since this leads to unstable optimization and less meaningful MAP estimates, mnlm will warn and automatically double $r$ and $s$ until obtaining a concave posterior. If the resulting prior precision is higher than you would like, it may be worth the computational effort to integrate over penalty uncertainty in mean, rather than MAP, estimation; the reglogit package is available for such inference in binomial regression settings.
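The curvature difference between the two penalties can be seen without fitting anything; a self-contained sketch using the default penalty=c(shape=1, rate=1/2), so that $E\lambda = s/r = 2$:

```r
## Compare the convex L1 penalty lambda*|z| (at lambda = E[lambda]) with the
## non-convex gamma-lasso penalty s*log(1 + |z|/r) from the details above.
s <- 1; r <- 1/2
l1 <- function(z) (s/r) * abs(z)           # fixed-precision L1 penalty
gl <- function(z) s * log(1 + abs(z)/r)    # gamma-lasso penalty

## the two agree near zero (same slope s/r), but gamma-lasso flattens out
## for large |z|, shrinking strong signals less than the lasso:
z <- seq(-3, 3, length.out = 201)
plot(z, l1(z), type = "l", lty = 2, ylab = "penalty")
lines(z, gl(z))
```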

Additional details are available in Taddy (2012).

References

Taddy (2012), Multinomial Inverse Regression for Text Analysis. http://arxiv.org/abs/1012.2098

Lange (2010), Numerical Analysis for Statisticians.

See Also

congress109, we8there, plot.mnlm, summary.mnlm, predict.mnlm

Examples

### See congress109 and we8there for more real data examples

### Bernoulli simulation; re-run to see sampling variability ###
n <- 100
v <- rnorm(n)
p <- 1/(1 + exp(-2*v))
y <- rbinom(n, size=1, prob=p)

## fit the logistic model
summary( fit <- mnlm(y, v, verb=TRUE) )
par(mfrow=c(1,2))
plot(fit)

## use predict to see fitted probabilities (could also just use fit$fitted)
phat <-  predict(fit, newdata=matrix(v,ncol=1))
plot(p, phat, pch=21, bg=c(2,4)[y+1], xlab="true probability", ylab="fitted probability")

### Ripley's Cushing Data ###

## see help(Cushings) for data
library(MASS)
data(Cushings)
train <- Cushings[Cushings$Type != "u",]
newdata <- as.matrix(Cushings[Cushings$Type == "u", 1:2])

## fit, summarize, predict, and plot
fit <- mnlm(counts=factor(train$Type), covars=train[,1:2])
summary(fit)
round(coef(fit),2)
predict(fit, newdata)
par(mfrow=c(1,1))
plot(fit)
