multinomial: Multinomial Logit Model

Description

Fits a multinomial logit model (MLM) to a (preferably unordered) factor response.

Usage

multinomial(zero = NULL, parallel = FALSE, nointercept = NULL,
     refLevel = "(Last)", ynames = FALSE,
     imethod = 1, imu = NULL, byrow.arg = FALSE,
     sumcon = FALSE, Thresh = NULL, Trev = FALSE,
     Tref = if (Trev) "M" else 1, Intercept = NULL,
     whitespace = FALSE)

Value

An object of class "vglmff"

(see vglmff-class). The object is used by modelling functions such as vglm,

rrvglm and vgam.

Arguments

zero

Can be an integer-valued vector specifying which linear/additive predictors are modelled as intercepts only. Any values must be from the set {1,2,...,$M$}. The default value means none are modelled as intercept-only terms. See CommonVGAMffArguments for more information.

parallel

A logical, or formula specifying which terms have equal/unequal coefficients.

ynames

Logical. If TRUE then "mu[,1]" is replaced by the probability of the first named response category, etc. (e.g., "P[normal]"), so that the output is more readable, albeit less compact. This is seen in output such as predict(fit) and coef(fit, matrix = TRUE). Of course, "mu" stands for the fitted probabilities, and it remains the default for upward compatibility and predictability.

nointercept, whitespace

See CommonVGAMffArguments for details.

imu, byrow.arg

See CommonVGAMffArguments for details.

sumcon

Logical. Use summation constraints? If TRUE then the baseline linear predictor is negative the sum of the other linear predictors. That is, instead of the baseline eta being 0, it is minus the sum of the others. This parameterization is symmetric and is more suitable for techniques such as the LASSO where regression coefficients are shrunk towards zero. If TRUE then the fitted values and log-likelihood should be unchanged, however coef and predict, etc. will return different answers, of course. Link multilogitlink has been modified to allow for sumcon.

refLevel

Either a (1) single positive integer or (2) a value of the factor or (3) a character string. If inputted as an integer then it specifies which column of the response matrix is the reference or baseline level. The default is the last one (the $(M+1)$th one). If used, this argument will be usually assigned the value 1. If inputted as a value of a factor then beware of missing values of certain levels of the factor (drop.unused.levels = TRUE or drop.unused.levels = FALSE). See the example below. If inputted as a character string then this should be equal to (A) one of the levels of the factor response, else (B) one of the column names of the matrix response of counts; e.g., vglm(cbind(normal, mild, severe) ~ let, multinomial(refLevel = "severe"), data = pneumo) if it was (incorrectly because the response is ordinal) applied to the pneumo data set. Another example is vglm(ethnicity ~ age, multinomial(refLevel = "European"), data = xs.nz) if it was applied to xs.nz.

imethod

Choosing 2 will use the mean sample proportions of each column of the response matrix, which corresponds to the MLEs for intercept-only models. See CommonVGAMffArguments for more details.

Thresh, Trev, Tref, Intercept: Same as cumulative. Because these arguments concern the intercepts, they should not be confused with the stereotype model where they would be applied to the A matrix instead.

Author

Thomas W. Yee

Warning

No check is made to verify that the response is nominal.

Argument sumcon is a recent addition and may have limited functionality in other parts of VGAM. It has not been tested extensively.

See CommonVGAMffArguments for more warnings.

Details

In this help file the response $Y$ is assumed to be a factor with unordered values $1,2,\dots,M+1$, so that $M$ is the number of linear/additive predictors $\eta_j$.

The default model can be written $$\eta_j = \log(P[Y=j]/ P[Y=M+1])$$ where $\eta_j$ is the $j$th linear/additive predictor. Here, $j=1,\ldots,M$, and $\eta_{M+1}$ is 0 by definition. That is, the last level of the factor, or last column of the response matrix, is taken as the reference level or baseline---this is for identifiability of the parameters. The reference or baseline level can be changed with the refLevel argument.

In almost all the literature, the constraint matrices associated with this family of models are known. For example, setting parallel = TRUE will make all constraint matrices (including the intercept) equal to a vector of $M$ 1's; to suppress the intercepts from being parallel then set parallel = FALSE ~ 1. If the constraint matrices are unknown and to be estimated, then this can be achieved by fitting the model as a reduced-rank vector generalized linear model (RR-VGLM; see rrvglm). In particular, a multinomial logit model with unknown constraint matrices is known as a stereotype model (Anderson, 1984), and can be fitted with rrvglm.

The above details correspond to the ordinary MLM where all the levels are altered (in the terminology of GAITD regression).

References

Agresti, A. (2013). Categorical Data Analysis, 3rd ed. Hoboken, NJ, USA: Wiley.

Anderson, J. A. (1984). Regression and ordered categorical variables. Journal of the Royal Statistical Society, Series B, Methodological, 46, 1--30.

Hastie, T. J., Tibshirani, R. J. and Friedman, J. H. (2009). The Elements of Statistical Learning: Data Mining, Inference and Prediction, 2nd ed. New York, USA: Springer-Verlag.

McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, 2nd ed. London: Chapman & Hall.

Tutz, G. (2012). Regression for Categorical Data, Cambridge: Cambridge University Press.

Yee, T. W. and Hastie, T. J. (2003). Reduced-rank vector generalized linear models. Statistical Modelling, 3, 15--41.

Yee, T. W. (2010). The VGAM package for categorical data analysis. Journal of Statistical Software, 32, 1--34. tools:::Rd_expr_doi("10.18637/jss.v032.i10").

Yee, T. W. and Ma, C. (2024). Generally altered, inflated, truncated and deflated regression. Statistical Science, 39, 568--588.

Examples

Run this code

# Example 1: Regn spline VGAM: marital status versus age
data(marital.nz)
ooo <- with(marital.nz, order(age))
om.nz <- marital.nz[ooo, ]
fit1 <- vglm(mstatus ~ sm.bs(age), multinomial, om.nz)
coef(fit1, matrix = TRUE)  # Mostly meaningless
if (FALSE)  with(om.nz,
matplot(age, fitted(fit1), type = "l", las = 1, lwd = 2))
legend("topright", leg = colnames(fitted(fit1)),
       lty = 1:4, col = 1:4, lwd = 2) 

# Example 2a: a simple example
set.seed(123)
ycounts <- t(rmultinom(10, size = 20, prob = c(0.1, 0.2, 0.8)))
fit2 <- fit <- vglm(ycounts ~ 1, multinomial)
head(fitted(fit2))   # Proportions
fit2@prior.weights   # NOT recommended for the prior weights
weights(fit2, type = "prior", matrix = FALSE)  # Better
depvar(fit2)         # Sample proportions; same as fit2@y
constraints(fit2)    # Constraint matrices

# Example 2b: Different reference level used as the baseline
Fit2 <- vglm(ycounts ~ 1, multinomial(refLevel = 2))
coef(Fit2, matrix = TRUE)
coef(fit2, matrix = TRUE)  # Easy to reconcile this output with fit2

# Example 3: The response is a factor.
nn <- 10
dframe3 <- data.frame(yfac = gl(3, nn, labels = c("Ctrl",
                                "Trt1", "Trt2")),
                      x2   = runif(3 * nn))
myrefLevel <- with(dframe3, yfac[12])
fit3a <- vglm(yfac ~ x2, multinomial(refLevel = myrefLevel), dframe3)
fit3b <- vglm(yfac ~ x2, multinomial(refLevel = 2), dframe3)
coef(fit3a, matrix = TRUE)  # "Trt1" is the reference level
coef(fit3b, matrix = TRUE)  # "Trt1" is the reference level
margeff(fit3b)

# Example 4: Fit a rank-1 stereotype model
fit4 <- rrvglm(Country ~ Width + Height + HP, multinomial, car.all)
coef(fit4)  # Contains the C matrix
constraints(fit4)$HP       # The A matrix
coef(fit4, matrix = TRUE)  # The B matrix
Coef(fit4)@C               # The C matrix
concoef(fit4)              # Better to get the C matrix this way
Coef(fit4)@A               # The A matrix
svd(coef(fit4, matrix = TRUE)[-1, ])$d  # Has rank 1; = C %*% t(A)
# Classification (but watch out for NAs in some of the variables):
apply(fitted(fit4), 1, which.max)  # Classification
# Classification:
colnames(fitted(fit4))[apply(fitted(fit4), 1, which.max)]
apply(predict(fit4, car.all, type = "response"),
      1, which.max)  # Ditto

# Example 5: Using the xij argument (aka conditional logit model)
if (FALSE)  set.seed(111)
nn <- 100  # Number of people who travel to work
M <- 3  # There are M+1 models of transport to go to work
ycounts <- matrix(0, nn, M+1)
ycounts[cbind(1:nn, sample(x = M+1, size = nn, replace = TRUE))] = 1
dimnames(ycounts) <- list(NULL, c("bus","train","car","walk"))
gotowork <- data.frame(cost.bus  = runif(nn), time.bus  = runif(nn),
                       cost.train= runif(nn), time.train= runif(nn),
                       cost.car  = runif(nn), time.car  = runif(nn),
                       cost.walk = runif(nn), time.walk = runif(nn))
gotowork <- round(gotowork, digits = 2)  # For convenience
gotowork <- transform(gotowork,
              Cost.bus   = cost.bus   - cost.walk,
              Cost.car   = cost.car   - cost.walk,
              Cost.train = cost.train - cost.walk,
              Cost       = cost.train - cost.walk,  # for labelling
              Time.bus   = time.bus   - time.walk,
              Time.car   = time.car   - time.walk,
              Time.train = time.train - time.walk,
              Time       = time.train - time.walk)  # for labelling
fit5 <- vglm(ycounts ~ Cost + Time,
            multinomial(parall = TRUE ~ Cost + Time - 1),
            xij = list(Cost ~ Cost.bus + Cost.train + Cost.car,
                       Time ~ Time.bus + Time.train + Time.car),
            form2 =  ~ Cost + Cost.bus + Cost.train + Cost.car +
                       Time + Time.bus + Time.train + Time.car,
            data = gotowork, trace = TRUE)
head(model.matrix(fit5, type = "lm"))   # LM model matrix
head(model.matrix(fit5, type = "vlm"))  # Big VLM model matrix
coef(fit5)
coef(fit5, matrix = TRUE)
constraints(fit5)
summary(fit5)
max(abs(predict(fit5) - predict(fit5, new = gotowork)))  # 0?

Run the code above in your browser using DataLab