mnlogit: Fast estimation of multinomial logit models

Description

Time- and memory-efficient estimation of multinomial logit models using the maximum likelihood method. Targeted at large-scale multiclass classification problems in econometrics and machine learning. Numerical optimization is performed by the Newton-Raphson method, using an optimized, parallel C++ library to achieve fast computation of Hessian matrices. The user interface is closely related to that of the CRAN package mlogit.

Usage

mnlogit(formula, data, choiceVar=NULL, maxiter = 50, ftol = 1e-6,
        gtol = 1e-6, weights = NULL, ncores = 1, na.rm = TRUE, 
        print.level=0, linDepTol = 1e-6, start=NULL, alt.subset=NULL, ...)
# S3 method for mnlogit
fitted(object, outcome=TRUE, ...)
# S3 method for mnlogit
residuals(object, outcome=TRUE, ...)
# S3 method for mnlogit
df.residual(object, ...)
# S3 method for mnlogit
terms(x, ...)
# S3 method for mnlogit
update(object, new, ...)
# S3 method for mnlogit
print(x, digits = max(3, getOption("digits") - 2),
                        width = getOption("width"), 
                        what = c("obj", "eststat", "modsize"), ...)
# S3 method for mnlogit
vcov(object, ...)
# S3 method for mnlogit
logLik(object, ...)
# S3 method for mnlogit
summary(object, ...)
# S3 method for mnlogit
print.summary(x, digits = max(3, getOption("digits") - 2),
                       width = getOption("width"), ... )
# S3 method for mnlogit
index(object, ...)
# S3 method for mnlogit
predict(object, newdata = NULL, probability = TRUE, 
                          returnData=FALSE, choiceVar=NULL, ...)
# S3 method for mnlogit
coef(object, order=FALSE, as.list = FALSE, ...)

Arguments

formula

formula object or string specifying the model to be estimated (see Note).

data, newdata

A data.frame object with data organized in the 'long' format (see Note). This can also be an object of class mlogit.data. newdata is used in the predict method.

choiceVar

A string naming the column in 'data' which has the list of choices. Note: This argument is not used if data or newdata is a mlogit.data object.

maxiter

An integer giving the maximum number of Newton-Raphson iterations. If maxiter <= 0, only the Hessian, the gradient and the log-likelihood are calculated at the initial point.

ftol

A real number giving the tolerance on the difference between two successive log-likelihood values.

gtol

A real number giving the tolerance on the norm of the gradient.

weights

Optional vector of (positive) frequency weights, one for each observation.

ncores

An integer giving the number of processors allowed for Hessian calculations.

na.rm

A logical value indicating whether rows of the data frame containing NAs will be removed.

print.level

An integer which controls the amount of information to be printed during execution.

linDepTol

Tolerance for detecting linear dependence between columns in input data. Dependent columns are removed from the estimation.

start

Named vector of coefficients to use as the initial guess. Use the naming convention given by names(coef(fit)), where fit is an object of class mnlogit.

alt.subset

Subset of alternatives to perform estimation on.

...

Currently unused.

object, x

An object of class mnlogit.

outcome

A logical value indicating, for the fitted and residuals methods, whether a matrix (for each choice, one value for each alternative) or a vector (for each choice, only the value for the chosen alternative) should be returned.

new

A formula for the update method. It must obey all rules specified for the formula argument.

digits

Number of digits to print.

width

The width of printing.

what

Specifies what to print. The default, 'obj', gives the standard print output for mnlogit objects. Option 'eststat' prints estimation statistics and option 'modsize' prints model size information.

probability

If TRUE, predict outputs the probability matrix; otherwise the choice with the highest probability for each observation is returned.

returnData

If TRUE a data attribute is added to the returned object.

order

If TRUE coefficients are ordered by variable name.

as.list

If TRUE, returns the estimated model coefficients grouped by variable type.
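
How maxiter, ftol and gtol interact can be sketched with a small Newton-Raphson loop in base R. This is an illustration on a binary logit, not the package's C++ implementation: the helper newton_logit, the simulated data, and the exact form of the stopping tests (absolute change in log-likelihood, Euclidean norm of the gradient) are assumptions made for the example.

```r
# Illustrative Newton-Raphson for a binary logit (hypothetical helper,
# not part of mnlogit). Maximizes the log-likelihood of y given X.
newton_logit <- function(X, y, maxiter = 50, ftol = 1e-6, gtol = 1e-6) {
  loglik <- function(b) {
    eta <- drop(X %*% b)
    sum(y * eta - log1p(exp(eta)))
  }
  beta <- rep(0, ncol(X))   # initial guess
  ll <- loglik(beta)
  iter <- 0
  reason <- NULL
  repeat {
    # Gradient and Hessian at the current point, so the values
    # returned always correspond to the final coefficients.
    p <- plogis(drop(X %*% beta))
    grad <- drop(crossprod(X, y - p))
    hess <- -crossprod(X, X * (p * (1 - p)))
    if (!is.null(reason)) break                       # ftol hit last step
    if (iter >= maxiter) { reason <- "maxiter"; break }
    if (sqrt(sum(grad^2)) < gtol) { reason <- "gtol"; break }
    beta <- beta - drop(solve(hess, grad))            # Newton step
    iter <- iter + 1
    ll_new <- loglik(beta)
    if (abs(ll_new - ll) < ftol) reason <- "ftol"
    ll <- ll_new
  }
  list(coefficients = beta, logLik = ll, gradient = grad,
       hessian = hess, iterations = iter, stop = reason)
}

# Simulated data (assumption for the example).
set.seed(1)
X <- cbind(1, rnorm(200))
y <- rbinom(200, 1, plogis(0.5 + X[, 2]))

fit  <- newton_logit(X, y)               # iterate until a tolerance is met
fit0 <- newton_logit(X, y, maxiter = 0)  # evaluate at the start point only
```

In this sketch a non-positive maxiter skips the iterations entirely, so fit0 holds only the log-likelihood, gradient and Hessian at the initial point, mirroring the behaviour described for maxiter above.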

Value

An object of class mnlogit, with elements:

coefficients

the named vector of coefficients.

logLik

the value of the log-likelihood function at exit.

gradient

the gradient of the log-likelihood function at exit.

hessian

the Hessian of the log-likelihood function at exit.

est.stat

Statistics from the Newton-Raphson optimization.

fitted.values

Estimated probabilities of the alternative selected in each observation.

probabilities

the probability matrix: (i,j) entry denotes the probability of the jth alternative being chosen in the ith observation.

residuals

The residuals. Has an attribute outcome, which is the probability of not choosing the selected alternative.

df

The number of estimated coefficients in the model.

AIC

The AIC value of the fitted model.

choices

The vector of alternative names.

model.size

Information about number of parameters in model.

ordered.coeff

Vector of coefficients ordered by variable name.

model

The data.frame used in model estimation.

freq

The relative frequency of each choice in input data.

formula

The formula specifying the model.

call

The mnlogit function call that the user made.
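
The relationship between the probabilities, fitted.values, residuals and AIC elements can be illustrated in base R. The sketch below does not use output from mnlogit: the matrix of linear predictors U, the observed choices and the coefficient count k are invented, and it assumes the usual multinomial logit (softmax) form for the probabilities and the standard definition AIC = 2k - 2 logLik.

```r
# Hypothetical linear predictors: 4 observations (rows) x 3 alternatives.
U <- matrix(c( 0.2, 1.0, -0.5,
               1.5, 0.1,  0.3,
               0.0, 0.0,  0.0,
              -1.0, 0.4,  2.0),
            nrow = 4, byrow = TRUE,
            dimnames = list(NULL, c("beach", "boat", "pier")))
chosen <- c("boat", "beach", "pier", "pier")  # hypothetical observed choices

# Probability matrix: entry (i, j) is the probability that alternative j
# is chosen in observation i. Subtracting the row maximum avoids overflow.
expU <- exp(U - apply(U, 1, max))
prob <- expU / rowSums(expU)

# Probability of the alternative actually selected in each observation,
# and the probability of not choosing it.
idx <- cbind(seq_len(nrow(prob)), match(chosen, colnames(prob)))
fitted_chosen <- prob[idx]
resid_outcome <- 1 - fitted_chosen

# Log-likelihood and AIC for a model with k estimated coefficients.
loglik <- sum(log(fitted_chosen))
k <- 2                                        # hypothetical coefficient count
aic <- 2 * k - 2 * loglik

# Most probable alternative per observation (ties go to the first column).
best <- colnames(prob)[max.col(prob, ties.method = "first")]
```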

References

Asad Hasan, Zhiyu Wang, Alireza S. Mahani (2016). Fast Estimation of Multinomial Logit Models: R Package mnlogit. Journal of Statistical Software, 75(3), 1-24. doi:10.18637/jss.v075.i03

Croissant, Yves. Estimation of multinomial logit models in R: The mlogit Packages. https://cran.r-project.org/package=mlogit

Train, K. (2004). Discrete Choice Methods with Simulation, Cambridge University Press.

Examples

# NOT RUN {
  library(mnlogit)
  data(Fish, package = "mnlogit")
  fm <- formula(mode ~ price | income | catch)

  fit <- mnlogit(fm, Fish, ncores = 2)

 
# }
# NOT RUN {
   fit <- mnlogit(fm, Fish, choiceVar="alt", ncores = 2) # same effect as previous
   summary(fit)
   print(fit)
   predict(fit)
   print(fit, what = "eststat")
   print(fit, what = "modsize")
  
   # Formula examples (see also Note)
   fm <- formula(mode ~ 1 | income)    # Only type-2 with intercept
   fm <- formula(mode ~ price - 1)     # Only type-1, no intercept
   fm <- formula(mode ~ 1 | 1 | catch) # Only type-3, including intercept 
  
# }
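
The examples above rely on data in 'long' format, where each choice situation occupies one row per alternative. A minimal sketch of that layout in base R follows. The values are invented, the column names only mirror those used in the formulas above (mode, alt, price, catch, income), chid is a hypothetical identifier for the choice situation, and the use of a logical response column is an assumption about how the chosen alternative is marked.

```r
# Hypothetical long-format data: 2 choice situations x 3 alternatives = 6 rows.
alts <- c("beach", "boat", "pier")
long <- data.frame(
  chid   = rep(1:2, each = length(alts)),            # choice-situation id
  alt    = rep(alts, times = 2),                     # alternative in this row
  mode   = c(FALSE, TRUE, FALSE,                     # TRUE marks the chosen
             TRUE, FALSE, FALSE),                    # alternative
  price  = c(15.1, 10.5, 15.1, 35.7, 34.9, 35.7),    # varies by alternative
  catch  = c(0.11, 0.26, 0.05, 0.07, 0.16, 0.45),    # varies by alternative
  income = rep(c(8333, 6250), each = length(alts))   # constant per situation
)

# Sanity checks: exactly one chosen alternative per choice situation,
# and every situation lists each alternative once.
one_choice_each <- all(tapply(long$mode, long$chid, sum) == 1)
balanced        <- all(table(long$chid, long$alt) == 1)
```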
