Time- and memory-efficient estimation of multinomial logit models using the maximum likelihood method. Targeted at large-scale multiclass classification problems in econometrics and machine learning. Numerical optimization is performed by the Newton-Raphson method, using an optimized, parallel C++ library to achieve fast computation of Hessian matrices. The user interface is closely related to that of the CRAN package mlogit.
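The Newton-Raphson iteration and the roles of maxiter, ftol and gtol can be illustrated with a minimal base-R sketch. It is a toy, intercept-only binary logit, not the package's C++ implementation, and it assumes absolute (not relative) tolerances, since the exact form of the stopping rule is not stated here. The data vector y and the helper functions loglik, grad and hess are hypothetical.

```r
# Toy Newton-Raphson on a one-parameter logit log-likelihood.
# Not the mnlogit implementation; data and helper names are hypothetical.
y <- c(1, 0, 1, 1, 0, 1)                      # made-up binary outcomes

loglik <- function(b) sum(y * b - log1p(exp(b)))
grad   <- function(b) sum(y - plogis(b))      # first derivative
hess   <- function(b) -length(y) * plogis(b) * (1 - plogis(b))  # second derivative

b <- 0                                        # initial guess (cf. start)
maxiter <- 50; ftol <- 1e-6; gtol <- 1e-6     # same defaults as in the usage above
f_old <- loglik(b)

for (iter in seq_len(maxiter)) {
  b <- b - grad(b) / hess(b)                  # Newton step: b - H^{-1} g
  f_new <- loglik(b)
  # Assumed stopping rule: absolute change in log-likelihood or gradient size
  if (abs(f_new - f_old) < ftol || abs(grad(b)) < gtol) break
  f_old <- f_new
}
```

For this toy problem the maximizer is known in closed form, qlogis(mean(y)), which gives a quick check that the loop has converged.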
mnlogit(formula, data, choiceVar=NULL, maxiter = 50, ftol = 1e-6,
gtol = 1e-6, weights = NULL, ncores = 1, na.rm = TRUE,
print.level=0, linDepTol = 1e-6, start=NULL, alt.subset=NULL, ...)
# S3 method for mnlogit
fitted(object, outcome=TRUE, ...)
# S3 method for mnlogit
residuals(object, outcome=TRUE, ...)
# S3 method for mnlogit
df.residual(object, ...)
# S3 method for mnlogit
terms(x, ...)
# S3 method for mnlogit
update(object, new, ...)
# S3 method for mnlogit
print(x, digits = max(3, getOption("digits") - 2),
width = getOption("width"),
what = c("obj", "eststat", "modsize"), ...)
# S3 method for mnlogit
vcov(object, ...)
# S3 method for mnlogit
logLik(object, ...)
# S3 method for mnlogit
summary(object, ...)
# S3 method for mnlogit
print.summary(x, digits = max(3, getOption("digits") - 2),
width = getOption("width"), ... )
# S3 method for mnlogit
index(object, ...)
# S3 method for mnlogit
predict(object, newdata = NULL, probability = TRUE,
returnData=FALSE, choiceVar=NULL, ...)
# S3 method for mnlogit
coef(object, order=FALSE, as.list = FALSE, ...)
formula: A formula object or string specifying the model to be estimated (see Note).
data, newdata: A data.frame object with data organized in the 'long' format (see Note). This can also be a mlogit.data class object. newdata is used in the predict method.
choiceVar: A string naming the column in 'data' that holds the list of choices. Note: this argument is not used if data or newdata is a mlogit.data object.
maxiter: An integer giving the maximum number of Newton iterations. If maxiter <= 0, only the Hessian, gradient and log-likelihood are calculated at the initial point.
ftol: A real number giving the tolerance on the difference between two successive log-likelihood values.
gtol: A real number giving the tolerance on the norm of the gradient.
weights: Optional vector of (positive) frequency weights, one for each observation.
ncores: An integer giving the number of processors allowed for Hessian calculations.
na.rm: A logical indicating whether rows of the data frame containing NAs will be removed.
print.level: An integer controlling the amount of information printed during execution.
linDepTol: Tolerance for detecting linear dependence between columns in the input data. Dependent columns are removed from the estimation.
start: Named vector of coefficients to use as the initial guess. Use the naming convention given by names(coef(fit)), where fit is a mnlogit class object.
alt.subset: Subset of alternatives to perform estimation on.
...: Currently unused.
x, object: An object of class mnlogit.
outcome: A logical indicating, for the fitted and residuals methods, whether a matrix (for each choice, one value for each alternative) or a vector (for each choice, only the value for the alternative chosen) should be returned.
new: A formula for the update method. It must obey all rules specified for the formula argument.
digits: Number of digits to print.
width: The width of printing.
what: Specifies what to print. The default option, 'obj', is the print function for mnlogit objects. Option 'eststat' prints estimation statistics and option 'modsize' prints model size information.
probability: If TRUE, predict outputs the probability matrix; otherwise the choice with the highest probability for each observation is returned.
returnData: If TRUE, a data attribute is added to the returned object.
order: If TRUE, coefficients are ordered by variable name.
as.list: If TRUE, returns estimated model coefficients grouped by variable type.
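To make the 'long' format expected by data concrete, here is a minimal base-R sketch. It assumes the response is stored as a logical flag on each row; the column names (chid, alt, mode, price) and all values are hypothetical and chosen only for illustration. Each choice situation contributes one row per alternative, and the alt column plays the role of the column named by choiceVar.

```r
# Toy 'long' format data: 2 choice situations x 3 alternatives = 6 rows.
# Column names and values are hypothetical, for illustration only.
alts <- c("beach", "boat", "pier")
toy <- data.frame(
  chid  = rep(1:2, each = length(alts)),  # choice-situation identifier
  alt   = rep(alts, times = 2),           # alternative label (cf. choiceVar)
  mode  = c(FALSE, TRUE, FALSE,           # TRUE marks the chosen alternative
            TRUE, FALSE, FALSE),
  price = c(15.1, 10.5, 15.1, 98.2, 24.3, 98.2)
)

# Sanity checks: every choice situation lists each alternative once
# and has exactly one chosen row.
rows_per_choice   <- table(toy$chid)
chosen_per_choice <- tapply(toy$mode, toy$chid, sum)
```

Checks like these are a cheap way to catch malformed input before fitting.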
An object of class mnlogit, with elements:
the named vector of coefficients.
the value of the log-likelihood function at exit.
the gradient of the log-likelihood function at exit.
the Hessian of the log-likelihood function at exit.
Newton Raphson stats.
Estimated probabilities of the alternative selected in each observation.
the probability matrix: the (i,j) entry denotes the probability of the jth alternative being chosen in the ith observation.
The residual. Has attribute outcome, which is the probability of not choosing the selected alternative.
The number of estimated coefficients in the model.
The AIC value of the fitted model.
The vector of the alternatives' names.
Information about number of parameters in model.
Vector of coefficients ordered by variable name.
The data.frame used in model estimation.
The relative frequency of each choice in input data.
The formula specifying the model.
The mnlogit function call that the user made.
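How several of the quantities described above relate to one another can be sketched in base R. This is an illustration on a toy probability matrix, not the package's internal code: the matrix P, the vector chosen and the coefficient count k are hypothetical, the log-likelihood is the unweighted one, and the AIC uses the standard definition 2k - 2 logLik.

```r
# Toy probability matrix: rows are observations, columns are alternatives.
# Values are hypothetical; each row sums to 1.
P <- matrix(c(0.2, 0.5, 0.3,
              0.6, 0.1, 0.3),
            nrow = 2, byrow = TRUE,
            dimnames = list(NULL, c("beach", "boat", "pier")))
chosen <- c("boat", "pier")     # alternative selected in each observation

# Probability of the selected alternative in each observation.
p_chosen <- P[cbind(seq_len(nrow(P)), match(chosen, colnames(P)))]

# Probability of NOT choosing the selected alternative.
p_not_chosen <- 1 - p_chosen

# Most probable alternative per observation, i.e. the kind of result
# predict() is described as returning when probability = FALSE.
best <- colnames(P)[max.col(P, ties.method = "first")]

# Unweighted log-likelihood and the standard AIC for k coefficients.
k <- 4                          # hypothetical number of coefficients
loglik <- sum(log(p_chosen))
aic <- 2 * k - 2 * loglik
```

Note that the most probable alternative need not coincide with the one actually chosen, as in the second toy observation.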
Asad Hasan, Zhiyu Wang, Alireza S. Mahani (2016). Fast Estimation of Multinomial Logit Models: R Package mnlogit. Journal of Statistical Software, 75(3), 1-24. doi:10.18637/jss.v075.i03
Croissant, Yves. Estimation of multinomial logit models in R: The mlogit Packages. https://cran.r-project.org/package=mlogit
Train, K. (2004). Discrete Choice Methods with Simulation, Cambridge University Press.
# NOT RUN {
library(mnlogit)
data(Fish, package = "mnlogit")
fm <- formula(mode ~ price | income | catch)
fit <- mnlogit(fm, Fish, ncores = 2)
# }
# NOT RUN {
fit <- mnlogit(fm, Fish, choiceVar="alt", ncores = 2) # same effect as previous
summary(fit)
print(fit)
predict(fit)
print(fit, what = "eststat")
print(fit, what = "modsize")
# Formula examples (see also Note)
fm <- formula(mode ~ 1 | income) # Only type-2 with intercept
fm <- formula(mode ~ price - 1) # Only type-1, no intercept
fm <- formula(mode ~ 1 | 1 | catch) # Only type-3, including intercept
# }