multinomRob
fits the overdispersed multinomial regression model
for grouped count data using the hyperbolic tangent (tanh) and least quartile
difference (LQD) robust estimators.
multinomRob(model, data, starting.values=NULL, equality=NULL, genoud.parms=NULL, print.level=0, iter = FALSE, maxiter = 10, multinom.t=1, multinom.t.df=NA, MLEonly=FALSE)
model=list(y1 ~ x1, y2 ~ x2, y3 ~ 0)
the outcome variables containing counts are y1
, y2
and
y3
, and the linear predictor for y1
is a coefficient times
x1
plus a constant, the linear predictor for y2
is a
coefficient times x2
plus a constant, and the linear predictor for
y3
is zero. Each formula has the format countvar ~ RHS
,
where countvar
is the name of a vector, in the dataframe referenced
by the data
argument, that gives the counts for all observations
for one category. RHS
denotes the righthand side of a formula using
the usual syntax for formulas, where each variable in the formula is the
name of a vector in the dataframe referenced by the data
argument.
For example, a RHS
specification of var1 + var2*var3
would
specify that the regressors are to be var1
, var2
,
var3
, the terms generated by the interaction var2:var3
, and
the constant.
The set of outcome alternatives may be specified to vary over observations, by putting in a negative value for alternatives that do not exist for particular observations. If the value of an outcome variable is negative for an observation, then that outcome is considered not available for that observation. The predicted counts for that observation are defined only for the available observations and are based on the linear predictors for the available observations. The same set of coefficient parameter values are used for all observations. Any observation for which fewer than two outcomes are available is omitted.
Observations with missing data (NA
) in any outcome variable or
regressor are omitted (listwise deletion).
In a model that has the same regressors for every category, except for
one category for which there are no regressors in order to identify the
model (the reference category), the RHS
specification must be
given for all the categories except the reference category. The formula
for the reference category must include a RHS
specification that
explicitly omits the constant, e.g., countvar ~ -1
or
countvar ~ 0
. The number of coefficient parameters to be
estimated equals the number of terms generated by all the formulas,
subject to equality constraints that may be specified using the
equality
argument.
model
argument, which are the data to be analyzed.model
argument: parameters for the terms in the first formula
appear first, then come parameters for the terms in the second formula,
etc. In practice it will usually be better to start by letting
multinomRob find starting values by using the multinom.t
option,
then using the results from one run as starting values for a subsequent
run done with, perhaps, a larger population of operators for rgenoud. multinomRob(model=list(y1 ~ x1, y2 ~ x2, y3 ~ 0), data=dtf,
equality=list(list(y1 ~ x1 + 0, y2 ~ x2 + 0)) );
the model to be estimated is
list(y1 ~ x1, y2 ~ x2, y3 ~ 0)
and the coefficients of x1 and x2 are constrained equal by
equality=list(list(y1 ~ x1 + 0, y2 ~ x2 + 0))
In the equality formulas it is necessary to say + 0
so the
intercepts are not involved in the constraints. If a parameter occurs
in two different lists in the equality=
argument, then all the
parameters in the two lists are constrained to be equal to one
another. In the output this is described as consolidating the lists.
TRUE
means to iterate between LQD and tanh estimation steps until
either the algorithm converges, the number of iterations specified by the
maxiter
argument is reached, or if an LQD step occurs that
produces a larger value than the previous step did for the overdispersion
scale parameter. This option is often improves the fit of the model.1
means use the multinomial multivariate-t model to compute
starting values for the coefficient parameters. But if the MNL
results are better (as judged by the LQD fit), MNL values will be
used instead. 0
means use nonrobust maximum likelihood
estimates for a multinomial regression
model. 2
forces the use of the multivariate-t model for
starting values even if the MNL estimates provide better starting
values for the LQD. Note that with multinom.t=1
or multinom.t=2
,
multivariate-t
starting values will not be used if the model cannot generate valid
standard errors. To force the use of multivariate-t estimates even
in this circumstance, see the multinom.t.df
argument. If the starting.values
argument is not
NULL
, the starting values given in that argument are used and the multinom.t
argument is ignored. Multinomial multivariate-t starting values are
not available when the number
of outcome alternatives varies over the observations.
NA
means that the degrees of freedom (DF) for the multivariate-t
model (when used) should be estimated. If multinom.t.df
is a number,
that number will be used for the degrees of freedom and the DF will not be
estimated. Only a positive number should be used.
Setting multinom.t.df
to a number also implies that, if
multinom.t=1
or multinom.t=2
, the
multivariate-t starting values will be used (depending on the comparison with
the MNL estimates if multinom.t=1
is set) even if the standard
errors are not defined.
TRUE
, then only the standard maximum-likelihood MNL model
is estimated. No robust estimation model and no overdispersion
parameter is estimated.model
argument. The
name of each column is the name used for the count variable in the
corresponding formula. The label for each row of the matrix gives the
names of the regressors to which the coefficient values in the row
apply. The regressor names in each label are separated by a forward
slash (/), and NA
is used to denote that no regressor is
associated with the corresponding value in the matrix. The value 0 is
used in the matrix to fill in for values that do not correspond to a
model
formula regressor.coefficients
. The standard errors are derived from the estimated
asymptotic sandwich covariance estimate.model
argument. The first
column of the matrix has names for the observations, and the remaining
columns contain the weights. Each of the latter columns has a name
derived from the name of one of the count variables named in the
model
argument. If count1
is the name of the count
variable used in the first formula, then the second column in the matrix
is named weights:count1
, etc.If an observation has negative values specified for some outcome variables,
indicating that those outcome alternatives are not available for that
observation, then values of NA
appear in the weights matrix for that
observation, as many NA
values as there are unavailable
alternatives. The NA
values will be the last values in the affected
row of the weights matrix, regardless of which outcome alternatives were
unavailable for the observation.model
argument. The first
column of the matrix has names for the observations, and the remaining
columns contain the weights. Each of the latter columns has a name
derived from the name of one of the count variables named in the
model
argument. If count1
is the name of the count
variable used in the first formula, then the second column in the matrix
is named Hdiag:count1
, etc.If an observation has negative values specified for some outcome variables,
indicating that those outcome alternatives are not available for that
observation, then values of 0 appear in the weights matrix for that
observation, as many 0 values as there are unavailable alternatives. Values
of 0 that are created for this reason will be the last values in the affected
row of the weights matrix, regardless of which outcome alternatives were
unavailable for the observation.multinomMLE
.multinomT
.genoudRob
.mGNtanh
.mGNtanh
.If starting values are not supplied, they are computed using a multinomial multivariate-t model. The program also computes and reports nonrobust maximum likelihood estimates for the multinomial regression model, reporting sandwich estimates for the standard errors that are adjusted for a nonrobust estimate of the error dispersion.
For additional documentation please visit http://sekhon.berkeley.edu/robust/.
# make some multinomial data
x1 <- rnorm(50);
x2 <- rnorm(50);
p1 <- exp(x1)/(1+exp(x1)+exp(x2));
p2 <- exp(x2)/(1+exp(x1)+exp(x2));
p3 <- 1 - (p1 + p2);
y <- matrix(0, 50, 3);
for (i in 1:50) {
y[i,] <- rmultinomial(1000, c(p1[i], p2[i], p3[i]));
}
# perturb the first 5 observations
y[1:5,c(1,2,3)] <- y[1:5,c(3,1,2)];
y1 <- y[,1];
y2 <- y[,2];
y3 <- y[,3];
# put data into a dataframe
dtf <- data.frame(x1, x2, y1, y2, y3);
## Set parameters for Genoud
## Not run:
# ## For production, use these kinds of parameters
# zz.genoud.parms <- list( pop.size = 1000,
# wait.generations = 10,
# max.generations = 100,
# scale.domains = 5,
# print.level = 0
# )
# ## End(Not run)
## For testing, we are setting the parmeters to run quickly. Don't use these for production
zz.genoud.parms <- list( pop.size = 10,
wait.generations = 1,
max.generations = 1,
scale.domains = 5,
print.level = 0
)
# estimate a model, with "y3" being the reference category
# true coefficient values are: (Intercept) = 0, x = 1
# impose an equality constraint
# equality constraint: coefficients of x1 and x2 are equal
mulrobE <- multinomRob(list(y1 ~ x1, y2 ~ x2, y3 ~ 0),
dtf,
equality = list(list(y1 ~ x1 + 0, y2 ~ x2 + 0)),
genoud.parms = zz.genoud.parms,
print.level = 3, iter=FALSE);
summary(mulrobE, weights=TRUE);
#Do only MLE estimation. The following model is NOT identified if we
#try to estimate the overdispersed MNL.
dtf <- data.frame(y1=c(1,1),y2=c(2,1),y3=c(1,2),x=c(0,1))
summary(multinomRob(list(y1 ~ 0, y2 ~ x, y3 ~ x), data=dtf, MLEonly=TRUE))
Run the code above in your browser using DataLab