Fits a generally--altered, --inflated --truncated and deflated negative binomial regression by MLE. The GAITD combo model having 7 types of special values is implemented. This allows mixtures of negative binomial distributions on nested and/or partitioned support as well as a multinomial logit model for (nonparametric) altered, inflated and deflated values.
gaitdnbinomial(a.mix = NULL, i.mix = NULL, d.mix = NULL,
a.mlm = NULL, i.mlm = NULL, d.mlm = NULL,
truncate = NULL, zero = c("size", "pobs", "pstr", "pdip"),
eq.ap = TRUE, eq.ip = TRUE, eq.dp = TRUE,
parallel.a = FALSE, parallel.i = FALSE, parallel.d = FALSE,
lmunb.p = "loglink",
lmunb.a = lmunb.p, lmunb.i = lmunb.p, lmunb.d = lmunb.p,
lsize.p = "loglink",
lsize.a = lsize.p, lsize.i = lsize.p, lsize.d = lsize.p,
type.fitted = c("mean", "munbs", "sizes", "pobs.mlm",
"pstr.mlm", "pdip.mlm", "pobs.mix", "pstr.mix", "pdip.mix",
"Pobs.mix", "Pstr.mix", "Pdip.mix", "nonspecial", "Numer",
"Denom.p", "sum.mlm.i", "sum.mix.i",
"sum.mlm.d", "sum.mix.d", "ptrunc.p", "cdf.max.s"),
gpstr.mix = ppoints(7) / 3,
gpstr.mlm = ppoints(7) / (3 + length(i.mlm)),
imethod = 1, mux.init = c(0.75, 0.5, 0.75, 0.5),
imunb.p = NULL, imunb.a = imunb.p,
imunb.i = imunb.p, imunb.d = imunb.p,
isize.p = NULL, isize.a = isize.p,
isize.i = isize.p, isize.d = isize.p,
ipobs.mix = NULL, ipstr.mix = NULL,
ipdip.mix = NULL, ipobs.mlm = NULL,
ipstr.mlm = NULL, ipdip.mlm = NULL,
byrow.aid = FALSE, ishrinkage = 0.95, probs.y = 0.35,
nsimEIM = 500, cutoff.prob = 0.999, eps.trig = 1e-7,
nbd.max.support = 4000, max.chunk.MB = 30)
See gaitdpoisson
.
See gaitdpoisson
.
See gaitdpoisson
.
Link functions pertaining to the mean parameters.
See gaitdpoisson
where llambda.p
etc. are
the equivalent.
Link functions pertaining to the size
parameters.
See NegBinomial
.
See gaitdpoisson
.
These apply to both munb
and size
parameters
simultaneously.
See NegBinomial
also.
See gaitdpoisson
.
See gaitdpoisson
.
See gaitdpoisson
.
See gaitdpoisson
and
CommonVGAMffArguments
.
See gaitdpoisson
.
Numeric, of length 4.
General downward multiplier for initial values for
the sample proportions (MLEs actually).
See gaitdpoisson
.
The fourth value corresponds to size
.
See gaitdpoisson
;
imunb.p
is similar to ilambda.p
, etc.
See gaitdpoisson
;
isize.p
is similar to ilambda.p
, etc.
See CommonVGAMffArguments
for information.
Details are at Gaitdpois
.
See gaitdpoisson
and
CommonVGAMffArguments
.
See negbinomial
.
See negbinomial
.
An object of class "vglmff"
(see vglmff-class
).
The object is used by modelling functions such as vglm
,
rrvglm
and vgam
.
The fitted.values
slot of the fitted object,
which should be extracted by the generic function fitted
,
returns the mean type.fitted
.
See gaitdpoisson
.
Also, having eq.ap = TRUE
, eq.ip = TRUE
and eq.dp = TRUE
is often needed to obtain
initial values that are good enough because they borrow
strength across the different operators.
It is usually easy to relax these assumptions later.
This family function is under constant development and future changes will occur.
The GAITD--NB combo model is the pinnacle of GAITD regression
for counts because it potentially handles
underdispersion,
equidispersion and
overdispersion relative to the Poisson,
as well as
alteration,
inflation,
deflation and
truncation at arbitrary support points.
In contrast, gaitdpoisson
cannot handle
overdispersion so well.
The GAITD--NB is so flexible that it can accommodate up to
seven modes.
The full
GAITD--NB--NB--MLM--NB-MLM--NB-MLM combo model
may be fitted with this family function.
There are seven types of special values and all arguments for these
may be used in a single model.
Here, the MLM represents the nonparametric while the NB
refers to the negative binomial mixtures.
The defaults for this function correspond to an
ordinary negative binomial
regression so that negbinomial
is called instead.
While much of the documentation here draws upon
gaitdpoisson
, there are additional
details here because the NBD is a two parameter
distribution that handles overdispersion relative
to the Possion.
Consequently, this family function is exceeding flexible
and there are many more pitfalls to avoid.
The order of the linear/additive predictors is best explained by
an example.
Suppose a combo model has
length(a.mix) > 3
and
length(i.mix) > 3
,
length(d.mix) > 3
,
a.mlm = 3:5
,
i.mlm = 6:9
and
d.mlm = 10:12
, say.
Then loglink(munb.p)
and loglink(size.p)
are the first two.
The third is multilogitlink(pobs.mix)
followed
by loglink(munb.a)
and loglink(size.a)
because a.mix
is long enough.
The sixth is multilogitlink(pstr.mix)
followed
by loglink(munb.i)
and loglink(size.i)
because i.mix
is long enough.
The ninth is multilogitlink(pdip.mix)
followed
by loglink(munb.d)
and loglink(size.d)
because d.mix
is long enough.
Next are the probabilities for the a.mlm
values.
Then are the probabilities for the i.mlm
values.
Lastly are the probabilities for the d.mlm
values.
All the probabilities are estimated by one big MLM and effectively
the "(Others)"
column of left over probabilities is
associated with the nonspecial values.
These might be called the
nonspecial baseline probabilities (NBP)
or reserve probabilities.
The dimension of the vector of linear/additive predictors here
is
Apart from the order of the linear/additive predictors,
the following are (or should be) equivalent:
gaitdnbinomial()
and negbinomial()
,
gaitdnbinomial(a.mix = 0)
and zanegbinomial(zero = "pobs0")
,
gaitdnbinomial(i.mix = 0)
and zinegbinomial(zero = "pstr0")
,
gaitdnbinomial(truncate = 0)
and posnegbinomial()
.
Likewise, if
a.mix
and i.mix
are assigned a scalar then
it effectively moves that scalar to a.mlm
and i.mlm
because there is no
parameters such as munb.i
being estimated.
Thus
gaitdnbinomial(a.mix = 0)
and gaitdnbinomial(a.mlm = 0)
are the effectively same, and ditto for
gaitdnbinomial(i.mix = 0)
and gaitdnbinomial(i.mlm = 0)
.
Yee, T. W. and Ma, C. (2022). Generally--altered, --inflated, --truncated and --deflated regression, with application to heaped and seeped data. In preparation.
Gaitdnbinom
,
multinomial
,
rootogram4
,
specials
,
plotdgaitd
,
spikeplot
,
meangaitd
,
KLD
,
gaitdpoisson
,
gaitdlog
,
gaitdzeta
,
multilogitlink
,
multinomial
,
goffset
,
Trunc
,
negbinomial
,
CommonVGAMffArguments
,
simulate.vlm
.
# NOT RUN {
i.mix <- c(5, 10, 12, 16) # Inflate these values parametrically
i.mlm <- c(14, 15) # Inflate these values
a.mix <- c(1, 6, 13, 20) # Alter these values
tvec <- c(3, 11) # Truncate these values
pstr.mlm <- 0.1 # So parallel.i = TRUE
pobs.mix <- pstr.mix <- 0.1; set.seed(1)
gdata <- data.frame(x2 = runif(nn <- 1000))
gdata <- transform(gdata, munb.p = exp(2 + 0.0 * x2), size.p = exp(1))
gdata <- transform(gdata,
y1 = rgaitdnbinom(nn, size.p, munb.p, a.mix = a.mix, i.mix = i.mix,
pobs.mix = pobs.mix, pstr.mix = pstr.mix,
i.mlm = i.mlm, pstr.mlm = pstr.mlm,
truncate = tvec))
gaitdnbinomial(a.mix = a.mix, i.mix = i.mix, i.mlm = i.mlm)
with(gdata, table(y1))
fit1 <- vglm(y1 ~ 1, crit = "coef", trace = TRUE, data = gdata,
gaitdnbinomial(a.mix = a.mix, i.mix = i.mix, i.mlm = i.mlm,
parallel.i = TRUE, eq.ap = TRUE,
eq.ip = TRUE, truncate = tvec))
head(fitted(fit1, type.fitted = "Pstr.mix"))
head(predict(fit1))
t(coef(fit1, matrix = TRUE)) # Easier to see with t()
summary(fit1)
# }
# NOT RUN {
spikeplot(with(gdata, y1), lwd = 2)
plotdgaitd(fit1, new.plot = FALSE, offset.x = 0.2, all.lwd = 2)
# }
Run the code above in your browser using DataLab