The gam modelling function is designed to be able to use the negbin family (a modification of the MASS library negative.binomial family by Venables and Ripley), or the nb function designed for integrated estimation of the parameter theta. $\theta$ is the parameter such that $var(y) = \mu + \mu^2/\theta$, where $\mu = E(y)$.
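For instance, this parameterization can be checked with a quick simulation (an added sketch using base R's rnbinom, in which the size argument plays the role of theta):

mu <- 4; theta <- 2
y <- rnbinom(1e6, size = theta, mu = mu) ## size is theta in this parameterization
var(y)          ## sample variance of the draws
mu + mu^2/theta ## theoretical variance: 4 + 16/2 = 12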
Two approaches to estimating theta are available (with gam only):
With the negbin family, if 'performance iteration' is used for smoothing parameter estimation (see gam), then smoothing parameters are chosen by GCV and theta is chosen in order to ensure that the Pearson estimate of the scale parameter is as close as possible to 1, the value that the scale parameter should have.

With the nb family, theta is estimated alongside the smoothing parameters by ML or REML.
To use the first option, set the optimizer argument of gam to "perf" (it can sometimes fail to converge).
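The two approaches correspond to calls like the following (a sketch assuming a data frame dat with a count response y and a covariate x0, as in the examples below):

## theta estimated alongside smoothing parameters by REML (nb family)
b.nb <- gam(y~s(x0),family=nb(),method="REML",data=dat)
## theta chosen by performance iteration, searching over the range 1 to 10 (negbin family)
b.perf <- gam(y~s(x0),family=negbin(c(1,10)),optimizer="perf",data=dat)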
negbin(theta = stop("'theta' must be specified"), link = "log")
nb(theta = NULL, link = "log")
theta: the extra negative binomial parameter (for negbin, see the details below). For nb, a positive supplied theta is treated as a fixed known parameter; otherwise it is estimated (the absolute value of a negative theta is taken as a starting value).

link: the link function: "log", "identity" or "sqrt".
For negbin, an object inheriting from class family, with additional elements. For nb, an object inheriting from class extended.family.
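These classes can be checked directly, for example:

library(mgcv)
inherits(negbin(3),"family")     ## TRUE
inherits(nb(),"extended.family") ## TRUE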
nb allows estimation of the theta parameter alongside the model smoothing parameters, but is only usable with gam (not bam or gamm).

For negbin, if a single value of theta is supplied then it is always taken as the known fixed value, and this is usable with bam and gamm. If theta is two numbers (theta[2] > theta[1]) then they are taken as specifying the range of values over which to search for the optimal theta. This option should only be used with performance iteration estimation (see the gam argument optimizer), in which case the method of estimation is to choose $\theta$ so that the GCV (Pearson) estimate of the scale parameter is one (since the scale parameter is one for the negative binomial). In this case $\theta$ estimation is nested within the IRLS loop used for GAM fitting. After each call to fit an iteratively weighted additive model to the IRLS pseudodata, the $\theta$ estimate is updated. This is done by conditioning on all components of the current GCV/Pearson estimator of the scale parameter except $\theta$ and then searching for the $\theta$ which equates this conditional estimator to one. The search is a simple bisection search, after an initial crude line search to bracket one. The search will terminate at the upper boundary of the search region if a Poisson fit would have yielded an estimated scale parameter less than 1.
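The following self-contained sketch illustrates the idea behind this search, though it is not the mgcv implementation: theta is chosen so that a Pearson-type estimate of the scale parameter, computed here from the fitted means and total effective degrees of freedom of a working Poisson fit, equals one.

library(mgcv)
set.seed(3)
x <- runif(400)
mu.true <- exp(2*sin(2*pi*x))
y <- rnbinom(400,size=3,mu=mu.true)  ## true theta = 3
b.pois <- gam(y~s(x),family=poisson) ## working fit giving fitted means and EDF
mu <- fitted(b.pois); edf <- sum(b.pois$edf)
pearson.scale <- function(theta)     ## Pearson scale estimate under var = mu + mu^2/theta
  sum((y-mu)^2/(mu+mu^2/theta))/(length(y)-edf)
uniroot(function(th) pearson.scale(th)-1,c(0.1,100))$root ## should be near the true value 3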
Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S (4th ed.). Springer.
library(mgcv)
set.seed(3)
n<-400
dat <- gamSim(1,n=n)
g <- exp(dat$f/5)
## negative binomial data...
dat$y <- rnbinom(g,size=3,mu=g)
## known theta fit ...
b0 <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=negbin(3),data=dat)
plot(b0,pages=1)
print(b0)
## same with theta estimation...
b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=nb(),data=dat)
plot(b,pages=1)
print(b)
b$family$getTheta(TRUE) ## extract final theta estimate
## unknown theta via performance iteration...
b1 <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=negbin(c(1,10)),
optimizer="perf",data=dat)
plot(b1,pages=1)
print(b1)
## another example...
set.seed(1)
f <- dat$f
f <- f - min(f)+5;g <- f^2/10
dat$y <- rnbinom(g,size=3,mu=g)
b2 <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=nb(link="sqrt"),
data=dat,method="REML")
plot(b2,pages=1)
print(b2)
rm(dat)