nb allows estimation of the theta parameter alongside the model smoothing parameters, but is only useable with gam (not bam or gam).For negbin, if a single value of theta is supplied then it is always taken as the known fixed value and this is useable with bam and gamm. If theta is two 
numbers (theta[2]>theta[1]) then they are taken as specifying the range of values over which to search for 
the optimal theta. This option should only be used with performance iteration estimation (see gam argument optimizer), in which case  the method 
of estimation is to choose $\hat \theta$  so that the GCV (Pearson) estimate 
of the scale parameter is one (since the scale parameter 
is one for the negative binomial). In this case $\theta$ estimation is nested within the IRLS loop 
used for GAM fitting. After each call to fit an iteratively weighted additive model to the IRLS pseudodata, 
the $\theta$ estimate is updated. This is done by conditioning on all components of the current GCV/Pearson 
estimator of the scale parameter except $\theta$ and then searching for the 
$\hat \theta$ which equates this conditional  estimator to one. The search is 
a simple bisection search after an initial crude line search to bracket one. The search will 
terminate at the upper boundary of the search region is a Poisson fit would have yielded an estimated 
scale parameter <1.< p="">
The following negbin based approaches are now deprecated:
If outer iteration is used then $\theta$ is estimated by searching for the value yielding the lowest AIC.
The search is either over the supplied array of values, or is a grid search over the supplied range, followed by a 
golden section search. A full fit is required for each trial $\theta$, so the process is slow, but speed
is enhanced by making the changes in $\theta$ as small as possible, from one step to the next, and 
using the previous smothing parameter and fitted values to start the new fit. 
In a simulation test based on 800 replicates of
the first example data, given below, the GCV based (performance iteration) method yielded models with, 
on avergage 6% better MSE performance than the AIC based (outer iteration) method. 
theta had a 0.86 correlation coefficient between the two methods.
theta estimates averaged 3.36 with a standard deviation of 0.44 for the AIC based method and
3.22 with a standard deviation of 0.43 for the GCV based method. However the GCV based method is 
less computationally reliable, failing in around 4% of replicates.
1.<>