bam fits a generalized additive model to a very large data set. In use the function is much like gam, except that the numerical methods are designed for datasets containing upwards of several tens of thousands of data points (see Wood, Goude and Shaw, 2015). The main advantage of bam is a much lower memory footprint than gam, but it can also be much faster for large datasets. bam can also compute on a cluster set up by the parallel package. An alternative fitting approach is provided by the discrete=TRUE method, in which covariate values are discretized and C-code-level parallelization (controlled by the nthreads argument instead of the cluster argument) is used. This extends the data set and model sizes that are practical.
bam(formula, family=gaussian(), data=list(), weights=NULL, subset=NULL,
    na.action=na.omit, offset=NULL, method="fREML", control=list(),
    select=FALSE, scale=0, gamma=1, knots=NULL, sp=NULL, min.sp=NULL,
    paraPen=NULL, chunk.size=10000, rho=0, AR.start=NULL, discrete=FALSE,
    cluster=NULL, nthreads=NA, gc.level=1, use.chol=FALSE, samfrac=1,
    coef=NULL, drop.unused.levels=TRUE, G=NULL, fit=TRUE,
    drop.intercept=NULL, ...)
formula: A GAM formula (see formula.gam and also gam.models). This is exactly like the formula for a GLM, except that smooth terms, s and te, can be added to the right hand side to specify that the linear predictor depends on smooth functions of predictors (or linear functionals of these).

data: A data frame or list containing the model response variable and covariates required by the formula. By default the variables are taken from environment(formula): typically the environment from which gam is called.

weights: Prior weights on the contribution of the data to the log likelihood. To re-weight data without changing the overall magnitude of the log likelihood, normalize the weights (e.g. weights <- weights/mean(weights)).

offset: Can be used to supply a model offset for use in fitting. Note that this offset is always completely ignored when predicting, unlike an offset included in formula (this used to conform to the behaviour of lm and glm).

method: The smoothing parameter estimation method: "GCV.Cp" to use GCV for unknown scale parameter and Mallows' Cp/UBRE/AIC for known scale; "GACV.Cp" is equivalent, but using GACV in place of GCV; "REML" for REML estimation, including of unknown scale; "P-REML" for REML estimation, but using a Pearson estimate of the scale; "ML" and "P-ML" are similar, but using maximum likelihood in place of REML. The default "fREML" uses fast REML computation.

control: A list of fit control parameters to replace the defaults returned by gam.control. Any control parameters not supplied stay at their default values.

knots: An optional list of user-specified knot values used for basis construction. These must match up with the k value supplied (note that the number of knots is not always just k). See tprs for what happens in the "tp"/"ts" case. Different terms can use different numbers of knots, unless they share a covariate.

sp: A vector of smoothing parameters can be supplied here; length(sp) must correspond to the number of underlying smoothing parameters.

min.sp: Lower bounds can be supplied for the smoothing parameters. If used, then full.sp, in the returned object, will need to be added to what is supplied here to get the smoothing parameters actually multiplying the penalties. length(min.sp) should always be the same as the total number of penalties (so it may be longer than sp, if smooths share smoothing parameters).

paraPen: An optional list specifying any penalties to be applied to parametric model terms; gam.models explains more.

chunk.size: The model matrix is created in chunks of this size, rather than ever being formed as a whole. Reset to 4*p if chunk.size < 4*p, where p is the number of coefficients.

rho: An AR1 correlation parameter for the residuals of Gaussian-identity-link models (based on data frame order). Standardized residuals (approximately uncorrelated under the correct model) are returned in std.rsd if non zero.

AR.start: Logical vector of the same length as the data, TRUE at the first observation of an independent section of AR1 correlation. The very first observation in the data frame does not need this. If NULL then there are no breaks in the AR1 correlation.

discrete: With method="fREML" it is possible to discretize covariates for storage and efficiency reasons. If discrete is TRUE, a number, or a vector of numbers (one for each smoother term), then discretization happens; if numbers are supplied they give the number of discretization bins. Experimental at present.

cluster: bam can compute the computationally dominant QR decomposition in parallel using parLapply from the parallel package, if it is supplied with a cluster on which to do this (a cluster here can be some cores of a single machine). See details and example code.

nthreads: Number of threads to use for non-cluster computation; if NA, this is set to max(1,length(cluster)).

use.chol: By default bam uses a very stable QR update approach to obtaining the QR decomposition of the model matrix. For well conditioned models an alternative (use.chol=TRUE) accumulates the crossproduct of the model matrix and then finds its Choleski decomposition at the end, which is somewhat more efficient computationally.

samfrac: For very large datasets the number of fitting iterations can be reduced by first obtaining starting values from a random subsample of the data; samfrac is the sampling fraction to use, and 0.1 is often reasonable.

G: If not NULL then this should be the object returned by a previous call to bam with fit=FALSE. Causes all other arguments to be ignored except chunk.size, gamma, nthreads, cluster, rho, gc.level, samfrac, use.chol and method.

fit: If FALSE then the model is set up for fitting but not estimated, and an object is returned, suitable for passing as the G argument to bam.

drop.intercept: Set to TRUE to force the model to really not have a constant in the parametric model part, even with factor variables present.

...: Further arguments for passing on, e.g. to gam.fit (such as mustart).

The function returns an object of class "gam", as described in gamObject.
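As a hedged sketch of the fit=FALSE / G mechanism described in the argument entries above (the simulated data, object names and model are illustrative, not taken from the package documentation):

library(mgcv)
set.seed(1)
dat <- gamSim(1, n = 5000, dist = "normal", scale = 2)  ## illustrative data
## set the model up without estimating it...
G <- bam(y ~ s(x0) + s(x1) + s(x2), data = dat, fit = FALSE)
## ...then fit from the stored setup object; per the description above,
## most other arguments are ignored once G is supplied
b <- bam(y ~ s(x0) + s(x1) + s(x2), data = dat, G = G)
summary(b)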
"tp"
basis is used. Unless discrete=TRUE, you must have more unique combinations of covariates than the model has total
parameters. (Total parameters is sum of basis dimensions plus sum of non-spline
terms less the number of spline terms). This routine is less stable than `gam' for the same dataset. The negbin family is only supported for the *known theta* case. discrete=FALSE
, bam
operates by first setting up the basis characteristics for the smooths, using a representative subsample of the data. Then the model matrix is constructed in blocks using predict.gam
. For each block the factor R,
from the QR decomposition of the whole model matrix is updated, along with Q'y. and the sum of squares of y. At the end of
block processing, fitting takes place, without the need to ever form the whole model matrix. In the generalized case, the same trick is used with the weighted model matrix and weighted pseudodata, at each step of the PIRLS. Smoothness selection is performed on the working model at each stage (performance oriented iteration), to maintain the small memory footprint. This is trivial to justify in the case of GCV or Cp/UBRE/AIC based model selection, and for REML/ML is justified via the asymptotic multivariate normality of Q'z where z is the IRLS pseudodata.
For full method details see Wood, Goude and Shaw (2015).
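As an illustrative sketch (not from the original help file) of reducing the number of PIRLS iterations in the generalized case by using samfrac to obtain starting values from a random subsample; the sample size and sampling fraction are arbitrary:

library(mgcv)
set.seed(2)
dat <- gamSim(1, n = 20000, dist = "poisson", scale = 0.1)
## fit a preliminary model to roughly 10% of the data for starting values,
## then complete the block-wise fit on the full data set
b <- bam(y ~ s(x0) + s(x1) + s(x2), data = dat,
         family = poisson(), samfrac = 0.1)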
Note that performance oriented iteration (POI) is not as stable as the default nested iteration used with gam, but for very large, information-rich datasets this is unlikely to matter much.
Note also that it is possible to spend most of the computational time on basis evaluation, if an expensive basis is used. In practice this means that the default "tp"
basis should be avoided: almost any other basis (e.g. "cr"
or "ps"
)
can be used in the 1D case, and tensor product smooths (te
) are typically much less costly in the multi-dimensional case.
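A minimal sketch of the basis advice above, using simulated covariates from gamSim; the particular bases and the tensor product structure are arbitrary choices for illustration:

library(mgcv)
set.seed(4)
dat <- gamSim(1, n = 10000, dist = "normal", scale = 2)
## avoid the default "tp" basis: "cr" for 1D smooths and a tensor product
## of cheap marginal bases ("ps") for the 2D term
b <- bam(y ~ s(x1, bs = "cr") + s(x3, bs = "cr") + te(x0, x2, bs = "ps"),
         data = dat)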
If cluster
is provided as a cluster set up using makeCluster
(or makeForkCluster
) from the parallel
package, then the rate limiting QR decomposition of the model matrix is performed in parallel using this cluster. Note that the speed ups are often not that great. On a multi-core machine it is usually best to set the cluster size to the number of physical cores, which is often less than what is reported by detectCores
. Using more than the number of physical cores can result in no speed up at all (or even a slow down). Note that a highly parallel BLAS may negate all advantage from using a cluster of cores. Computing in parallel of course requires more memory than computing in series. See examples.
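A hedged sketch of cluster use, following the advice above; the two-core cluster is purely illustrative (match the number of physical cores in practice), and the cluster should be stopped when finished:

library(mgcv)
library(parallel)
set.seed(5)
dat <- gamSim(1, n = 20000, dist = "normal", scale = 2)
cl <- makeCluster(2)  ## illustrative; use the number of physical cores
b <- bam(y ~ s(x0, bs = "cr") + s(x1, bs = "cr") + s(x2, bs = "cr"),
         data = dat, cluster = cl)
stopCluster(cl)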
When discrete=TRUE the covariate data are first discretized. Discretization takes place on a smooth-by-smooth basis, or, in the case of tensor product smooths (or any smooth that can be represented as such, such as random effects), separately for each marginal smooth. The required spline bases are then evaluated at the discrete values and stored, along with index vectors indicating which original observation they relate to.

Fitting is by a version of performance oriented iteration/PQL using REML smoothing parameter selection on each iterative working model (as for the default method). The iteration is based on the derivatives of the REML score, without computing the score itself, allowing the expensive computations to be reduced to one parallel block Cholesky decomposition per iteration (plus two basic operations of equal cost, but easily parallelized). Unlike standard POI/PQL, only one step of the smoothing parameter update for the working model is taken at each step (rather than iterating to the optimal set of smoothing parameters for each working model). At each step a weighted crossproduct of the model matrix with itself is required; this is computed efficiently from the pre-computed basis functions evaluated at the discretized covariate values. Efficient computation with tensor product terms means that some terms within a tensor product may be re-ordered for maximum efficiency. Parallel computation is controlled using the nthreads
argument. For this method no cluster computation is used, and the parallel
package is not required.
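A minimal sketch of the discretized fitting route described above; the thread count is arbitrary and no cluster object is needed:

library(mgcv)
set.seed(6)
dat <- gamSim(1, n = 50000, dist = "normal", scale = 2)
## discretize covariates and use C-level threading instead of a cluster
b <- bam(y ~ s(x0, bs = "cr") + s(x1, bs = "cr") + te(x2, x3, bs = "cr"),
         data = dat, discrete = TRUE, nthreads = 2)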
Wood, S.N., Goude, Y. & Shaw, S. (2015) Generalized additive models for large data sets. Journal of the Royal Statistical Society, Series C 64(1): 139-155.

Wood, S.N., Li, Z., Shaddick, G. & Augustin, N.H. (2016) Generalized additive models for gigadata: modelling the UK black smoke network daily data. Journal of the American Statistical Association.
mgcv.parallel, mgcv-package, gamObject, gam.models, smooth.terms, linear.functional.terms, s, te, predict.gam, plot.gam, summary.gam, gam.side, gam.selection, gam.control, gam.check, negbin, magic, vis.gam
library(mgcv)
## See help("mgcv-parallel") for using bam in parallel
## Some examples are marked 'Not run' purely to keep
## checking load on CRAN down. Sample sizes are small for
## the same reason.
set.seed(3)
dat <- gamSim(1,n=25000,dist="normal",scale=20)
bs <- "cr";k <- 12
b <- bam(y ~ s(x0,bs=bs)+s(x1,bs=bs)+s(x2,bs=bs,k=k)+
s(x3,bs=bs),data=dat)
summary(b)
plot(b,pages=1,rug=FALSE) ## plot smooths, but not rug
plot(b,pages=1,rug=FALSE,seWithMean=TRUE) ## `with intercept' CIs
## Not run:
# ba <- bam(y ~ s(x0,bs=bs,k=k)+s(x1,bs=bs,k=k)+s(x2,bs=bs,k=k)+
# s(x3,bs=bs,k=k),data=dat,method="GCV.Cp") ## use GCV
# summary(ba)
## End(Not run)
## A Poisson example...
k <- 15
dat <- gamSim(1,n=21000,dist="poisson",scale=.1)
system.time(b1 <- bam(y ~ s(x0,bs=bs)+s(x1,bs=bs)+s(x2,bs=bs,k=k),
data=dat,family=poisson()))
b1
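## An AR1 residual sketch (illustrative only, not from the original help
## file): rho gives the AR1 correlation of the residuals of a
## Gaussian-identity model, and AR.start marks the first observation of
## each independent block (here a hypothetical grouping into runs of 500).
## Not run:
# dat <- gamSim(1,n=10000,dist="normal",scale=2)
# dat$start <- rep(c(TRUE,rep(FALSE,499)),20) ## block starts
# br <- bam(y ~ s(x0,bs=bs)+s(x1,bs=bs)+s(x2,bs=bs,k=k),
#           data=dat,rho=0.6,AR.start=dat$start)
# rsd <- br$std.rsd ## standardized residuals, ~uncorrelated if rho is right
## End(Not run)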
## Sparse smoother example (deprecated)...
## Not run:
# dat <- gamSim(1,n=10000,dist="poisson",scale=.1)
# system.time( b3 <- bam(y ~ te(x0,x1,bs="ps",k=10,np=FALSE)+
# s(x2,bs="ps",k=30)+s(x3,bs="ps",k=30),data=dat,
# method="REML",family=poisson(),sparse=TRUE))
# b3
## End(Not run)