Learn R Programming

mgcv (version 1.9-4)

magic: Stable Multiple Smoothing Parameter Estimation by GCV or UBRE

Description

Function to efficiently estimate smoothing parameters in generalized ridge regression problems with multiple (quadratic) penalties, by GCV or UBRE. The function uses Newton's method in multi-dimensions, backed up by steepest descent to iteratively adjust the smoothing parameters for each penalty (one penalty may have a smoothing parameter fixed at 1).

For maximal numerical stability the method is based on orthogonal decomposition methods, and attempts to deal with numerical rank deficiency gracefully using a truncated singular value decomposition approach.

Usage

magic(y,X,sp,S,off,L=NULL,lsp0=NULL,rank=NULL,H=NULL,C=NULL,
      w=NULL,gamma=1,scale=1,gcv=TRUE,ridge.parameter=NULL,
      control=list(tol=1e-6,step.half=25,rank.tol=
      .Machine$double.eps^0.5),extra.rss=0,n.score=length(y),nthreads=1)

Arguments

Value

The function returns a list with the following items:

b

The best fit parameters given the estimated smoothing parameters.

scale

the estimated (GCV) or supplied (UBRE) scale parameter.

score

the minimized GCV or UBRE score.

sp

an array of the estimated smoothing parameters.

sp.full

an array of the smoothing parameters that actually multiply the elements of S (same as sp if L was NULL). This is exp(L%*%log(sp)).

rV

a factored form of the parameter covariance matrix. The (Bayesian) covariance matrix of the parametes b is given by rV%*%t(rV)*scale.

gcv.info

is a list of information about the performance of the method with the following elements:

full.rank

The apparent rank of the problem: number of parameters less number of equality constraints.

rank

The estimated actual rank of the problem (at the final iteration of the method).

fully.converged

is TRUE if the method converged by satisfying the convergence criteria, and FALSE if it coverged by failing to decrease the score along the search direction.

hess.pos.def

is TRUE if the hessian of the UBRE or GCV score was positive definite at convergence.

iter

is the number of Newton/Steepest descent iterations taken.

score.calls

is the number of times that the GCV/UBRE score had to be evaluated.

rms.grad

is the root mean square of the gradient of the UBRE/GCV score w.r.t. the smoothing parameters.

R

The factor R from the QR decomposition of the weighted model matrix. This is un-pivoted so that column order corresponds to X. So it may not be upper triangular.

Note that some further useful quantities can be obtained using magic.post.proc.

Details

The method is a computationally efficient means of applying GCV or UBRE (often approximately AIC) to the problem of smoothing parameter selection in generalized ridge regression problems of the form: $$ minimise~ \| { \bf W} ({ \bf Xb - y} ) \|^2 + {\bf b}^\prime {\bf Hb} + \sum_{i=1}^m \theta_i {\bf b^\prime S}_i{\bf b} $$ possibly subject to constraints \( {\bf Cb}={\bf 0}\). \( {\bf X}\) is a design matrix, \(\bf b\) a parameter vector, \(\bf y\) a data vector, \(\bf W\) a weight matrix, \( {\bf S}_i\) a positive semi-definite matrix of coefficients defining the ith penalty with associated smoothing parameter \(\theta_i\), \(\bf H\) is the positive semi-definite offset penalty matrix and \(\bf C\) a matrix of coefficients defining any linear equality constraints on the problem. \( {\bf X}\) need not be of full column rank.

The \(\theta_i\) are chosen to minimize either the GCV score:

$$V_g = \frac{n\|{\bf W}({\bf y} - {\bf Ay})\|^2}{[tr({\bf I} - \gamma {\bf A})]^2}$$

or the UBRE score:

$$V_u=\|{\bf W}({\bf y}-{\bf Ay})\|^2/n-2 \phi tr({\bf I}-\gamma {\bf A})/n + \phi$$

where \(\gamma\) is gamma the inflation factor for degrees of freedom (usually set to 1) and \(\phi\) is scale, the scale parameter. \(\bf A\) is the hat matrix (influence matrix) for the fitting problem (i.e the matrix mapping data to fitted values). Dependence of the scores on the smoothing parameters is through \(\bf A\).

The method operates by Newton or steepest descent updates of the logs of the \(\theta_i\). A key aspect of the method is stable and economical calculation of the first and second derivatives of the scores w.r.t. the log smoothing parameters. Because the GCV/UBRE scores are flat w.r.t. very large or very small \(\theta_i\), it's important to get good starting parameters, and to be careful not to step into a flat region of the smoothing parameter space. For this reason the algorithm rescales any Newton step that would result in a \(log(\theta_i)\) change of more than 5. Newton steps are only used if the Hessian of the GCV/UBRE is postive definite, otherwise steepest descent is used. Similarly steepest descent is used if the Newton step has to be contracted too far (indicating that the quadratic model underlying Newton is poor). All initial steepest descent steps are scaled so that their largest component is 1. However a step is calculated, it is never expanded if it is successful (to avoid flat portions of the objective), but steps are successively halved if they do not decrease the GCV/UBRE score, until they do, or the direction is deemed to have failed. (Given the smoothing parameters the optimal \(\bf b\) parameters are easily found.)

The method is coded in C with matrix factorizations performed using LINPACK and LAPACK routines.

References

Wood, S.N. (2004) Stable and efficient multiple smoothing parameter estimation for generalized additive models. J. Amer. Statist. Ass. 99:673-686

See Also

magic.post.proc,gam

Examples

Run this code
## Use `magic' for a standard additive model fit ... 
   library(mgcv)
   set.seed(1);n <- 200;sig <- 1
   dat <- gamSim(1,n=n,scale=sig)
   k <- 30
## set up additive model
   G <- gam(y~s(x0,k=k)+s(x1,k=k)+s(x2,k=k)+s(x3,k=k),fit=FALSE,data=dat)
## fit using magic (and gam default tolerance)
   mgfit <- magic(G$y,G$X,G$sp,G$S,G$off,rank=G$rank,
                  control=list(tol=1e-7,step.half=15))
## and fit using gam as consistency check
   b <- gam(G=G)
   mgfit$sp;b$sp  # compare smoothing parameter estimates
   edf <- magic.post.proc(G$X,mgfit,G$w)$edf # get e.d.f. per param
   range(edf-b$edf)  # compare

## p>n example... fit model to first 100 data only, so more
## params than data...

   mgfit <- magic(G$y[1:100],G$X[1:100,],G$sp,G$S,G$off,rank=G$rank)
   edf <- magic.post.proc(G$X[1:100,],mgfit,G$w[1:100])$edf

## constrain first two smooths to have identical smoothing parameters
   L <- diag(3);L <- rbind(L[1,],L)
   mgfit <- magic(G$y,G$X,rep(-1,3),G$S,G$off,L=L,rank=G$rank,C=G$C)

## Now a correlated data example ... 
    library(nlme)
## simulate truth
    set.seed(1);n<-400;sig<-2
    x <- 0:(n-1)/(n-1)
    f <- 0.2*x^11*(10*(1-x))^6+10*(10*x)^3*(1-x)^10
## produce scaled covariance matrix for AR1 errors...
    V <- corMatrix(Initialize(corAR1(.6),data.frame(x=x)))
    Cv <- chol(V)  # t(Cv)%*%Cv=V
## Simulate AR1 errors ...
    e <- t(Cv)%*%rnorm(n,0,sig) # so cov(e) = V * sig^2
## Observe truth + AR1 errors
    y <- f + e 
## GAM ignoring correlation
    par(mfrow=c(1,2))
    b <- gam(y~s(x,k=20))
    plot(b);lines(x,f-mean(f),col=2);title("Ignoring correlation")
## Fit smooth, taking account of *known* correlation...
    w <- solve(t(Cv)) # V^{-1} = w'w
    ## Use `gam' to set up model for fitting...
    G <- gam(y~s(x,k=20),fit=FALSE)
    ## fit using magic, with weight *matrix*
    mgfit <- magic(G$y,G$X,G$sp,G$S,G$off,rank=G$rank,C=G$C,w=w)
## Modify previous gam object using new fit, for plotting...    
    mg.stuff <- magic.post.proc(G$X,mgfit,w)
    b$edf <- mg.stuff$edf;b$Vp <- mg.stuff$Vb
    b$coefficients <- mgfit$b 
    plot(b);lines(x,f-mean(f),col=2);title("Known correlation")

Run the code above in your browser using DataLab