For maximal numerical stability the method is based on orthogonal decomposition methods, and attempts to deal with numerical rank deficiency gracefully using a truncated singular value decomposition approach.
magic(y,X,sp,S,off,L=NULL,lsp0=NULL,rank=NULL,H=NULL,C=NULL,
w=NULL,gamma=1,scale=1,gcv=TRUE,ridge.parameter=NULL,
control=list(maxit=50,tol=1e-6,step.half=25,
rank.tol=.Machine$double.eps^0.5),extra.rss=0,n.score=length(y))
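The truncated singular value decomposition approach can be illustrated directly: singular values that are negligible relative to the largest one are treated as zero when determining the usable rank. The following is a rough sketch of that idea in R; it is not the internal implementation, and it borrows the rank.tol value from control only by analogy.
## Sketch only: numerical rank via singular values and a tolerance
X <- cbind(1, 1:10, 1:10)            # rank deficient: column 3 repeats column 2
d <- svd(X)$d                        # singular values of X
rank.tol <- .Machine$double.eps^0.5  # tolerance of the kind set in `control'
sum(d > d[1] * rank.tol)             # numerical rank is 2, not 3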
sp: the vector L%*%log(sp) + lsp0 contains the logs of the smoothing parameters that actually multiply the penalty matrices stored in S (L is taken as the identity if NULL).
S: a list of penalty matrices. S[[i]] is the ith penalty matrix, but note that it is not stored as a full matrix, but rather as the smallest square matrix including all the non-zero elements of the penalty matrix. Element 1,1 of S[[i]] occupies element off[i], off[i] of the ith penalty matrix.
L: a matrix mapping log(sp) to the log smoothing parameters that actually multiply the penalties defined by the elements of S. Taken as the identity, if NULL. See above under sp.
lsp0: if L is not NULL this is a vector of constants in the linear transformation from log(sp) to the actual log smoothing parameters, so the logs of the smoothing parameters multiplying the S[[i]] are given by L%*%log(sp) + lsp0.
w: the regression weights. If w is a matrix then it is taken as the square root of the inverse of the covariance matrix of y, specifically ${\bf V}_y^{-1} = {\bf w}^\prime{\bf w}$. If w is an array then it is taken as the diagonal of this matrix.
gcv: TRUE if GCV is to be used, FALSE for UBRE.
n.score: the number to use as the number of data in the GCV/UBRE score calculation; this is useful for certain methods for dealing with very large data sets.
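As a purely hypothetical illustration of the L and lsp0 arguments: with three penalties, the L and lsp0 below make the first two penalties share a single smoothing parameter, while the third is driven by its own parameter and additionally doubled.
## Hypothetical illustration of the L / lsp0 mapping (not from the help page):
## three penalties driven by two underlying smoothing parameters.
L <- rbind(c(1,0),   # penalty 1 uses log(sp[1])
           c(1,0),   # penalty 2 shares log(sp[1])
           c(0,1))   # penalty 3 uses log(sp[2])
lsp0 <- c(0, 0, log(2))      # penalty 3 is additionally doubled
sp <- c(0.5, 3)              # the two underlying smoothing parameters
exp(L %*% log(sp) + lsp0)    # multipliers actually applied to the S[[i]]: 0.5, 0.5, 6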
sp.full: the smoothing parameters that actually multiply the elements of S (same as sp if L was NULL). This is exp(L%*%log(sp)).
rV: a factored form of the parameter covariance matrix; the covariance matrix of the coefficients b is given by rV%*%t(rV)*scale. See magic.post.proc.
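As a sketch of how these pieces fit together (assuming a fit stored in mgfit, as in the examples below), the coefficient covariance matrix and approximate standard errors follow from rV and scale; magic.post.proc computes this, along with effective degrees of freedom, for you.
## Sketch, assuming `mgfit' is the result of a magic() call as in the examples:
Vb <- mgfit$rV %*% t(mgfit$rV) * mgfit$scale   # covariance matrix of mgfit$b
se <- sqrt(diag(Vb))                           # approximate standard errors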
The $\theta_i$ are chosen to minimize either the GCV score:
$$V_g = \frac{n\|{\bf W}({\bf y} - {\bf Ay})\|^2}{[tr({\bf I} - \gamma {\bf A})]^2}$$
or the UBRE score:
$$V_u = \|{\bf W}({\bf y}-{\bf Ay})\|^2/n - 2\phi\, tr({\bf I} - \gamma {\bf A})/n + \phi$$
where $\gamma$ is gamma, the inflation factor for degrees of freedom (usually set to 1), and $\phi$ is scale, the scale parameter. $\bf A$ is the hat matrix (influence matrix) for the fitting problem (i.e. the matrix mapping data to fitted values). Dependence of the scores on the smoothing parameters is through $\bf A$.
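To make the dependence on $\bf A$ concrete, here is a minimal sketch of the GCV score for the unweighted, single-penalty case as a function of one log smoothing parameter; it is an illustration only, not what magic does internally.
## Minimal sketch: GCV score for a single quadratic penalty, unweighted case.
gcv.score <- function(log.theta, y, X, S, gamma = 1) {
  theta <- exp(log.theta)
  A <- X %*% solve(crossprod(X) + theta * S, t(X))  # influence (hat) matrix
  n <- length(y)
  n * sum((y - A %*% y)^2) / (n - gamma * sum(diag(A)))^2
}
## e.g. evaluate on a grid of log smoothing parameters:
## sapply(seq(-5, 5, by = 1), gcv.score, y = y, X = X, S = S)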
The method operates by Newton or steepest descent updates of the logs of the $\theta_i$. A key aspect of the method is stable and economical calculation of the first and second derivatives of the scores w.r.t. the log smoothing parameters. Because the GCV/UBRE scores are flat w.r.t. very large or very small $\theta_i$, it's important to get good starting parameters, and to be careful not to step into a flat region of the smoothing parameter space. For this reason the algorithm rescales any Newton step that would result in a $\log(\theta_i)$ change of more than 5. Newton steps are only used if the Hessian of the GCV/UBRE score is positive definite, otherwise steepest descent is used. Similarly, steepest descent is used if the Newton step has to be contracted too far (indicating that the quadratic model underlying Newton is poor). All initial steepest descent steps are scaled so that their largest component is 1. However a step is calculated, it is never expanded if it is successful (to avoid flat portions of the objective), but steps are successively halved if they do not decrease the GCV/UBRE score, until they do, or the direction is deemed to have failed. (Given the smoothing parameters, the optimal $\bf b$ parameters are easily found.)
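The step-halving part of this strategy can be sketched as follows; this is a simplified illustration only, since the actual routine is coded in C and also handles the Newton/steepest descent switching and step rescaling described above.
## Simplified sketch of step halving when a proposed step fails to reduce the
## score; `score' is any function of the log smoothing parameters.
step.halve <- function(score, lsp, step, max.half = 25) {
  old <- score(lsp)
  for (i in 1:max.half) {
    trial <- lsp + step
    if (score(trial) < old) return(trial)  # accept the first improving step
    step <- step / 2                       # otherwise halve and try again
  }
  lsp  # direction deemed to have failed; keep current parameters
}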
The method is coded in C with matrix factorizations performed using LINPACK and LAPACK routines.
See also magic.post.proc, mgcv, gam.
## Use `magic' for a standard additive model fit ...
library(mgcv)
set.seed(1);n <- 400;sig <- 2
dat <- gamSim(1,n=n,scale=sig)
## set up additive model
G <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),fit=FALSE,data=dat)
## fit using magic
mgfit <- magic(G$y,G$X,G$sp,G$S,G$off,rank=G$rank,C=G$C)
## and fit using gam as consistency check
b <- gam(G=G)
mgfit$sp;b$sp # compare smoothing parameter estimates
edf <- magic.post.proc(G$X,mgfit,G$w)$edf # get e.d.f. per param
range(edf-b$edf) # compare
## constrain first two smooths to have identical smoothing parameters
L <- diag(3);L <- rbind(L[1,],L)
mgfit <- magic(G$y,G$X,rep(-1,3),G$S,G$off,L=L,rank=G$rank,C=G$C)
## Now a correlated data example ...
library(nlme)
## simulate truth
set.seed(1);n<-400;sig<-2
x <- 0:(n-1)/(n-1)
f <- 0.2*x^11*(10*(1-x))^6+10*(10*x)^3*(1-x)^10
## produce scaled covariance matrix for AR1 errors...
V <- corMatrix(Initialize(corAR1(.6),data.frame(x=x)))
Cv <- chol(V) # t(Cv)%*%Cv=V
## Simulate AR1 errors ...
e <- t(Cv)%*%rnorm(n,0,sig) # so cov(e) = V * sig^2
## Observe truth + AR1 errors
y <- f + e
## GAM ignoring correlation
par(mfrow=c(1,2))
b <- gam(y~s(x,k=20))
plot(b);lines(x,f-mean(f),col=2);title("Ignoring correlation")
## Fit smooth, taking account of *known* correlation...
w <- solve(t(Cv)) # V^{-1} = w'w
## Use `gam' to set up model for fitting...
G <- gam(y~s(x,k=20),fit=FALSE)
## fit using magic, with weight *matrix*
mgfit <- magic(G$y,G$X,G$sp,G$S,G$off,rank=G$rank,C=G$C,w=w)
## Modify previous gam object using new fit, for plotting...
mg.stuff <- magic.post.proc(G$X,mgfit,w)
b$edf <- mg.stuff$edf;b$Vp <- mg.stuff$Vb
b$coefficients <- mgfit$b
plot(b);lines(x,f-mean(f),col=2);title("Known correlation")