GAMsetup: Set up GAM using penalized cubic regression splines

Description

Sets up the design matrix $\bf X$, penalty matrices ${\bf S}_i$ and linear equality constraint matrix $\bf C$ for a GAM defined in terms of penalized regression splines, and returns the knot locations of these regression splines in xp. The output is such that the model can be fitted and smoothing parameters estimated by the method of Wood (2000), as implemented in routine mgcv(). This routine is largely superseded by gam.
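For orientation, the fitting problem that these matrices feed into is roughly of the following form. This is a sketch rather than a statement of the exact objective used by mgcv(): the response $\bf y$, the weight matrix $\bf W$, the parameter vector $\bf p$, the smoothing parameters $\theta_i$ and the number of smooths $m$ are notation assumed here, and only $\bf X$, ${\bf S}_i$ and $\bf C$ are produced by GAMsetup.

$$\min_{\bf p} \; \| {\bf W}^{1/2} ({\bf y} - {\bf X p}) \|^2 + \sum_{i=1}^{m} \theta_i \, {\bf p}^{\sf T} {\bf S}_i {\bf p} \quad \text{subject to} \quad {\bf C p} = {\bf 0}$$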

Usage

GAMsetup(G)

Arguments

G

The single argument to this function: a list containing the elements listed below.

m

the number of smooth terms in the model

n

the number of data to be modelled

nsdf

the number of user supplied columns of the design matrix for any parametric model parts

df

an array of G$m integers specifying the maximum d.f. for each spline term.

dim

An array of dimensions for the smooths. dim[i] is the number of covariates that smooth i is a function of.

s.type

An array giving the type of basis used for each term: 0 for a cubic regression spline, 1 for a thin plate regression spline (t.p.r.s.).

p.order

An array giving the order of the penalty for each term. 0 for auto selection.

x

an array of G$n element arrays of data and (optionally) design matrix columns. The first G$nsdf elements of G$x should contain the elements of the columns of the design matrix corresponding to the parametric part o

Value

A list H, containing the elements of G (the input list) plus the following:

X

the full design matrix.

S

A one dimensional array containing the non-zero elements of the penalty matrices. Let start[1]<-0 and start[k+1]<-start[k]+H$df[k]^2. Then penalty matrix k has H$S[start[k]+i+H$df[k]*(j-1)] on its ith row and jth column. To get the kth full penalty matrix, the matrix so obtained is inserted into a full matrix of zeroes with its 1,1 element at row and column H$off[k]+1.

off

an array of offsets, used to facilitate efficient storage of the penalty matrices and to indicate where in the overall parameter vector the parameters of the ith spline reside (e.g. the first parameter of the ith spline is at p[off[i]+1]).

C

a matrix defining the linear equality constraints on the parameters used to define the model (i.e. $\bf C$ in ${\bf Cp } ={\bf 0}$).

UZ

Array containing matrices which transform from a t.p.r.s. basis to the equivalent t.p.s. basis (for t.p.r.s. terms only). The packing method is as follows: set start[1]<-0 and start[k+1]<-start[k]+(M[k]+n)*tp.bs[k], where n is the number of data, M[k] is the penalty null space dimension and tp.bs[k] is zero for a cubic regression spline and the basis dimension for a t.p.r.s. Then element i,j of the UZ matrix for model term k is UZ[start[k]+i+(j-1)*(M[k]+n)].

Xu

Set of unique covariate combinations for each term. The packing method is as follows: set start[1]<-0 and start[k+1]<-start[k]+(xu.length[k])*tp.dim[k], where xu.length[k] is the number of unique covariate combinations and tp.dim[k] is zero for a cubic regression spline and the dimension of the smooth (i.e. the number of covariates it is a function of) for a t.p.r.s. Then element i,j of the Xu matrix for model term k is Xu[start[k]+i+(j-1)*(xu.length[k])].

xu.length

Number of unique covariate combinations for each t.p.r.s. term.

covariate.shift

All covariates are centred around zero before bases are constructed; this is an array of the applied shifts.

xp

matrix whose rows contain the covariate values corresponding to the parameters of each cubic regression spline. The cubic regression splines are parameterized by their $y$ values at a series of $x$ values, and each row of xp holds those $x$ values.
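The packed storage used for S (and, with different block sizes, for UZ and Xu) can be unpacked with ordinary column-major indexing. The following is a minimal sketch under stated assumptions, not code from the package: it assumes that block k of S occupies df[k]^2 consecutive elements stored column by column, and that off holds zero-based offsets, so the block's 1,1 element lands at row and column off[k]+1. The helper names unpack_block and full_penalty are hypothetical, and the data are a synthetic stand-in for GAMsetup output.

```r
# Hypothetical helper: pull one column-major block out of a packed vector.
# `start` is the zero-based offset of the block within `packed`.
unpack_block <- function(packed, start, nrow, ncol) {
  idx <- start + seq_len(nrow * ncol)
  # matrix() fills column by column, so element (i, j) is
  # packed[start + i + nrow * (j - 1)]
  matrix(packed[idx], nrow = nrow, ncol = ncol)
}

# Hypothetical helper: rebuild the k-th full penalty matrix.
# Assumes block k has df[k]^2 elements and off[k] is a zero-based offset.
full_penalty <- function(S, df, off, k, n_par) {
  start <- c(0, cumsum(df^2))[k]      # start[1] = 0, start[k+1] = start[k] + df[k]^2
  Sk <- unpack_block(S, start, df[k], df[k])
  full <- matrix(0, n_par, n_par)     # full matrix of zeroes
  rows <- off[k] + seq_len(df[k])     # parameters belonging to smooth k
  full[rows, rows] <- Sk
  full
}

# Synthetic stand-in for GAMsetup output (not real penalty matrices)
df  <- c(2, 3)
off <- c(1, 3)                        # one parametric column, then 2 + 3 spline parameters
S1  <- matrix(1:4, 2, 2)
S2  <- matrix(11:19, 3, 3)
S   <- c(S1, S2)                      # packed, column by column

P2 <- full_penalty(S, df, off, k = 2, n_par = 6)
```

The same unpack_block call should serve for UZ and Xu if start, nrow and ncol are computed from the packing rules given for those elements.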

References

Wood, S.N. (2000) "Modelling and smoothing parameter estimation with multiple quadratic penalties" JRSSB 62(2):413-428

Examples

# This example modified from routine SANtest()

    n<-100 # number of observations to simulate
    x <- runif(5 * n, 0, 1) # simulate covariates
    x <- array(x, dim = c(5, n)) # put into array for passing to GAMsetup
    # begin simulating some data (uses R's built-in constant pi)
    y <- 2 * sin(pi * x[2, ])
    y <- y + exp(2 * x[3, ]) - 3.75887
    y <- y + 0.2 * x[4, ]^11 * (10 * (1 - x[4, ]))^6 + 10 * (10 * 
        x[4, ])^3 * (1 - x[4, ])^10 - 1.396
    sig2 <- -1    # negative value signals GCV use to mgcv(); abs(sig2) is the noise variance
    e <- rnorm(n, 0, sqrt(abs(sig2)))
    y <- y + e          # simulated data
    w <- matrix(1, n, 1) # weight matrix
    par(mfrow = c(2, 2)) # scatter plots of simulated data
    plot(x[2, ], y)
    plot(x[3, ], y)
    plot(x[4, ], y)
    plot(x[5, ], y)
    x[1,]<-1
    G <- list(m = 4, n = n, nsdf = 0, df = c(15, 15, 15, 15), dim = c(1, 1, 1, 1),
        s.type = c(0, 0, 0, 0), p.order = c(0, 0, 0, 0), x = x) # create list for passing to GAMsetup
    H <- GAMsetup(G)
    H$y <- y    # add data to H
    H$sig2 <- sig2  # add variance (signalling GCV use in this case) to H
    H$w <- w       # add weights to H
    H$sp<-array(-1,H$m)
    H$fix<-array(FALSE,H$m)
    H$conv.tol<-1e-6;H$max.half<-15
    H <- mgcv(H)  # select smoothing parameters and fit model
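The simulated response above is a sum of three smooth functions of x[2, ], x[3, ] and x[4, ] plus noise, while x[5, ] has no effect. As an aid for judging a fit, the sketch below rewrites those components as standalone functions. The names f2, f3, f4 and truth are hypothetical and are introduced here only for illustration; the formulas are copied from the simulation code.

```r
# True component functions used in the simulation (names are hypothetical)
f2 <- function(x) 2 * sin(pi * x)
f3 <- function(x) exp(2 * x) - 3.75887
f4 <- function(x) {
  0.2 * x^11 * (10 * (1 - x))^6 + 10 * (10 * x)^3 * (1 - x)^10 - 1.396
}

# Noise-free response for given covariate values
truth <- function(x2, x3, x4) f2(x2) + f3(x3) + f4(x4)

# Evaluate the noise-free surface on a small grid of covariate values
grid <- seq(0, 1, length.out = 5)
mu <- truth(grid, grid, grid)
```

Each component can be drawn with, for example, curve(f2, 0, 1) and compared by eye with the corresponding scatter plot.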