Generalized Additive Models using penalized regression splines and GCV
Fits the specified generalized additive model to data. The GAM is represented using one dimensional penalized regression splines with smoothing parameters selected by GCV.
- A GAM formula. This is exactly like the formula for a glm except that smooth terms can be added to the right hand side of the formula, and the left hand side must contain only the name of a variable, not a transformation function applied to a named variable.
- A data frame containing the model covariates required by the formula. If this is missing the search list is used to try to find the variables needed.
- Prior weights on the data.
- A family object specifying the distribution and link to use in fitting. See family for more details.
- If this is zero then GCV is used for all distributions except Poisson and binomial where UBRE is used with scale parameter assumed to be 1. If this is greater than 1 it is assumed to be the scale parameter/variance and UBRE is used, otherwise GCV is used.
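The GCV and UBRE criteria behind the scale argument can be written down directly. As an illustrative sketch (the exact expressions minimized during fitting may differ in detail), the scores take the form GCV = n*D/(n - edf)^2 and UBRE = D/n + 2*scale*edf/n - scale, where D is the model deviance and edf the effective degrees of freedom:

```r
# Illustrative forms of the GCV and UBRE scores used to choose
# smoothing parameters (D = deviance, edf = effective degrees of
# freedom, n = number of data, scale = known scale parameter).
gcv_score <- function(D, n, edf) {
  n * D / (n - edf)^2
}

ubre_score <- function(D, n, edf, scale = 1) {
  D / n + 2 * scale * edf / n - scale
}

# A smoother fit lowers edf; it wins only if the deviance D does not
# rise too much in exchange.
gcv_score(D = 10, n = 100, edf = 5)    # n*D/(n - edf)^2 = 1000/9025
ubre_score(D = 10, n = 100, edf = 5)   # 0.1 + 0.1 - 1 = -0.8
```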
Each smooth model term is represented using a cubic penalized regression spline. Knots of the spline are placed evenly throughout the covariate values to which the term refers: for example, if fitting 101 data with a 10 knot spline of x, there would be a knot at every 10th (ordered) x value. The use of penalized regression splines turns the gam fitting problem into a penalized glm fitting problem, which can be fitted using a slight modification of glm.fit: gam.fit. The penalized approach also allows smoothing parameters for all smooth terms to be selected simultaneously by GCV or UBRE. This is achieved as part of fitting, by the method given in Wood (2000).
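The even knot placement described above can be sketched in a few lines of base R. This is a hypothetical helper for illustration, not the routine the package actually uses: knots are taken at evenly spaced positions through the ordered covariate values.

```r
# Hypothetical sketch of even knot placement: k knots at
# (approximately) evenly spaced order statistics of the covariate.
place_knots <- function(x, k) {
  xs <- sort(x)
  idx <- round(seq(1, length(xs), length.out = k))
  xs[idx]
}

# With 101 covariate values and 11 knots there is a knot at every
# 10th ordered value: 1, 11, 21, ..., 101.
place_knots(101:1, 11)
```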
- The function returns an object of class gam, with the following components:
- coefficients: the coefficients of the fitted model. Parametric coefficients come first, followed by the coefficients for each spline term in turn.
- residuals: the deviance residuals for the fitted model.
- fitted.values: fitted model predictions of the expected value for each datum.
- family: family object specifying the distribution and link used.
- linear.predictor: fitted model prediction of the link function of the expected value for each datum.
- deviance: the (unpenalized) deviance of the fitted model.
- null.deviance: the deviance of the null model.
- df.null: the degrees of freedom of the null model.
- iter: number of iterations of IRLS taken to get convergence.
- weights: final weights used in the IRLS iteration.
- prior.weights: the prior weights supplied.
- y: response data.
- converged: indicates whether the IRLS iteration converged.
- sig2: estimated or supplied variance/scale parameter.
- edf: estimated degrees of freedom for each smooth.
- boundary: indicates whether the fit lies on a boundary of the parameter space.
- sp: smoothing parameter for each smooth.
- df: number of knots for each smooth (one more than the maximum degrees of freedom).
- nsdf: number of parametric, non-smooth, model terms excluding the intercept.
- Vp: estimated covariance matrix for the parameters.
- xp: knot locations for each smooth; xp[i,] are the locations for the ith smooth.
- formula: the model formula.
- x: parametric design matrix columns (excluding the intercept), followed by the data that form arguments of the smooths.
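Assuming a standard R installation, where mgcv is available as a recommended package, the returned components can be inspected directly on the fitted object. Component names here follow this documentation; later versions of the package may structure some of them differently.

```r
library(mgcv)  # mgcv ships with standard R installations

# Fit a single-smooth model to simulated data.
set.seed(2)
x <- runif(100)
y <- sin(2 * pi * x) + rnorm(100, 0, 0.3)
b <- gam(y ~ s(x))

b$coefficients  # intercept followed by the spline coefficients
b$sp            # smoothing parameter chosen for the single smooth
b$formula       # the model formula, y ~ s(x)
```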
The code does not check for rank deficiency of the model matrix - it will likely just fail instead!
Gu and Wahba (1991) Minimizing GCV/GML scores with multiple smoothing parameters via the Newton method. SIAM J. Sci. Statist. Comput. 12:383-398
Wood (2000) Modelling and Smoothing Parameter Estimation with Multiple Quadratic Penalties. JRSSB 62(2)
library(mgcv)
n <- 200
sig2 <- 4
# simulate four independent covariates
x0 <- runif(n, 0, 1)
x1 <- runif(n, 0, 1)
x2 <- runif(n, 0, 1)
x3 <- runif(n, 0, 1)
pi <- asin(1) * 2
# build an additive truth; x3 has no effect on y
y <- 2 * sin(pi * x0)
y <- y + exp(2 * x1) - 3.75887
y <- y + 0.2 * x2^11 * (10 * (1 - x2))^6 + 10 * (10 * x2)^3 * (1 - x2)^10 - 1.396
# add Gaussian noise with variance sig2
e <- rnorm(n, 0, sqrt(abs(sig2)))
y <- y + e
# fit a GAM with a smooth term for each covariate, then plot the estimated smooths
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3))
plot(b)