sreg
(x, y, lam = NA, df = NA, offset = 0, wt = rep(1, length(x)), cost = 1,
nstep.cv = 80, find.diagA = TRUE, trmin = 2.01,
trmax = length(unique(x)) * 0.95, lammin = NA, lammax = NA, verbose = FALSE,
do.cv = TRUE, method = "GCV", rmse = NA, lambda = NA)
ESTIMATE: A smoothing spline is a locally weighted average of the y values based on the relative locations of the x values. Formally, the estimate is the curve that minimizes the criterion:
(1/n) sum(k=1,n) w.k( Y.k - f( X.k))**2 + lambda R(f)
where R(f) is the integral of the squared second derivative of f over the range of the X values. The solution is a piecewise cubic polynomial with join points at the unique set of X values. The polynomial segments are constructed so that the entire curve has continuous first and second derivatives, and the second and third derivatives are zero at the boundaries. The smoothing parameter lambda has the range [0, infinity). Lambda equal to zero gives a cubic spline interpolation of the data. As lambda diverges to infinity (e.g., lambda = 1e20) the estimate converges to the straight line fit by least squares.
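A minimal sketch of these two extremes, assuming the fields package is attached and fixing lambda directly through the lambda argument shown in the usage above:

library(fields)
set.seed(1)
x <- sort(runif(50))
y <- sin(2 * pi * x) + rnorm(50, sd = 0.1)
fit.rough <- sreg(x, y, lambda = 1e-10)  # nearly interpolates the data
fit.line  <- sreg(x, y, lambda = 1e20)   # effectively the least squares line
# the heavily smoothed spline should agree with lm at the data points
max(abs(predict(fit.line) - fitted(lm(y ~ x))))  # should be close to zero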
The values of the estimated function at the data points can be expressed in the matrix form:
predicted.values = A(lambda) Y
where A is an n x n symmetric matrix that does NOT depend on Y. The diagonal elements are the leverage values for the estimate, and the sum of these, trace(A(lambda)), can be interpreted as the effective number of parameters used to define the spline function. If there are replicate points, the A matrix is the result of finding group averages and applying a weighted spline to the means.
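Because the estimate is linear in Y for fixed lambda, A(lambda) can be recovered column by column by smoothing unit vectors. The following toy sketch, assuming the fields package is attached, checks the hat matrix form and the trace interpretation (in practice the find.diagA option computes the diagonal of A much more efficiently):

library(fields)
set.seed(1)
n <- 40
x <- sort(runif(n))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.1)
lam <- 0.001
# column j of A(lambda) is the fit to the j-th unit vector
A <- sapply(1:n, function(j) {
  e <- rep(0, n); e[j] <- 1
  predict(sreg(x, e, lambda = lam, do.cv = FALSE))
})
fit <- sreg(x, y, lambda = lam, do.cv = FALSE)
max(abs(A %*% y - predict(fit)))  # near zero: predicted values = A %*% Y
sum(diag(A))                      # trace(A), the effective number of parameters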
CROSS-VALIDATION: The GCV criterion with no replicate points for a fixed value of lambda is
(1/n)(Residual sum of squares)/(1 - (cost*(tr(A) - offset) + offset)/n)**2.
Usually offset = 0 and cost = 1. Variations on GCV with replicate points are described in the documentation help file for Krig.
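As a check, the GCV criterion with offset = 0 and cost = 1 can be evaluated by brute force over a grid of lambda values using the unit-vector construction of A sketched above. This is a toy illustration assuming the fields package is attached; sreg's own grid search (nstep.cv values, method = "GCV") uses the banded algorithm and is far faster:

library(fields)
set.seed(2)
n <- 40
x <- sort(runif(n))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.1)
gcv <- function(lam) {
  A <- sapply(1:n, function(j) {
    e <- rep(0, n); e[j] <- 1
    predict(sreg(x, e, lambda = lam, do.cv = FALSE))
  })
  rss <- sum((y - A %*% y)^2)
  (rss / n) / (1 - sum(diag(A)) / n)^2   # GCV with offset = 0, cost = 1
}
lam.grid <- 10^seq(-6, 0, length.out = 15)
gcv.values <- sapply(lam.grid, gcv)
lam.grid[which.min(gcv.values)]          # grid value minimizing GCV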
COMPUTATIONS: The computations for 1-d splines exploit the banded structure of the matrices needed to solve for the spline coefficients. The banded structure also makes it possible to find the diagonal elements of A quickly. This approach is different from the algorithms in Tps and is far more efficient for larger numbers of unique x values (say > 200). The advantage of Tps is the availability of "Bayesian" standard errors at predictions different from the observed x values. This function is similar to the S-Plus smooth.spline. The main advantages are more information and control over the choice of lambda, and that the FORTRAN source code (css.f) is available.
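A rough timing sketch of this difference, assuming the fields package is attached (timings depend on the machine; n is simply chosen to be well above a few hundred unique x values):

library(fields)
set.seed(3)
n <- 500
x <- sort(runif(n))
y <- sin(4 * pi * x) + rnorm(n, sd = 0.2)
system.time(fit.sreg <- sreg(x, y))  # banded 1-d spline algorithm
system.time(fit.tps  <- Tps(x, y))   # general thin plate spline solver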
# fit a GCV spline to
# control group of rats.
fit <- sreg(rat.diet$t, rat.diet$con)
summary( fit)
plot(fit) # diagnostic plots of fit
predict( fit) # predicted values at data points
xg <- seq(0, 110, length.out = 50)
sm<-predict( fit, xg) # spline fit at 50 equally spaced points
der.sm<- predict( fit, xg, deriv=1) # derivative of spline fit
set.panel( 2,1) # 2 by 1 panel of plots
plot( fit$x, fit$y) # the data
lines( xg, sm) # the spline
plot( xg,der.sm, type="l") # plot of estimated derivative
# the same fit using the thin plate spline numerical algorithms
# (sreg is more efficient for 1-d problems)
fit.tps<-Tps( rat.diet$t,rat.diet$con)
summary( fit.tps)
# replicated data
# this is a simulated case. Find lambda by matching the rmse to .2
# and use this estimate of lambda
fit <- sreg(test.data2$x, test.data2$y, rmse = .2, method = "RMSE")
set.panel( 1,1) # reset to a single plotting panel