predict.bigssa: Predicts for "bigssa" Objects

Description

Get fitted values and standard error estimates for smoothing spline anova models.

Usage

# S3 method for bigssa
predict(object,newdata=NULL,se.fit=FALSE,include=object$tnames,
        effect=c("all","0","lin","non"),includeint=FALSE,
        design=FALSE,smoothMatrix=FALSE,intercept=NULL,...)

Arguments

object

Object of class "bigssa", which is output from bigssa.

newdata

Data frame or list containing the new data points for prediction. Variable names must match those used in the formula input of bigssa. See Details and Example. Default of newdata=NULL uses original data in object input.

se.fit

Logical indicating whether the standard errors of the fitted values should be estimated. Default is se.fit=FALSE.

include

Which terms to include in the estimate. You can get fitted values for any combination of terms in the tnames element of an "bigssa" object.

effect

Which effect to estimate: effect="all" gives $\hat{y}$ for given terms in include, effect="lin" gives linear portion of $\hat{y}$ for given terms in include, and effect="non" gives nonlinear portion of $\hat{y}$ for given terms in include. Use effect="0" to return the intercept.

includeint

Logical indicating whether the intercept should be included in the prediction. If include=object$tnames and effect="all" (default), then this input is ignored and the intercept is automatically included in the prediction.

design

Logical indicating whether the design matrix should be returned.

smoothMatrix

Logical indicating whether the smoothing matrix should be returned.

intercept

Logical indicating whether the intercept should be included in the prediction. When used, this input overrides the includeint input.

…

Ignored.

Value

If se.fit=FALSE, design=FALSE, and smoothMatrix=FALSE, returns vector of fitted values.

Otherwise returns list with elements:

fit

Vector of fitted values

se.fit

Vector of standard errors of fitted values (if se.fit=TRUE)

Design matrix used to create fitted values (if design=TRUE)

Index vector such that fit=X%*%object$modelspec$coef[ix] (if design=TRUE)

Smoothing matrix corresponding to fitted values (if smoothMatrix=TRUE)

Details

Uses the coefficient and smoothing parameter estimates from a fit smoothing spline anova (estimated by bigssa) to predict for new data.

References

Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer.

Helwig, N. E. (2013). Fast and stable smoothing spline analysis of variance models for large samples with applications to electroencephalography data analysis. Unpublished doctoral dissertation. University of Illinois at Urbana-Champaign.

Helwig, N. E. (2016). Efficient estimation of variance components in nonparametric mixed-effects models with large samples. Statistics and Computing, 26, 1319-1336.

Helwig, N. E. (2017). Regression with ordered predictors via ordinal smoothing splines. Frontiers in Applied Mathematics and Statistics, 3(15), 1-13.

Helwig, N. E. and Ma, P. (2015). Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples. Journal of Computational and Graphical Statistics, 24, 715-732.

Helwig, N. E. and Ma, P. (2016). Smoothing spline ANOVA for super-large samples: Scalable computation via rounding parameters. Statistics and Its Interface, 9, 433-444.

Examples

Run this code

# NOT RUN {
##########   EXAMPLE 1   ##########

# define univariate function and data
set.seed(773)
myfun <- function(x){ 2 + x + sin(2*pi*x) }
x <- runif(500)
y <- myfun(x) + rnorm(500)

# fit cubic spline model
cubmod <- bigssa(y~x,type="cub",nknots=30)
crossprod( predict(cubmod) - myfun(x) )/500

# define new data for prediction
newdata <- data.frame(x=seq(0,1,length.out=100))

# get fitted values and standard errors for new data
yc <- predict(cubmod,newdata,se.fit=TRUE)

# plot results with 95% Bayesian confidence interval
plot(newdata$x,yc$fit,type="l")
lines(newdata$x,yc$fit+qnorm(.975)*yc$se.fit,lty=3)
lines(newdata$x,yc$fit-qnorm(.975)*yc$se.fit,lty=3)

# predict constant, linear, and nonlinear effects
yc0 <- predict(cubmod,newdata,se.fit=TRUE,effect="0")
ycl <- predict(cubmod,newdata,se.fit=TRUE,effect="lin")
ycn <- predict(cubmod,newdata,se.fit=TRUE,effect="non")
crossprod( yc$fit - (yc0$fit + ycl$fit + ycn$fit) )

# plot results with 95% Bayesian confidence intervals
par(mfrow=c(1,2))
plot(newdata$x,ycl$fit,type="l",main="Linear effect")
lines(newdata$x,ycl$fit+qnorm(.975)*ycl$se.fit,lty=3)
lines(newdata$x,ycl$fit-qnorm(.975)*ycl$se.fit,lty=3)
plot(newdata$x,ycn$fit,type="l",main="Nonlinear effect")
lines(newdata$x,ycn$fit+qnorm(.975)*ycn$se.fit,lty=3)
lines(newdata$x,ycn$fit-qnorm(.975)*ycn$se.fit,lty=3)
         
         
##########   EXAMPLE 2   ##########

# define bivariate function and data
set.seed(773)
myfun<-function(x){
  2 + x[,1]/10 - x[,2]/5 + 2*sin(sqrt(x[,1]^2+x[,2]^2+.1))/sqrt(x[,1]^2+x[,2]^2+.1)
}
x1v <- runif(500)*16-8
x2v <- runif(500)*16-8
y <- myfun(cbind(x1v,x2v)) + rnorm(500)

# tensor product cubic splines with 50 knots
cubmod <- bigssa(y~x1v*x2v,type=list(x1v="cub",x2v="cub"),nknots=50)
crossprod( predict(cubmod) - myfun(cbind(x1v,x2v)) )/500

# define new data for prediction
xnew <- as.matrix(expand.grid(seq(-8,8,l=50),seq(-8,8,l=50)))
newdata <- list(x1v=xnew[,1],x2v=xnew[,2])

# get fitted values for new data
yp <- predict(cubmod,newdata)

# plot results
imagebar(seq(-8,8,l=50),seq(-8,8,l=50),matrix(yp,50,50),
         xlab=expression(italic(x)[1]),ylab=expression(italic(x)[2]),
         zlab=expression(hat(italic(y))))

# predict linear and nonlinear effects for x1v
newdata <- list(x1v=seq(-8,8,length.out=100))
yl <- predict(cubmod,newdata,include="x1v",effect="lin",se.fit=TRUE)
yn <- predict(cubmod,newdata,include="x1v",effect="non",se.fit=TRUE)

# plot results with 95% Bayesian confidence intervals
par(mfrow=c(1,2))
plot(newdata$x1v,yl$fit,type="l",main="Linear effect")
lines(newdata$x1v,yl$fit+qnorm(.975)*yl$se.fit,lty=3)
lines(newdata$x1v,yl$fit-qnorm(.975)*yl$se.fit,lty=3)
plot(newdata$x1v,yn$fit,type="l",main="Nonlinear effect",ylim=c(-.3,.4))
lines(newdata$x1v,yn$fit+qnorm(.975)*yn$se.fit,lty=3)
lines(newdata$x1v,yn$fit-qnorm(.975)*yn$se.fit,lty=3)

# }

Run the code above in your browser using DataLab