ps: Defining Penalized Spline Smooths in VGAM Formulas

Description

ps is used in the definition of P-spline smooth terms within vgam formulas.

Usage

ps(x, ..., ps.intervals = NULL, lambda = 0, degree = 2, order = 2, ridge.adj = 1e-5, ridge.inv = 0.0001, spillover = 0.01, maxlambda = 1e4)

Arguments

covariate (abscissae) to be smoothed. Also called the regressor. If the xij facility is used then more covariates are needed in the ... argument.

...

Used to accommodate the other $M-1$ covariates when the xij facility is used. See Section 3.4.4 of Yee (2015) for something very similar. This argument, found in the second argument, means that the other argument names must be fully specified if used, e.g., ps.intervals and not ps.int. See the example below. In the example below, the term in the main formula is ps(gcost.air, gcost.trn, gcost.bus) and one might be tempted to use something like ps(gcost) to represent that xij term. However, this is not recommended because ps(gcost) might not have the same number of columns as ps(gcost.air, gcost.trn, gcost.bus) etc. That is, it is best to select one of the diagonal elements of the block matrix to represent that term.

ps.intervals

the number of equally-spaced B-spline intervals. Note that the number of knots is equal to ps.intervals + 2*degree + 1. The default, signified by NULL, means that ceiling(1.5 * log(length(unique(x.index)))) is used, where x.index is the combined data. This is not guaranteed to work on every data set, and it might change in the future.

lambda, maxlambda

maxlambda are the non-negative regularization parameters for difference penalty, whose values should be less than maxlambda. Can be a vector.

degree

degree of B-spline basis. Usually this will be 2 or 3; and the values 1 or 4 might possibly be used.

order

order of difference penalty (0 is the ridge penalty).

ridge.adj, ridge.inv

small positive numbers to stabilize linear dependencies among B-spline bases.

spillover

small positive proportion of the range used on the outside of the boundary values.

Value

A matrix with attributes that are (only) used by vgam. The number of rows of the matrix is length(x) and the number of columns is ps.intervals + degree - 1.

Details

This function is currently used by vgam to allow automatic smoothing parameter selection based on P-splines and minimizing an UBRE quantity. It is recommended above s also because backfitting is not required.

Unlike s, which is symbolic and does not perform any smoothing itself, this function does compute the penalized spline when used by vgam. When this function is used within vgam, automatic smoothing parameter selection is implemented by calling magic after the necessary link-ups are done.

This function is smart; it can be used for smart prediction (Section 18.6 of Yee (2015)).

References

Eilers, P. H. C. and Marx, B. D. (2002). Generalized Linear Additive Smooth Structures. Journal of Computational and Graphical Statistics, 11(4): 758--783.

Marx, B. D. and Eilers, P. H. C. (1998). Direct generalized linear modeling with penalized likelihood. CSDA, 28(2): 193--209.

Eilers, P. H. C. and Marx, B. D. (1996). Flexible smoothing with B-splines and penalties (with comments and rejoinder). Statistical Science, 11(2): 89--121.

Wood, S. N. (2004). Stable and efficient multiple smoothing parameter estimation for generalized additive models. J. Amer. Statist. Assoc., 99(467): 673--686.

Examples

Run this code

ps(runif(10))
ps(runif(10), ps.intervals = 5)

## Not run: 
# data("TravelMode", package = "AER")  # Need to install "AER" first
# air.df <- subset(TravelMode, mode == "air")  # Form 4 smaller data frames
# bus.df <- subset(TravelMode, mode == "bus")
# trn.df <- subset(TravelMode, mode == "train")
# car.df <- subset(TravelMode, mode == "car")
# TravelMode2 <- data.frame(income     = air.df$income,
#                           wait.air   = air.df$wait  - car.df$wait,
#                           wait.trn   = trn.df$wait  - car.df$wait,
#                           wait.bus   = bus.df$wait  - car.df$wait,
#                           gcost.air  = air.df$gcost - car.df$gcost,
#                           gcost.trn  = trn.df$gcost - car.df$gcost,
#                           gcost.bus  = bus.df$gcost - car.df$gcost,
#                           wait       = air.df$wait)  # Value is unimportant
# TravelMode2$mode <- subset(TravelMode, choice == "yes")$mode  # The response
# TravelMode2 <- transform(TravelMode2, incom.air = income, incom.trn = 0,
#                                       incom.bus = 0)
# set.seed(1)
# TravelMode2 <- transform(TravelMode2,
#                          junkx2 = runif(nrow(TravelMode2)))
# 
# tfit2 <-
#   vgam(mode ~ ps(gcost.air, gcost.trn, gcost.bus) + ns(junkx2, 4) +
#               ps(incom.air, incom.trn, incom.bus) + wait ,
#        crit = "coef",
#        multinomial(parallel = FALSE ~ 1), data = TravelMode2,
#        xij = list(ps(gcost.air, gcost.trn, gcost.bus) ~
#                   ps(gcost.air, gcost.trn, gcost.bus) +
#                   ps(gcost.trn, gcost.bus, gcost.air) +
#                   ps(gcost.bus, gcost.air, gcost.trn),
#                   ps(incom.air, incom.trn, incom.bus) ~
#                   ps(incom.air, incom.trn, incom.bus) +
#                   ps(incom.trn, incom.bus, incom.air) +
#                   ps(incom.bus, incom.air, incom.trn),
#                   wait   ~  wait.air +  wait.trn +  wait.bus),
#        form2 = ~  ps(gcost.air, gcost.trn, gcost.bus) +
#                   ps(gcost.trn, gcost.bus, gcost.air) +
#                   ps(gcost.bus, gcost.air, gcost.trn) +
#                   wait +
#                   ps(incom.air, incom.trn, incom.bus) +
#                   ps(incom.trn, incom.bus, incom.air) +
#                   ps(incom.bus, incom.air, incom.trn) +
#                   junkx2 + ns(junkx2, 4) +
#                   incom.air + incom.trn + incom.bus +
#                   gcost.air + gcost.trn + gcost.bus +
#                   wait.air +  wait.trn +  wait.bus)
# par(mfrow = c(2, 2))
# plot(tfit2, se = TRUE, lcol = "orange", scol = "blue", ylim = c(-4, 4))
# ## End(Not run)

Run the code above in your browser using DataLab