refund (version 0.1-23)

# smooth.construct.pco.smooth.spec: Principal coordinate ridge regression

## Description

Smooth constructor function for principal coordinate ridge regression fitted by `gam`. When the principal coordinates are defined by a relevant distance among functional predictors, this is a form of nonparametric scalar-on-function regression. Reiss et al. (2016) describe the approach and apply it to dynamic time warping distances among functional predictors.

## Usage

```
# S3 method for pco.smooth.spec
smooth.construct(object, data, knots)
```

## Arguments

object

a smooth specification object, usually generated by a term of the form `s(dummy, bs="pco", k, xt)`; see Details.

data

a list containing just the data.

knots

IGNORED!

## Value

An object of class `pco.smooth`. The resulting object has an `xt` element containing details of the multidimensional scaling, which may be of interest.
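For example, these MDS details can be inspected through the fitted model's smooth objects (a minimal sketch; `m` is assumed to be a fitted `gam` whose first smooth term is the `pco` term):

```
# m$smooth holds the constructed smooth objects, so for a single pco term
# the multidimensional scaling details are stored in its xt element
mds.details <- m$smooth[[1]]$xt
str(mds.details)
```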

## Details

The constructor is not normally called directly, but is rather used internally by `gam`.

In a `gam` term of the above form `s(dummy, bs="pco", k, xt)`,

• `dummy` is an arbitrary vector (or the name of a column in `data`) whose length is the number of observations. It is not actually used, but is required as part of the input to `s`. Note that if multiple `pco` terms appear in the model, each must have a unique term name (e.g., `dummy1`, `dummy2`, etc.).

• `k` is the number of principal coordinates (e.g., `k=9` will give a 9-dimensional projection of the data).

• `xt` is a list supplying the distance information, in one of two ways: (i) a matrix `Dmat` of distances can be supplied directly via `xt=list(D=Dmat,…)`; (ii) alternatively, `xt=list(realdata=…, dist_fn=…, …)` specifies a data matrix `realdata` and a distance function `dist_fn`, from which the distance matrix `dist_fn(realdata)` is computed. A short sketch of both forms appears after this list.

The list `xt` also has the following optional elements:

• `add`: Passed to `cmdscale` when performing multidimensional scaling; for details, see the help for that function. (Default `FALSE`.)

• `fastcmd`: if `TRUE`, multidimensional scaling is performed by `cmdscale_lanczos`, which uses Lanczos iteration to eigendecompose the distance matrix; if `FALSE` (the default), MDS is carried out by `cmdscale`.
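For concreteness, here is a minimal sketch of both ways of supplying `xt`, along with its optional elements. The toy objects `X`, `y`, `Dmat`, and the fitted models are illustrative placeholders, not part of the package:

```
library(mgcv)
library(refund)

# toy placeholder data (illustration only): 20 noisy curves at 50 points
set.seed(1)
X <- matrix(rnorm(20*50), 20, 50)
y <- rnorm(20)
dummy <- rep(1, 20)  # arbitrary vector; required by s() but otherwise unused

# (i) supply a precomputed distance matrix directly
Dmat <- dist(X)
m1 <- gam(y ~ s(dummy, bs="pco", k=5, xt=list(D=Dmat)), method="REML")

# (ii) supply the raw data and a distance function instead;
# the constructor computes dist_fn(realdata) internally
m2 <- gam(y ~ s(dummy, bs="pco", k=5,
                xt=list(realdata=X, dist_fn=dist)), method="REML")

# optional elements: add is passed through to cmdscale;
# fastcmd=TRUE would switch the MDS step to cmdscale_lanczos
m3 <- gam(y ~ s(dummy, bs="pco", k=5,
                xt=list(D=Dmat, add=TRUE)), method="REML")
```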

## References

Reiss, P. T., Miller, D. L., Wu, P.-S., and Hua, W.-Y. Penalized nonparametric scalar-on-function regression via principal coordinates. Under revision. Available at https://works.bepress.com/phil_reiss/42/.

## Examples

```
# a simulated example
library(refund)
library(mgcv)
require(dtw)

## First generate the data
Xnl <- matrix(0, 30, 101)
set.seed(813)
tt <- sort(sample(1:90, 30))
# each noiseless curve is a dip followed by a bump at a curve-specific location
for (i in 1:30) {
  Xnl[i, tt[i]:(tt[i]+4)] <- -1
  Xnl[i, (tt[i]+5):(tt[i]+9)] <- 1
}
# add small Gaussian noise to the curves and to the response
X.toy <- Xnl + matrix(rnorm(30*101, sd=0.05), 30)
y.toy <- tt + rnorm(30, sd=0.05)
# map responses to rainbow colors for the rainbow plot
y.rainbow <- rainbow(30, end=0.9)[(y.toy - min(y.toy)) /
                                    diff(range(y.toy)) * 29 + 1]

## Display the toy data
par(mfrow=c(2, 2))
matplot((0:100)/100, t(Xnl[c(4, 25), ]), type="l", xlab="t", ylab="",
        ylim=range(X.toy), main="Noiseless functions")
matplot((0:100)/100, t(X.toy[c(4, 25), ]), type="l", xlab="t", ylab="",
        ylim=range(X.toy), main="Observed functions")
matplot((0:100)/100, t(X.toy), type="l", lty=1, col=y.rainbow, xlab="t",
        ylab="", main="Rainbow plot")

## Obtain DTW distances
## (dist() here is proxy::dist, attached with dtw, which supports method="dtw")
D.dtw <- dist(X.toy, method="dtw", window.type="sakoechiba", window.size=5)

## Compare PC vs. PCo ridge regression

# matrix to store GCV scores
GCVmat <- matrix(NA, 15, 2)
# dummy covariate (required by s() but otherwise unused)
dummy <- rep(1, 30)

# loop over possible projection dimensions
for (k. in 1:15) {
  # fit PC (m1) and PCo (m2) ridge regression
  m1 <- gam(y.toy ~ s(dummy, bs="pco", k=k.,
                      xt=list(realdata=X.toy, dist_fn=dist)), method="REML")
  m2 <- gam(y.toy ~ s(dummy, bs="pco", k=k., xt=list(D=D.dtw)), method="REML")
  # calculate and store GCV scores: n * RSS / (residual df)^2
  GCVmat[k., ] <- length(y.toy) * c(sum(m1$residuals^2)/m1$df.residual^2,
                                    sum(m2$residuals^2)/m2$df.residual^2)
}

## plot the GCV scores per dimension for each model
matplot(GCVmat, lty=1:2, col=1, pch=16:17, type="o", ylab="GCV",
        xlab="Number of principal components / coordinates",
        main="GCV score")
legend("right", c("PC ridge regression", "DTW-based PCoRR"), lty=1:2, pch=16:17)

## example of making a prediction

# fit a model to the toy data
m <- gam(y.toy ~ s(dummy, bs="pco", k=2, xt=list(D=D.dtw)), method="REML")

# first build the distance matrix
# in this case we just subsample the original matrix