penalized.pls: Predict New Data Using a Penalized PLS Model

Description

Given a fitted penalized PLS model and new test data, new.penalized.pls predicts the response for each number of components from 1 to ncomp. If the true response values are provided, it also returns the mean squared error (MSE) for each number of components.

penalized.pls computes the regression coefficients for a Penalized Partial Least Squares (PPLS) model, using either a classical NIPALS algorithm or a kernel-based version. It optionally performs block-wise variable selection.

penalized.pls.cv performs k-fold cross-validation to evaluate and select the optimal penalization parameter lambda and the number of components ncomp in a PPLS model.

penalized.pls.default computes the regression coefficients using the standard (NIPALS-based) version of Penalized PLS. This function is typically called internally by penalized.pls.

penalized.pls.kernel computes the regression coefficients using the kernel-based version of Penalized PLS, which is especially useful when the number of predictors exceeds the number of observations (p >> n).

penalized.pls.select computes the regression coefficients of a Penalized Partial Least Squares (PPLS) model using block-wise selection, where each component is restricted to use variables from only one block.

Usage

new.penalized.pls(ppls, Xtest, ytest = NULL)
penalized.pls(
  X,
  y,
  P = NULL,
  ncomp = NULL,
  kernel = FALSE,
  scale = FALSE,
  blocks = 1:ncol(X),
  select = FALSE
)
penalized.pls.cv(
  X,
  y,
  P = NULL,
  lambda = 1,
  ncomp = NULL,
  k = 5,
  kernel = FALSE,
  scale = FALSE
)
penalized.pls.default(X, y, M = NULL, ncomp)
penalized.pls.kernel(X, y, M = NULL, ncomp)
penalized.pls.select(X, y, M = NULL, ncomp, blocks)

Value

new.penalized.pls returns a list containing:

ypred: A numeric matrix of predicted responses. Each column corresponds to a different number of PLS components.
mse: A numeric vector of mean squared errors, one per number of components, if ytest is provided; otherwise NULL.

penalized.pls returns a list with components:

intercept: A numeric vector of intercepts for 1 to ncomp components.
coefficients: A numeric matrix of size ncol(X) x ncomp, each column being the coefficient vector for the corresponding number of components.

penalized.pls.cv returns an object of class "mypls", a list with the following components:

error.cv: A matrix of mean squared errors. Rows correspond to different lambda values; columns to different numbers of components.
lambda: The vector of candidate lambda values.
lambda.opt: The lambda value giving the minimum cross-validated error.
index.lambda: The index of lambda.opt in lambda.
ncomp.opt: The optimal number of PLS components.
min.ppls: The minimum cross-validated error.
intercept: Intercept of the optimal model (fitted on the full dataset).
coefficients: Coefficient vector for the optimal model.
coefficients.jackknife: An array of shape ncol(X) x ncomp x length(lambda) x k, containing the coefficients from each CV split and parameter setting.

penalized.pls.default returns a list with:

coefficients: A matrix of size ncol(X) x ncomp; column $i$ contains the regression coefficients for the first $i$ components.

penalized.pls.kernel returns a list with:

coefficients: A matrix of size ncol(X) x ncomp, containing the estimated regression coefficients for each number of components.

penalized.pls.select returns a list with:

coefficients: A matrix of size ncol(X) x ncomp, containing the regression coefficients after block-wise selection.

Arguments

ppls: A fitted penalized PLS model, as returned by penalized.pls.
Xtest: A numeric matrix of new input data for prediction.
ytest: Optional. A numeric response vector corresponding to Xtest, for evaluating prediction error.
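X: A numeric matrix of predictor variables. penalized.pls centers (and optionally scales) X internally; the lower-level functions penalized.pls.default, penalized.pls.kernel and penalized.pls.select expect X to be centered already.
y: A numeric response vector. penalized.pls centers y internally; the lower-level functions expect y to be centered already.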
P: Optional penalty matrix. If NULL, ordinary PLS is computed (i.e., no penalization).
ncomp: Integer. Number of PLS components to compute.
kernel: Logical. If TRUE, uses the kernel representation of PPLS. Default is FALSE.
scale: Logical. If TRUE, scales predictors in X to unit variance. Default is FALSE.
blocks: An integer vector of length ncol(X) that defines the block structure of the variables. All variables sharing the same value in blocks belong to the same block.
select: Logical. If TRUE, block-wise variable selection is applied in each iteration. Only one block contributes to the latent direction per component. Default is FALSE.
lambda: A numeric vector of candidate penalty parameters. Default is 1.
k: Integer. Number of cross-validation folds. Default is 5.
M: Optional penalty transformation matrix $M = (I + P)^{-1}$. If NULL, no penalization is applied.

Details

The fitted model ppls contains intercepts and regression coefficients for each number of components (from 1 to ncomp). new.penalized.pls computes:

the matrix of predicted values, with one column per number of components,
and, if ytest is provided, a vector of mean squared errors, one per number of components.

The prediction is performed as: $$\hat{y}^{(i)} = X_\text{test} \cdot \beta^{(i)} + \text{intercept}^{(i)},$$ for each number of components $i = 1, \ldots, \text{ncomp}$.
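
The prediction rule above can be sketched in base R. This is a minimal illustration rather than the package's code: predict_sketch is a hypothetical helper, and the toy intercept, coefficients and test values are made up only to show the shapes involved.

```r
# Hypothetical helper: predictions for every number of components at once.
# intercept: numeric vector of length ncomp; coefficients: p x ncomp matrix.
predict_sketch <- function(intercept, coefficients, Xtest, ytest = NULL) {
  # Xtest %*% coefficients is n_test x ncomp; sweep() adds intercept i to column i
  ypred <- sweep(Xtest %*% coefficients, 2, intercept, "+")
  # One MSE per column, only when the true responses are available
  mse <- if (is.null(ytest)) NULL else colMeans((ypred - ytest)^2)
  list(ypred = ypred, mse = mse)
}

# Toy inputs (made up for illustration): p = 2 predictors, ncomp = 2
Xtest <- rbind(c(1, 0), c(0, 1), c(1, 1))
coefficients <- cbind(c(1, 0), c(1, 2))
intercept <- c(0.5, -1)
ytest <- c(1.5, 0.5, 2)

pred <- predict_sketch(intercept, coefficients, Xtest, ytest)
pred$ypred  # 3 x 2 matrix: column i uses the first i components
pred$mse    # one value per number of components
```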

penalized.pls centers X and y, and optionally scales X, then computes PPLS components using one of:

the classical NIPALS algorithm (kernel = FALSE), or
the kernel representation (kernel = TRUE), often faster when p > n (high-dimensional case).
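
The centering step, and how an intercept follows from it, can be illustrated as below. This is a sketch under stated assumptions: ordinary least squares (qr.solve) stands in for the PPLS fit, and the intercept formula (mean of y minus the column means of X times the coefficients) is the standard one for a model fitted on centered data, not something taken from the package source.

```r
set.seed(1)
X <- matrix(rnorm(30 * 3), ncol = 3)
y <- rnorm(30)

# Center X and y (the scale = FALSE case)
Xc <- scale(X, center = TRUE, scale = FALSE)
yc <- y - mean(y)

# Stand-in fit on the centered data (least squares, NOT PPLS)
beta <- qr.solve(Xc, yc)

# Intercept that maps the centered fit back to the original scale
intercept <- mean(y) - sum(colMeans(X) * beta)

# Predictions on the uncentered data
yhat <- drop(X %*% beta) + intercept
```

With scale = TRUE the coefficients would additionally have to be divided by the column standard deviations before computing the intercept; that step is omitted here.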

When a penalty matrix P is supplied, a transformation $M = (I + P)^{-1}$ is computed internally. The algorithm then maximizes the penalized covariance between Xw and y: $$\text{argmax}_w \; \text{Cov}(Xw, y)^2 - \lambda \cdot w^\top P w$$
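
As an illustration of this transformation, the sketch below builds a second-order difference penalty and the corresponding $M$. The difference penalty and the names D, lambda and p are assumptions chosen for the example; note that in the Examples section lambda is multiplied into P before the call (P = lambda * P).

```r
p <- 6
lambda <- 10                               # illustrative value

# Second-order differences: each row of D is (1, -2, 1), shifted along
D <- diff(diag(p), differences = 2)        # (p - 2) x p
P <- lambda * crossprod(D)                 # p x p, symmetric positive semi-definite

M <- solve(diag(p) + P)                    # M = (I + P)^{-1}

# Effect on a weight direction, using centered toy data
set.seed(2)
X <- scale(matrix(rnorm(20 * p), ncol = p), center = TRUE, scale = FALSE)
y <- rnorm(20)
y <- y - mean(y)
w_plain <- crossprod(X, y)                 # X^T y (no penalty)
w_pen <- M %*% w_plain                     # M X^T y (penalized)

# Roughness of each direction, measured by the squared second differences
c(plain = sum((D %*% w_plain)^2), penalized = sum((D %*% w_pen)^2))
```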

The block-wise selection strategy (when select = TRUE) restricts the weight vector w at each iteration to be non-zero in a single block, selected greedily.

penalized.pls.cv splits the data into k cross-validation folds and, for each value of lambda and each number of components from 1 to ncomp, computes the mean squared prediction error on the held-out fold.

The optimal parameters are the pair (lambda, number of components) that minimizes the prediction error aggregated over all folds. Internally, for each fold and lambda value, the function calls penalized.pls to fit the model and new.penalized.pls to evaluate predictions.

The returned object can be further used for statistical inference (e.g., via jackknife) or prediction.
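
The grid search described above can be sketched as follows. This is a hedged illustration, not the package's implementation: cv_grid_sketch and toy_fit_predict are hypothetical names, and the stand-in model (ridge regression on the first j predictors) is not PLS; it only plays the role of a fit that returns one prediction column per number of components.

```r
# Hypothetical sketch of k-fold CV over a lambda x ncomp grid.
# fit_predict(Xtr, ytr, Xte, lambda, ncomp) must return an n_test x ncomp matrix.
cv_grid_sketch <- function(X, y, lambdas, ncomp, k, fit_predict) {
  n <- nrow(X)
  fold <- sample(rep(seq_len(k), length.out = n))   # random fold labels
  err <- matrix(0, length(lambdas), ncomp)          # rows: lambda, cols: ncomp
  for (f in seq_len(k)) {
    test <- fold == f
    for (l in seq_along(lambdas)) {
      pred <- fit_predict(X[!test, , drop = FALSE], y[!test],
                          X[test, , drop = FALSE], lambdas[l], ncomp)
      # Average the per-fold MSE over the k folds
      err[l, ] <- err[l, ] + colMeans((pred - y[test])^2) / k
    }
  }
  best <- which(err == min(err), arr.ind = TRUE)[1, ]
  list(error.cv = err, lambda.opt = lambdas[best[1]],
       ncomp.opt = unname(best[2]), min.err = min(err))
}

# Toy stand-in model (NOT PLS): ridge regression on the first j predictors
toy_fit_predict <- function(Xtr, ytr, Xte, lambda, ncomp) {
  pred <- matrix(0, nrow(Xte), ncomp)
  for (j in seq_len(ncomp)) {
    A <- Xtr[, 1:j, drop = FALSE]
    b <- solve(crossprod(A) + lambda * diag(j), crossprod(A, ytr))
    pred[, j] <- Xte[, 1:j, drop = FALSE] %*% b
  }
  pred
}

set.seed(42)
X <- matrix(rnorm(60 * 4), ncol = 4)
y <- X[, 1] + rnorm(60)                    # synthetic response
res <- cv_grid_sketch(X, y, lambdas = c(0, 1, 10), ncomp = 4, k = 5,
                      fit_predict = toy_fit_predict)
res$lambda.opt
res$ncomp.opt
```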

penalized.pls.default iteratively computes latent directions that maximize the covariance with the response y. At each step:

A weight vector $w$ is computed as $w = M X^\top y$ if penalization is used, and as $w = X^\top y$ otherwise.
The latent component $t = X w$ is extracted and normalized.
The matrix X is deflated orthogonally with respect to t.

The final regression coefficients are obtained by back-solving a triangular system with the bidiagonal matrix $R = T^\top X W$: $$\beta = W L (T^\top y),$$ where $L = R^{-1}$, $W$ collects the weight vectors and $T$ the latent components.
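
The steps above can be sketched in base R as follows. ppls_nipals_sketch is a hypothetical helper written for illustration, not the package's source; it assumes X and y are already centered and treats M = NULL as the unpenalized case. As a sanity check, with all p components and no penalty the last coefficient column should agree with least squares up to rounding.

```r
# Hypothetical NIPALS-style penalized PLS for centered X (n x p) and centered y.
ppls_nipals_sketch <- function(X, y, M = NULL, ncomp) {
  W <- matrix(0, ncol(X), ncomp)           # weight vectors
  Tm <- matrix(0, nrow(X), ncomp)          # orthonormal latent components
  Xi <- X
  for (i in seq_len(ncomp)) {
    w <- crossprod(Xi, y)                  # X_i^T y
    if (!is.null(M)) w <- M %*% w          # penalized direction M X_i^T y
    t_i <- Xi %*% w
    t_i <- t_i / sqrt(sum(t_i^2))          # normalize the component
    Xi <- Xi - t_i %*% crossprod(t_i, Xi)  # deflate X orthogonally to t_i
    W[, i] <- w
    Tm[, i] <- t_i
  }
  R <- crossprod(Tm, X %*% W)              # R = T^T X W (upper triangular)
  L <- backsolve(R, diag(ncomp))           # L = R^{-1}, uses the upper triangle only
  Ty <- crossprod(Tm, y)
  B <- matrix(0, ncol(X), ncomp)
  for (i in seq_len(ncomp)) {
    # Coefficients for the first i components: W_i L_i (T_i^T y)
    B[, i] <- W[, 1:i, drop = FALSE] %*%
      (L[1:i, 1:i, drop = FALSE] %*% Ty[1:i, , drop = FALSE])
  }
  B
}

set.seed(123)
X <- scale(matrix(rnorm(50 * 4), ncol = 4), center = TRUE, scale = FALSE)
y <- rnorm(50)
y <- y - mean(y)
B <- ppls_nipals_sketch(X, y, M = NULL, ncomp = 4)
ols <- qr.solve(X, y)                      # least-squares reference
max(abs(B[, 4] - ols))                     # should be close to zero
```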

penalized.pls.kernel represents the model in terms of the Gram matrix $K = X M X^\top$ (or simply $K = X X^\top$ if M = NULL). The algorithm iteratively computes orthogonal latent components $t_i$ in sample space.

Steps:

Initialize the residual vector $u = y$, compute $t = Ku$, and normalize $t$.
Orthogonalize $t$ with respect to previous components (if needed).
Repeat for ncomp components.

The regression coefficients are recovered as: $$\beta = X^\top A, \quad \text{where } A = UU \, L \, (TT^\top y),$$ with $UU$ the matrix of residual vectors $u$, $TT$ the matrix of latent components $t$, and $L = R^{-1}$ the inverse of the triangular matrix $R$, obtained by back-solving.
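
A minimal base-R sketch of the kernel steps is given below. ppls_kernel_sketch is a hypothetical helper, not the package's code. It assumes centered X and y, takes $R = TT^\top K \, UU$, and, when M is supplied, premultiplies the result by M; that last step is an assumption derived from $w = M X^\top u$ and is not stated in the text above.

```r
# Hypothetical kernel-style PPLS for centered X (n x p) and centered y.
ppls_kernel_sketch <- function(X, y, M = NULL, ncomp) {
  y <- as.numeric(y)
  n <- nrow(X)
  K <- if (is.null(M)) tcrossprod(X) else X %*% M %*% t(X)  # Gram matrix
  UU <- matrix(0, n, ncomp)                # residual vectors u
  TT <- matrix(0, n, ncomp)                # orthonormal components t
  yhat <- rep(0, n)
  for (i in seq_len(ncomp)) {
    u <- y - yhat                          # current residual (u = y at i = 1)
    t_i <- K %*% u
    if (i > 1) {                           # orthogonalize against earlier components
      Tprev <- TT[, 1:(i - 1), drop = FALSE]
      t_i <- t_i - Tprev %*% crossprod(Tprev, t_i)
    }
    t_i <- t_i / sqrt(sum(t_i^2))
    UU[, i] <- u
    TT[, i] <- t_i
    yhat <- yhat + drop(t_i) * sum(t_i * y)
  }
  R <- crossprod(TT, K %*% UU)             # upper triangular in exact arithmetic
  L <- backsolve(R, diag(ncomp))           # L = R^{-1}
  Ty <- crossprod(TT, y)
  A <- matrix(0, n, ncomp)
  for (i in seq_len(ncomp)) {
    A[, i] <- UU[, 1:i, drop = FALSE] %*%
      (L[1:i, 1:i, drop = FALSE] %*% Ty[1:i, , drop = FALSE])
  }
  B <- crossprod(X, A)                     # beta = X^T A
  if (!is.null(M)) B <- M %*% B            # assumption for the penalized case
  B
}

set.seed(123)
X <- scale(matrix(rnorm(50 * 4), ncol = 4), center = TRUE, scale = FALSE)
y <- rnorm(50)
y <- y - mean(y)
Bk <- ppls_kernel_sketch(X, y, M = NULL, ncomp = 4)
max(abs(Bk[, 4] - qr.solve(X, y)))         # should be close to zero
```

The check uses n > p so that a least-squares reference exists; the kernel form itself only needs the n x n matrix K, which is why it is attractive when p is much larger than n.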

penalized.pls.select implements a sparse selection strategy inspired by sparse or group PLS. At each component iteration, it computes the penalized weight vector $w$ from X and y, and selects the block $k$ for which the mean squared weight of its variables is maximal: $$\text{score}_k = \frac{1}{|B_k|} \sum_{j \in B_k} w_j^2$$

Only the weights corresponding to the selected block are retained, and all others are set to zero. The rest of the algorithm follows the classical NIPALS-like PLS with orthogonal deflation.
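
The selection step can be sketched as below. select_block_sketch is a hypothetical helper and the weight values are made up; the sketch only illustrates the scoring rule and the zeroing of the non-selected blocks.

```r
# Hypothetical helper: keep only the block with the largest mean squared weight.
select_block_sketch <- function(w, blocks) {
  w <- as.numeric(w)
  score <- tapply(w^2, blocks, mean)       # score_k = mean of w_j^2 over block k
  best <- names(score)[which.max(score)]   # label of the winning block
  w[as.character(blocks) != best] <- 0     # zero out all other blocks
  list(w = w, block = best, score = score)
}

# Toy weights (made up): 3 blocks of 2 variables each
w <- c(0.1, -0.2, 0.9, 1.1, 0.3, 0.0)
blocks <- c(1, 1, 2, 2, 3, 3)
sel <- select_block_sketch(w, blocks)
sel$block  # "2"
sel$w      # 0.0 0.0 0.9 1.1 0.0 0.0
```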

This procedure enhances interpretability by selecting only one block per component, making it suitable for structured variable selection (e.g., grouped predictors).

References

N. Kraemer, A.-L. Boulesteix, and G. Tutz (2008). Penalized Partial Least Squares with Applications to B-Spline Transformations and Functional Data. Chemometrics and Intelligent Laboratory Systems, 94(1), 60–69. doi:10.1016/j.chemolab.2008.06.009

Examples

set.seed(123)
X <- matrix(rnorm(50 * 200), ncol = 50)
y <- rnorm(200)

Xtrain <- X[1:100, ]
ytrain <- y[1:100]
Xtest <- X[101:200, ]
ytest <- y[101:200]

pen.pls <- penalized.pls(Xtrain, ytrain, ncomp = 10)
pred <- new.penalized.pls(pen.pls, Xtest, ytest)
head(pred$ypred)
pred$mse

## Example from Kraemer et al. (2008)
data(BOD)
X <- BOD[, 1]
y <- BOD[, 2]

Xtest <- seq(min(X), max(X), length.out = 200)
dummy <- X2s(X, Xtest, deg = 3, nknot = 20)  # Spline transformation
Z <- dummy$Z
Ztest <- dummy$Ztest
size <- dummy$sizeZ
P <- Penalty.matrix(size, order = 2)
lambda <- 200
number.comp <- 3

ppls <- penalized.pls(Z, y, P = lambda * P, ncomp = number.comp)
new.ppls <- new.penalized.pls(ppls, Ztest)$ypred

# Plot fitted values for 2 components
plot(X, y, lwd = 3, xlim = range(Xtest))
lines(Xtest, new.ppls[, 2], col = "blue")

set.seed(42)
X <- matrix(rnorm(20 * 100), ncol = 20)
y <- rnorm(100)

# Example with no penalty
result <- penalized.pls.cv(X, y, lambda = c(0, 1, 10), ncomp = 5)
result$lambda.opt
result$ncomp.opt
result$min.ppls

# Using jackknife estimation after CV
jack <- jack.ppls(result)
coef(jack)

set.seed(123)
X <- matrix(rnorm(20 * 50), nrow = 50)
y <- rnorm(50)
M <- diag(ncol(X))  # No penalty
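# Center X and y before calling the low-level function;
# the result is named beta to avoid masking stats::coef()
beta <- penalized.pls.default(scale(X, TRUE, FALSE), scale(y, TRUE, FALSE),
  M, ncomp = 3)$coefficients
beta[, 1]  # coefficients for 1st component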

set.seed(123)
X <- matrix(rnorm(100 * 10), nrow = 100)
y <- rnorm(100)
K <- X %*% t(X)  # Gram matrix, shown for illustration only; it is not passed to the function
beta <- penalized.pls.kernel(X, y, M = NULL, ncomp = 2)$coefficients
head(beta[, 1])  # coefficients for 1st component

set.seed(321)
X <- matrix(rnorm(40 * 30), ncol = 40)
y <- rnorm(30)

# Define 4 blocks of 10 variables each
blocks <- rep(1:4, each = 10)
result <- penalized.pls.select(X, y, M = NULL, ncomp = 2, blocks = blocks)
result$coefficients[, 1]  # Coefficients for first component
