
## S3 method for class 'pls':
predict(object, newdata, ...)

## S3 method for class 'spls':
predict(object, newdata, ...)

## S3 method for class 'plsda':
predict(object, newdata, method = c("all", "max.dist",
        "centroids.dist", "mahalanobis.dist"), ...)

## S3 method for class 'splsda':
predict(object, newdata, method = c("all", "max.dist",
        "centroids.dist", "mahalanobis.dist"), ...)
"pls"
, "spls"
, "plsda"
or "splsda"
.plsda
or splsda
to predict the class of new data,
should be a subset of "centroids.dist"
, "mahalanobis.dist"
or "max.dist"
(see Details).
Defaults to "a
predict produces a list with the following components: the predicted values, the X-variates of the new samples and, for plsda or splsda objects, the class of each new sample predicted from 1, ..., ncomp (sparse)PLS-DA components.
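A minimal sketch of inspecting this list, assuming the linnerud data used in the Examples below (component names other than predict and variates may differ across package versions):

library(mixOmics)
data(linnerud)
fit <- pls(linnerud$exercise, linnerud$physiological, ncomp = 2)
pred <- predict(fit, linnerud$exercise[1:2, ])
names(pred)        # components of the returned list, e.g. "predict", "variates"
dim(pred$predict)  # predicted responses for the 2 new samples, per component
pred$variates      # X-variates of the new samples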
predict produces predicted values, obtained by evaluating the PLS, sparse PLS, PLS-DA or sparse PLS-DA model returned by pls, spls, plsda or splsda in the frame newdata. Variates for newdata are also returned. The prediction values are calculated based on the regression coefficients of object$Y onto object$variates$X.

Different class prediction methods are proposed for plsda or splsda: "max.dist" is the naive method to predict the class. It is based on the predicted matrix (object$predict), which can be seen as a probability matrix for assigning each test sample to a class; the class with the largest predicted value is the predicted class. "centroids.dist" allocates the individual $x$ to the class of $Y$ that minimizes $\mathrm{dist}(x\text{-variate}, G_l)$, where $G_l$, $l = 1, \dots, L$, are the centroids of the classes calculated on the $X$-variates of the model. "mahalanobis.dist" allocates the individual $x$ to the class of $Y$ as in "centroids.dist", but uses the Mahalanobis metric to calculate the distance.
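The two distance rules can be sketched with a small toy example (purely illustrative, not the package internals): "max.dist" picks, for each test sample, the column of the predicted matrix with the largest value, while "centroids.dist" picks the class whose centroid is closest to the sample's $X$-variate.

# Toy "max.dist": rows are test samples, columns are classes.
scores <- matrix(c(0.9, 0.1, 0.0,
                   0.2, 0.3, 0.5),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("test1", "test2"), c("A", "B", "C")))
colnames(scores)[apply(scores, 1, which.max)]   # "A" "C"

# Toy "centroids.dist": rows of G are class centroids in the variate space.
G <- rbind(A = c(-2, 0), B = c(0, 1), C = c(2, -1))
x.variate <- c(1.5, -0.5)                       # variate of one test sample
rownames(G)[which.min(colSums((t(G) - x.variate)^2))]   # closest centroid: "C"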
See pls, spls, plsda, splsda and http://www.mixOmics.org for more details.

data(linnerud)
X <- linnerud$exercise
Y <- linnerud$physiological
linn.pls <- pls(X, Y, ncomp = 2, mode = "classic")
indiv1 <- c(200, 40, 60)
indiv2 <- c(190, 45, 45)
newdata <- rbind(indiv1, indiv2)
colnames(newdata) <- colnames(X)
newdata
pred <- predict(linn.pls, newdata)
plotIndiv(linn.pls, comp = 1:2, rep.space = "X-variate")
points(pred$variates[, 1], pred$variates[, 2], pch = 19, cex = 1.2)
text(pred$variates[, 1], pred$variates[, 2],
c("new ind.1", "new ind.2"), pos = 3)
## First example with plsda
data(liver.toxicity)
X <- liver.toxicity$gene
Y <- as.factor(liver.toxicity$treatment[, 4])
## if training is perfomed on 4/5th of the original data
samp <- sample(1:5, nrow(X), replace = TRUE)
test <- which(samp == 1) # testing on the first fold
train <- setdiff(1:nrow(X), test)
plsda.train <- plsda(X[train, ], Y[train], ncomp = 2)
test.predict <- predict(plsda.train, X[test, ], method = "max.dist")
Prediction <- levels(Y)[test.predict$class$max.dist[, 2]]
cbind(Y = as.character(Y[test]), Prediction)
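# Simple base-R summary (not part of the package output): proportion of
# misclassified test samples with the 2-component PLS-DA model.
mean(Prediction != as.character(Y[test]))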
## Second example with splsda
splsda.train <- splsda(X[train, ], Y[train], ncomp = 2, keepX = c(30, 30))
test.predict <- predict(splsda.train, X[test, ], method = "max.dist")
Prediction <- levels(Y)[test.predict$class$max.dist[, 2]]
cbind(Y = as.character(Y[test]), Prediction)
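## Hedged sketch: request every distance at once with method = "all" and
## compare the classes predicted with 2 components; this assumes the other
## distances are returned in the same format as class$max.dist above.
test.predict.all <- predict(splsda.train, X[test, ], method = "all")
sapply(test.predict.all$class, function(cl) levels(Y)[cl[, 2]])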