predict: Predict Method for PLS, sparse PLS, PLSDA Regression or Sparse PLSDA

Description

Predicted values based on PLS, sparse PLS, PLSDA or sparse PLSDA models. New responses and variates are predicted using a fitted model and a new matrix of observations.

Usage

## S3 method for class 'pls':
predict(object, newdata, ...)
## S3 method for class 'spls':
predict(object, newdata, ...)
## S3 method for class 'plsda':
predict(object, newdata, method = c("max.dist", "class.dist", "centroids.dist", "mahalanobis.dist"), ...)
## S3 method for class 'splsda':
predict(object, newdata, method = c("max.dist", "class.dist", "centroids.dist", "mahalanobis.dist"), ...)

Arguments

object

object of class inheriting from "pls", "spls", "plsda" or "splsda".

newdata

data matrix in which to look for explanatory variables to be used for prediction.

method

method to be applied for splsda or plsda to predict the class of new data; it must (partially) match one of "class.dist", "centroids.dist", "mahalanobis.dist" or "max.dist".

...

other arguments to be passed to predict.default.

Value

predict produces a list with the following components:

predict

a three-dimensional array of predicted response values. The dimensions correspond to the observations, the response variables and the model dimension, respectively.

variates

matrix of predicted variates.

B.hat

matrix of regression coefficients (without the intercept).

class

vector or matrix of predicted classes obtained by using $1, \ldots, ncomp$ (sparse) PLS-DA components.

centroid

matrix of coordinates for centroids.
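
As a rough illustration of how the three-dimensional predict component can be indexed, the sketch below builds a stand-in array in base R. The array toy.predict, its values and its dimension labels are placeholders invented for this example; they are not output from a fitted model.

```r
# Stand-in for the 'predict' component:
# 2 observations x 3 response variables x 2 model dimensions.
# Values are arbitrary placeholders, not real predictions.
toy.predict <- array(
  seq_len(12),
  dim = c(2, 3, 2),
  dimnames = list(c("indiv1", "indiv2"),
                  c("Weight", "Waist", "Pulse"),
                  c("dim 1", "dim 2"))
)

# All observations and responses at the second model dimension
# (a 2 x 3 matrix).
toy.predict[, , 2]

# One response for one observation, across all model dimensions.
toy.predict["indiv1", "Pulse", ]
```

With a real fitted model, the same indexing would be applied to the predict component of the list returned by predict.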

Details

predict produces predicted values, obtained by evaluating the PLS, sPLS, PLSDA or sparse PLSDA model returned by pls, spls, plsda or splsda in the frame newdata. Variates for newdata are also returned.

Several class prediction methods are available for plsda and splsda:

max.dist is the naive method to predict the class. It is based on the predicted matrix (object$predict), which can be seen as a probability matrix for assigning each test sample to a class. The class with the largest predicted value is the predicted class.

class.dist allocates the predicted individual $x$ to the class of $Y$ minimizing $dist(x, C_l)$, where $C_l$, $l = 1, \ldots, L$, are the indicator vectors corresponding to each class and $L$ is the number of classes.

centroids.dist allocates the individual $x$ to the class of $Y$ minimizing $dist(x\text{-variate}, G_l)$, where $G_l$, $l = 1, \ldots, L$, are the centroids of the classes calculated on the X-variates of the model.

mahalanobis.dist allocates the individual $x$ to the class of $Y$ as in centroids.dist, but uses the Mahalanobis metric to calculate the distance.
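
The class-assignment rules described above can be sketched in base R on made-up numbers. This is not the package's implementation: the matrices Y.hat, T.train and t.new, the helper nearest_row, the use of Euclidean distance for dist, and the pooled within-class covariance used for the Mahalanobis distance are all assumptions made for illustration.

```r
# Toy illustration of the four class-assignment rules (all data made up).
classes <- c("A", "B", "C")

# Hypothetical predicted dummy-response rows for two new samples
# (one column per class).
Y.hat <- rbind(s1 = c(0.7, 0.2, 0.1),
               s2 = c(0.3, 0.1, 0.6))
colnames(Y.hat) <- classes

# Helper (hypothetical): index of the row of 'targets' closest to each
# row of 'points', using squared Euclidean distance.
nearest_row <- function(points, targets) {
  apply(points, 1, function(p) which.min(colSums((t(targets) - p)^2)))
}

# max.dist: class with the largest predicted value.
max.dist.class <- classes[apply(Y.hat, 1, which.max)]

# class.dist: class whose indicator vector C_l is closest to the prediction.
C <- diag(length(classes))            # rows are the indicator vectors
class.dist.class <- classes[nearest_row(Y.hat, C)]

# Hypothetical X-variates of the training samples and their classes.
T.train <- rbind(c(-2.0,  0.2), c(-1.8, -0.2),
                 c( 0.2,  2.2), c(-0.2,  1.8),
                 c( 2.1,  0.0), c( 1.9,  0.0))
y.train <- factor(c("A", "A", "B", "B", "C", "C"))

# Hypothetical X-variates of the two new samples.
t.new <- rbind(s1 = c(-1.5, 0.3),
               s2 = c( 1.2, 0.4))

# centroids.dist: class whose centroid G_l (mean training variate)
# is closest to the new variate.
G <- rowsum(T.train, y.train) / as.vector(table(y.train))
centroids.dist.class <- rownames(G)[nearest_row(t.new, G)]

# mahalanobis.dist: same centroids, Mahalanobis distance.
# Assumption: pooled within-class covariance of the training variates.
centered <- T.train - G[as.character(y.train), , drop = FALSE]
S <- crossprod(centered) / (nrow(T.train) - nrow(G))
d2 <- sapply(rownames(G),
             function(l) mahalanobis(t.new, center = G[l, ], cov = S))
mahalanobis.dist.class <- colnames(d2)[apply(d2, 1, which.min)]

print(rbind(max.dist.class, class.dist.class,
            centroids.dist.class, mahalanobis.dist.class))
```

In practice these rules are selected through the method argument of predict rather than coded by hand; the sketch only makes the geometry of each rule explicit.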

References

Tenenhaus, M. (1998). La régression PLS: théorie et pratique. Paris: Editions Technip.

Examples

library(mixOmics)  # provides pls, spls, plsda, splsda and the example data

data(linnerud)
X <- linnerud$exercise
Y <- linnerud$physiological
linn.pls <- pls(X, Y, ncomp = 2, mode = "classic")

indiv1 <- c(200, 40, 60)
indiv2 <- c(190, 45, 45)
newdata <- rbind(indiv1, indiv2)
colnames(newdata) <- colnames(X)
newdata

pred <- predict(linn.pls, newdata)

plotIndiv(linn.pls, comp = 1:2, rep.space = "X-variate")
points(pred$variates[, 1], pred$variates[, 2], pch = 19, cex = 1.2)
text(pred$variates[, 1], pred$variates[, 2], 
     c("new ind.1", "new ind.2"), pos = 3)
	 
## First example with plsda
data(liver.toxicity)
X = as.matrix(liver.toxicity$gene)
Y = as.factor(liver.toxicity$treatment[,4] )

# training is performed on roughly 4/5 of the original data
samp = sample(1:5, nrow(X), replace = TRUE)  
test = which(samp == 1)   # testing on the first fold
train = setdiff(1:nrow(X), test)

plsda.train = plsda(X[train,], Y[train], ncomp = 1,  mode = 'regression')
test.predict = predict(plsda.train, X[test,], method = "class.dist")
test.predict$class

## Second example with splsda
data(liver.toxicity)
X = as.matrix(liver.toxicity$gene)
Y = as.factor(liver.toxicity$treatment[, 4])  # time points

# training is performed on roughly 4/5 of the original data
samp = sample(1:5, nrow(X), replace = TRUE)  
test = which(samp == 1)   # testing on the first fold
train = setdiff(1:nrow(X), test)

splsda.train = splsda(X[train,], Y[train], ncomp = 1, keepX = 20, mode = 'regression')
test.predict = predict(splsda.train, X[test,], method = "class.dist")
test.predict$class
