## S3 method for class 'pls':
perf(object, validation = c("Mfold", "loo"),
     folds = 10, progressBar = TRUE, ...)

## S3 method for class 'spls':
perf(object, validation = c("Mfold", "loo"),
     folds = 10, progressBar = TRUE, ...)

## S3 method for class 'plsda':
perf(object,
     method.predict = c("all", "max.dist", "centroids.dist", "mahalanobis.dist"),
     validation = c("Mfold", "loo"),
     folds = 10, progressBar = TRUE, near.zero.var = FALSE, ...)

## S3 method for class 'splsda':
perf(object,
     method.predict = c("all", "max.dist", "centroids.dist", "mahalanobis.dist"),
     validation = c("Mfold", "loo"),
     folds = 10, progressBar = TRUE, near.zero.var = FALSE, ...)

object: an object of class "pls", "plsda", "spls" or "splsda". The function will retrieve some key parameters stored in that object.
method.predict: only applies to objects of class "plsda" or "splsda"; the prediction method used to evaluate the classification performance of the model. Should be a subset of "max.dist", "centroids.dist", "mahalanobis.dist".
validation: the validation method, "Mfold" or "loo" (see below). Default is "Mfold".
folds: the number of folds in M-fold cross-validation, or a list of index vectors defining the folds (see below).
progressBar: set to TRUE to output the progress bar of the computation.
near.zero.var: logical (default FALSE), used in perf.plsda and perf.splsda. The nearZeroVar function is still applied by default on the whole data set at the start of the function. When set to TRUE, nearZeroVar is also applied on each cross-validation fold.

perf
produces a list with the following components:
MSEP: mean squared error of prediction for each Y variable; only applies to objects inherited from "pls" and "spls".
R2: $R^2$ values of the Y variables for models with 1, ..., ncomp components; only applies to objects inherited from "pls" and "spls".
Q2: $Q^2$ values; only applies to objects inherited from "pls" and "spls".
Q2.total: $Q^2$ total values for models with 1, ..., ncomp components; only applies to objects inherited from "pls" and "spls".
features: the features selected across the folds ($stable.X and $stable.Y) for the keepX and keepY parameters from the input object.

For PLS-DA and sPLS-DA models, perf produces a matrix of classification error rate estimation. The dimensions correspond to the components in the model and to the prediction method used, respectively. Note that the error rate reported for any component includes the performance of the model on earlier components for the specified keepX parameters (e.g. the error rate reported for component 3 with keepX = 20 already includes the fitted model on components 1 and 2 with keepX = 20). For more advanced usage of the perf function, see the predict function.

perf
estimates the mean squared error of prediction (MSEP), $R^2$, and $Q^2$ to assess the predictive validity of the model using M-fold or leave-one-out cross-validation. Note that only the classic, regression and invariant modes can be applied.

If validation = "Mfold", M-fold cross-validation is performed. The number of folds is specified with folds. The folds can also be supplied as a list of vectors containing the indexes defining each fold, as produced by split. When using validation = "Mfold", make sure that you repeat the process several times, as the results depend heavily on the random splits and the sample size.

If validation = "loo", leave-one-out cross-validation is performed (in that case, there is no need to repeat the process).
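The fold structure described above can be sketched in base R. This is only an illustration under stated assumptions: lm() stands in for the PLS fit, the data are simulated, and the names n, M, fold.id, folds.list and loo.list are made up for the example; it does not reproduce the internals of perf.

```r
# Minimal sketch (base R only): building M folds as a list of index vectors
# with split(), then using them for a cross-validated MSEP.
# Assumption: lm() stands in for the PLS fit; all names are illustrative.
set.seed(1)
n <- 40
M <- 5
x <- rnorm(n)
y <- 2 * x + rnorm(n, sd = 0.5)
dat <- data.frame(x = x, y = y)

# Randomly assign each sample to one of M folds of equal size
fold.id <- sample(rep(seq_len(M), length.out = n))
folds.list <- split(seq_len(n), fold.id)

# For each fold: fit on the other folds, predict the held-out samples
sq.err <- numeric(n)
for (test.idx in folds.list) {
  fit <- lm(y ~ x, data = dat[-test.idx, ])
  pred <- predict(fit, newdata = dat[test.idx, ])
  sq.err[test.idx] <- (dat$y[test.idx] - pred)^2
}
msep <- mean(sq.err)  # cross-validated mean squared error of prediction

# Leave-one-out is the special case with one sample per fold
loo.list <- split(seq_len(n), seq_len(n))
```

Changing the seed changes the fold assignment and therefore the M-fold estimate, which is why repeating the process is recommended; the leave-one-out folds are fixed, so repetition adds nothing there.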
For fitted PLS-DA and sPLS-DA models, perf
estimates the classification error rate
using cross-validation.
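As a rough illustration of a cross-validated classification error rate, the sketch below uses base R and the built-in iris data. It assumes a simple nearest-centroid rule on the raw variables as a stand-in for the PLS-DA prediction step; it is not the mixOmics implementation, and the object names are hypothetical.

```r
# Minimal sketch (base R only): M-fold cross-validated classification error.
# Assumption: a nearest-centroid rule on the raw variables stands in for the
# PLS-DA prediction; this is not what perf does internally.
set.seed(2)
X <- as.matrix(iris[, 1:4])
Y <- iris$Species
n <- nrow(X)
M <- 5
folds.list <- split(seq_len(n), sample(rep(seq_len(M), length.out = n)))

fold.error <- sapply(folds.list, function(test.idx) {
  X.train <- X[-test.idx, , drop = FALSE]
  Y.train <- Y[-test.idx]
  # Class centroids (classes in rows) estimated on the training samples only
  centroids <- apply(X.train, 2, function(col) tapply(col, Y.train, mean))
  # Assign each held-out sample to the class with the closest centroid
  pred <- apply(X[test.idx, , drop = FALSE], 1, function(obs) {
    d <- colSums((t(centroids) - obs)^2)  # squared distance to each class
    names(which.min(d))
  })
  mean(pred != as.character(Y[test.idx]))
})
error.rate <- mean(fold.error)  # error rate averaged across folds
```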
For the sparse approaches (sPLS and sPLS-DA), note that the perf function will retrieve the keepX and keepY inputs from the previously run object. The sPLS or sPLS-DA functions will then be run again on several different subsets of the data (the cross-folds) and will most likely lead to different subsets of selected features. Those are summarised in the output features$stable (see the Value section) to assess how often the variables are selected across all folds.

For sPLS, the MSEP, $R^2$, and $Q^2$ criteria are averaged across all folds. For sPLS-DA, the classification error rate is averaged across all folds.
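The idea of summarising how often a variable is selected across folds can be sketched as follows. The list selected and the gene names are hypothetical values made up for the example; the sketch only illustrates a selection frequency and is not claimed to match how features$stable is computed.

```r
# Minimal sketch (base R only): selection frequency of variables across folds.
# Assumption: `selected` is a hypothetical list holding, for each fold, the
# names of the variables kept on one component; the values are made up.
selected <- list(
  fold1 = c("gene_A", "gene_B", "gene_C"),
  fold2 = c("gene_A", "gene_C", "gene_D"),
  fold3 = c("gene_A", "gene_B", "gene_E"),
  fold4 = c("gene_A", "gene_C", "gene_B")
)

# Fraction of folds in which each variable was selected
vars <- sort(unique(unlist(selected)))
stability <- sapply(vars, function(v) {
  mean(sapply(selected, function(s) v %in% s))
})
stability <- sort(stability, decreasing = TRUE)
stability
```

A variable with a frequency close to 1 is selected in nearly every fold, which suggests its selection does not depend much on the particular split.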
Chavent, Marie and Patouille, Brigitte (2003). Calcul des coefficients de régression et du PRESS en régression PLS1. Modulad 30, 1-11. (This is the formula we use to calculate the Q2 in perf.pls and perf.spls.)
Le Cao, K. A., Rossouw D., Robert-Granie, C. and Besse, P. (2008). A sparse PLS for variable selection when integrating Omics data. Statistical Applications in Genetics and Molecular Biology 7, article 35.
Mevik, B.-H., Cederkvist, H. R. (2004). Mean Squared Error of Prediction (MSEP) Estimates for Principal Component Regression (PCR) and Partial Least Squares Regression (PLSR). Journal of Chemometrics 18(9), 422-429.
predict, nipals, plot.perf

## validation for objects of class 'pls' (regression)
# ----------------------------------------
library(mixOmics)
data(liver.toxicity)
X <- liver.toxicity$gene
Y <- liver.toxicity$clinic
# tune the number of components to choose
# ---------------------
# first learn the full model
liver.pls <- pls(X, Y, ncomp = 10)
# with 5-fold cross validation: we use the same parameters as in model above
# but we perform cross validation to compute the MSEP, Q2 and R2 criteria
# ---------------------------
liver.val <- perf(liver.pls, validation = "Mfold", folds = 5)
# Q2 total should decrease until it reaches a threshold
liver.val$Q2.total
# ncomp = 2 is enough
plot(liver.val$Q2.total, type = 'l', col = 'red', ylim = c(-0.5, 0.5),
     xlab = 'PLS components', ylab = 'Q2 total')
abline(h = 0.0975, col = 'darkgreen')
legend('topright', col = c('red', 'darkgreen'),
       legend = c('Q2 total', 'threshold 0.0975'), lty = 1)
title('Liver toxicity PLS 5-fold, Q2 total values')
# have a look at the other criteria
# ----------------------
# R2
liver.val$R2
matplot(t(liver.val$R2), type = 'l', xlab = 'PLS components', ylab = 'R2 for each variable')
title('Liver toxicity PLS 5-fold, R2 values')
# MSEP
liver.val$MSEP
matplot(t(liver.val$MSEP), type = 'l', xlab = 'PLS components', ylab = 'MSEP for each variable')
title('Liver toxicity PLS 5-fold, MSEP values')
## validation for objects of class 'spls' (regression)
# ----------------------------------------
ncomp <- 7
# first, learn the model on the whole data set
model.spls <- spls(X, Y, ncomp = ncomp, mode = 'regression',
                   keepX = rep(10, ncomp), keepY = rep(4, ncomp))
# with 5-fold cross-validation (set validation = "loo" for leave-one-out)
set.seed(45)  # makes the random fold assignment reproducible
model.spls.val <- perf(model.spls, validation = "Mfold", folds = 5)
#Q2 total
model.spls.val$Q2.total
# R2: we can see how the performance degrades when ncomp increases
model.spls.val$R2
plot(model.spls.val, criterion="R2", type = 'l')
plot(model.spls.val, criterion="Q2", type = 'l')
## validation for objects of class 'splsda' (classification)
# ----------------------------------------
data(srbct)
X <- srbct$gene
Y <- srbct$class
ncomp = 5
srbct.splsda <- splsda(X, Y, ncomp = ncomp, keepX = rep(10, ncomp))
# with Mfold
# ---------
set.seed(45)
error <- perf(srbct.splsda, validation = "Mfold", folds = 8,
              method.predict = "all")
plot(error, type = "l")