Function to evaluate the performance of the fitted sparse PLS, group PLS, sparse group PLS, sparse PLS-DA, group PLS-DA and sparse group PLS-DA models using various criteria.
# S3 method for sPLS
perf(object,
criterion = c("all", "MSEP", "R2", "Q2"),
validation = c("Mfold", "loo"),
folds = 10, progressBar = TRUE, setseed = 1, ...)
# S3 method for gPLS
perf(object,
criterion = c("all", "MSEP", "R2", "Q2"),
validation = c("Mfold", "loo"),
folds = 10, progressBar = TRUE, setseed = 1, ...)
# S3 method for sgPLS
perf(object,
criterion = c("all", "MSEP", "R2", "Q2"),
validation = c("Mfold", "loo"),
folds = 10, progressBar = TRUE, setseed = 1, ...)
# S3 method for sPLSda
perf(object,
method.predict = c("all", "max.dist", "centroids.dist", "mahalanobis.dist"),
validation = c("Mfold", "loo"),
folds = 10, progressBar = TRUE, ...)
# S3 method for gPLSda
perf(object,
method.predict = c("all", "max.dist", "centroids.dist", "mahalanobis.dist"),
validation = c("Mfold", "loo"),
folds = 10, progressBar = TRUE, ...)
# S3 method for sgPLSda
perf(object,
method.predict = c("all", "max.dist", "centroids.dist", "mahalanobis.dist"),
validation = c("Mfold", "loo"),
folds = 10, progressBar = TRUE, ...)
perf produces a list with the following components:
MSEP: mean squared error of prediction for each \(Y\)-variable. Only applies to objects inheriting from "sPLS", "gPLS" and "sgPLS".
R2: a matrix of \(R^2\) values of the \(Y\)-variables for models with \(1, \ldots,\) ncomp components. Only applies to objects inheriting from "sPLS", "gPLS" and "sgPLS".
Q2: if \(Y\) contains one variable, a vector of \(Q^2\) values; otherwise a list with a matrix of \(Q^2\) values for each \(Y\)-variable. Note that for an sPLS model it is better to look at the Q2.total criterion. Only applies to objects inheriting from "sPLS", "gPLS" and "sgPLS".
Q2.total: a vector of \(Q^2\)-total values for models with \(1, \ldots,\) ncomp components. Only applies to objects inheriting from "sPLS", "gPLS" and "sgPLS".
A list of features selected across the folds ($stable.X and $stable.Y) or on the whole data set ($final) for the keepX and keepY parameters of the input object.
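The Q2 and Q2.total components follow the cross-validated \(Q^2\) criterion of Tenenhaus (1998), \(Q^2_h = 1 - \mathrm{PRESS}_h / \mathrm{RSS}_{h-1}\). As a rough sketch, assuming you already have a PRESS matrix (cross-validated prediction error sums of squares) and an RSS matrix for the model with one fewer component, each with one row per component and one column per \(Y\)-variable, the two criteria could be computed as below. The helper `q2_from_press` and its inputs are hypothetical and are not part of sgPLS.

```r
# Hypothetical helper (not part of sgPLS): Q2 per Y-variable and Q2.total
# per component, from PRESS and RSS matrices.
# press[h, j]:    cross-validated prediction error sum of squares for
#                 Y-variable j with h components
# rss_prev[h, j]: residual sum of squares for Y-variable j with h - 1
#                 components (row 1 refers to the model with 0 components)
q2_from_press <- function(press, rss_prev) {
  stopifnot(identical(dim(press), dim(rss_prev)))
  # Q2 for each variable and component: 1 - PRESS_h / RSS_{h-1}
  q2 <- 1 - press / rss_prev
  # Q2.total pools the sums of squares over all Y-variables
  q2_total <- 1 - rowSums(press) / rowSums(rss_prev)
  list(Q2 = q2, Q2.total = q2_total)
}

# Toy numbers for illustration only: 2 components, 2 Y-variables
press <- matrix(c(4, 3, 6, 5), nrow = 2)
rss_prev <- matrix(c(10, 5, 10, 6), nrow = 2)
res <- q2_from_press(press, rss_prev)
res$Q2.total  # 1 - 10/20 = 0.5 and 1 - 8/11, about 0.273
```

Pooling the sums of squares before taking the ratio is what makes Q2.total a single summary per component rather than one value per \(Y\)-variable.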
For sPLS-DA, gPLS-DA and sgPLS-DA models, perf produces a matrix of estimated classification error rates. Its rows correspond to the components in the model and its columns to the prediction methods used. Note that the error rate reported for a given component includes the contribution of the earlier components for the specified keepX parameters (e.g. the error rate reported for component 3 with keepX = 20 already includes the model fitted on components 1 and 2 with keepX = 20). For more advanced usage of the perf function, see the mixOmics package and consider using the predict function.
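To make the layout of that matrix concrete, the sketch below builds one with a row per component and a column per prediction method from cross-validated class predictions. The predictions are toy values, and the helper `error_rate_matrix` and its input structure are hypothetical; they do not reflect how sgPLS stores its intermediate results.

```r
# Hypothetical helper (not part of sgPLS): error-rate matrix with one row
# per component and one column per prediction method.
# pred:  a named list (one element per method); each element is a matrix of
#        predicted class labels, one row per sample, one column per component
# truth: the observed class labels
error_rate_matrix <- function(pred, truth) {
  # p != truth compares every column with the true labels;
  # colMeans turns the mismatches into a per-component error rate
  sapply(pred, function(p) colMeans(p != truth))
}

# Toy predictions, for illustration only
truth <- c("a", "a", "b", "b")
pred <- list(
  max.dist = cbind(comp1 = c("a", "b", "b", "b"),
                   comp2 = c("a", "a", "b", "b")),
  centroids.dist = cbind(comp1 = c("b", "b", "b", "b"),
                         comp2 = c("a", "a", "b", "a"))
)
m <- error_rate_matrix(pred, truth)
m  # rows comp1, comp2; columns max.dist, centroids.dist
```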
object: object of class inheriting from "sPLS", "gPLS", "sgPLS", "sPLSda", "gPLSda" or "sgPLSda". The function retrieves some key parameters stored in that object.
criterion: the criteria to be calculated (see Details). Can be set to "all", "MSEP", "R2" or "Q2". Defaults to "all". Only applies to objects inheriting from "sPLS", "gPLS" or "sgPLS".
method.predict: the prediction method(s) used to evaluate the classification performance of the model; only applies to objects inheriting from "sPLSda", "gPLSda" or "sgPLSda". Should be a subset of "max.dist", "centroids.dist" and "mahalanobis.dist". Default is "all". See predict.
validation: character. The kind of (internal) validation to use, matching one of "Mfold" or "loo" (see Details). Default is "Mfold".
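folds: the number of folds in M-fold cross-validation, or a list of index vectors defining the folds. See Details.
progressBar: by default set to TRUE to display a progress bar of the computation.
setseed: integer value specifying the state of the random number generator.
...: not used at the moment.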
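The method.predict options are decision rules for turning model predictions into class labels. As they are commonly described for the predict function of mixOmics (assumed here, not confirmed by this page), max.dist assigns a sample to the class whose dummy-coded response has the largest predicted value, and centroids.dist assigns it to the class whose centroid in the component (score) space is closest in Euclidean distance. The base-R sketch below illustrates those two rules only; the helper names are hypothetical, and mahalanobis.dist is not shown.

```r
# max.dist (sketch): pick the class whose predicted dummy-variable value is
# largest.
# y_hat: samples x classes matrix of predicted values, colnames = class labels
max_dist_rule <- function(y_hat) {
  colnames(y_hat)[max.col(y_hat, ties.method = "first")]
}

# centroids.dist (sketch): pick the class whose centroid in the component
# space is closest in Euclidean distance.
# t_new: new samples x components; t_train, cls_train: training scores, labels
centroids_dist_rule <- function(t_new, t_train, cls_train) {
  cls <- sort(unique(cls_train))
  # one row per class: mean training score on each component
  centroids <- do.call(rbind, lapply(cls, function(k)
    colMeans(t_train[cls_train == k, , drop = FALSE])))
  # squared distance from every new sample to every class centroid
  d <- matrix(0, nrow = nrow(t_new), ncol = length(cls))
  for (k in seq_along(cls)) {
    d[, k] <- rowSums(sweep(t_new, 2, centroids[k, ])^2)
  }
  # smallest distance = largest negative distance
  cls[max.col(-d, ties.method = "first")]
}

# Toy example, illustrative values only
y_hat <- matrix(c(0.9, 0.2, 0.1, 0.8), nrow = 2,
                dimnames = list(NULL, c("a", "b")))
max_dist_rule(y_hat)  # "a" "b"

t_train <- matrix(c(0, 0.2, 2, 2.2), ncol = 1)
cls_train <- c("a", "a", "b", "b")
t_new <- matrix(c(0.5, 1.9), ncol = 1)
centroids_dist_rule(t_new, t_train, cls_train)  # "a" "b"
```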
Benoit Liquet and Pierre Lafaye de Micheaux
The perf method was originally written by Sebastien Dejean, Ignacio Gonzalez, Amrit Singh and Kim-Anh Le Cao for the pls and spls models of the mixOmics package. Similar code has been adapted for sPLS, gPLS and sgPLS in the sgPLS package.
perf estimates the mean squared error of prediction (MSEP), \(R^2\) and \(Q^2\) to assess the predictive performance of the model using M-fold or leave-one-out cross-validation. Note that only the classic, regression and invariant modes can be applied.
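In cross-validation each sample is predicted by a model fitted without it, and under the standard definition the MSEP of a \(Y\)-variable is the mean of the squared differences between its observed values and those cross-validated predictions. A minimal sketch, assuming a matrix of cross-validated predictions `y_cv` is already available for a fixed number of components (the helper `msep` and the toy values are hypothetical):

```r
# Hypothetical helper: MSEP per Y-variable from cross-validated predictions.
# y:    observed responses (samples x Y-variables)
# y_cv: cross-validated predictions with the same shape
msep <- function(y, y_cv) {
  stopifnot(identical(dim(y), dim(y_cv)))
  colMeans((y - y_cv)^2)
}

# Toy values, for illustration only
y <- cbind(y1 = c(1, 2, 3), y2 = c(0, 0, 1))
y_cv <- cbind(y1 = c(1.5, 2, 2), y2 = c(0, 1, 1))
msep(y, y_cv)  # y1: 1.25/3, y2: 1/3
```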
If validation = "Mfold", M-fold cross-validation is performed. The number of folds is set with folds. Alternatively, folds can be supplied as a list of vectors, each containing the indexes that define one fold, as produced by split.
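A list of that form can be built with sample and split. The sketch below randomly permutes the sample indexes and deals them into M groups; the helper `make_folds` is hypothetical and not part of sgPLS.

```r
# Hypothetical helper: M folds as a list of index vectors.
# Note: calls set.seed(), so it resets the global random number stream.
make_folds <- function(n, M, seed = 1) {
  set.seed(seed)
  # permute 1..n, then deal the indexes into M groups of near-equal size
  split(sample(seq_len(n)), rep(seq_len(M), length.out = n))
}

folds <- make_folds(n = 10, M = 3)
lengths(folds)  # 4 3 3
```

Going by the description above, such a list could then be passed as, e.g., perf(model.spls, validation = "Mfold", folds = folds); this usage has not been checked against the sgPLS source.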
If validation = "loo", leave-one-out cross-validation is performed.
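Leave-one-out is the special case of M-fold cross-validation with one sample per fold. The generic loop below refits a model on each training subset, predicts the held-out samples and accumulates the prediction error sum of squares (PRESS). To keep it self-contained, lm() stands in for the PLS fit; this is not what perf uses internally, and the helper `cv_press` is hypothetical.

```r
# Generic cross-validation loop (sketch). lm() is a stand-in for the
# sPLS/gPLS/sgPLS fit, used only to keep the example self-contained.
cv_press <- function(x, y, folds) {
  y_cv <- numeric(length(y))
  for (idx in folds) {
    # refit on all samples outside the current fold
    train <- data.frame(x = x[-idx], y = y[-idx])
    fit <- lm(y ~ x, data = train)
    # predict the held-out samples
    y_cv[idx] <- predict(fit, newdata = data.frame(x = x[idx]))
  }
  sum((y - y_cv)^2)  # PRESS
}

x <- 1:6
y <- 2 * x + 1                      # exact line, so held-out predictions are exact
loo_folds <- as.list(seq_along(y))  # leave-one-out: one sample per fold
cv_press(x, y, loo_folds)           # essentially 0
```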
For fitted sPLS-DA, gPLS-DA and sgPLS-DA models, perf estimates the classification error rate using cross-validation.
Note that the perf function retrieves the keepX and keepY inputs from the previously fitted object. The sPLS, gPLS, sgPLS, sPLSda, gPLSda or sgPLSda function is then run again on each cross-validation subset of the data, so the selected features can differ from one fold to another. For sPLS, the MSEP, \(R^2\) and \(Q^2\) criteria are averaged across all folds. A feature stability measure is output so that the user can assess how often each variable is selected across the folds. For sPLS-DA, the classification error rate is averaged across all folds.
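The stability measure described above can be read as the fraction of folds in which a feature is selected. A toy sketch, assuming the features kept on a given component in each fold are available as a character vector; the helper `feature_stability` and the fold contents are hypothetical, not sgPLS output.

```r
# Hypothetical helper: fraction of folds in which each feature was selected.
# selected: a list with one character vector of selected features per fold
feature_stability <- function(selected) {
  counts <- table(unlist(selected))
  sort(counts / length(selected), decreasing = TRUE)
}

# Toy selections, for illustration only
selected <- list(fold1 = c("g1", "g2", "g3"),
                 fold2 = c("g1", "g2", "g4"),
                 fold3 = c("g1", "g3", "g4"))
s <- feature_stability(selected)
s  # g1 selected in all 3 folds; g2, g3, g4 in 2 of 3
```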
Tenenhaus, M. (1998). La régression PLS: théorie et pratique. Paris: Editions Technip.
Le Cao, K.-A., Rossouw, D., Robert-Granié, C. and Besse, P. (2008). A sparse PLS for variable selection when integrating Omics data. Statistical Applications in Genetics and Molecular Biology 7, article 35.
Mevik, B.-H. and Cederkvist, H. R. (2004). Mean Squared Error of Prediction (MSEP) Estimates for Principal Component Regression (PCR) and Partial Least Squares Regression (PLSR). Journal of Chemometrics 18(9), 422-429.
predict, plot.perf (from package mixOmics)
## validation for objects of class 'sPLS' (regression)
## Example adapted from the mixOmics package
# ----------------------------------------
if (FALSE) {
  library(sgPLS)
  # liver.toxicity is provided by the mixOmics package
  data(liver.toxicity, package = "mixOmics")
  X <- liver.toxicity$gene
  Y <- liver.toxicity$clinic
  ncomp <- 7
  # first, learn the model on the whole data set
  model.spls <- sPLS(X, Y, ncomp = ncomp, mode = "regression",
                     keepX = rep(5, ncomp), keepY = rep(2, ncomp))
  # with leave-one-out cross-validation
  set.seed(45)
  model.spls.loo.val <- perf(model.spls, validation = "loo")
  # Q2 total
  model.spls.loo.val$Q2.total
  # R2: we can see how the performance degrades when ncomp increases
  model.spls.loo.val$R2
  # with 5-fold cross-validation; results are similar to leave-one-out
  model.spls.val <- perf(model.spls, validation = "Mfold", folds = 5)
  model.spls.val$R2
}