pls.double.cv: Cross-Validation with PLS-DA.

Description

This function performs a 10-fold cross-validation on a given data set using a Partial Least Squares (PLS) model. To assess the predictive ability of the model, the data set is split with a 1:9 ratio: 10% of the samples are removed before any step of the statistical analysis, including scaling and the selection of the number of PLS components. The optimal number of PLS components is then chosen by a further 10-fold cross-validation on the remaining 90% of the samples, selecting the value that gives the best Q2Y. Permutation testing can additionally be used (see perm.test) to assess the classification/regression performance of the predictors.
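
The nested splitting described above can be sketched in base R as follows. This is only an illustration of the index bookkeeping, not the package's implementation: the helpers make_folds and nested_cv_splits are hypothetical names, and model fitting, scaling, and component selection are left out.

```r
# Hypothetical helper: assign n samples to k folds of near-equal size.
make_folds <- function(n, k) {
  sample(rep(seq_len(k), length.out = n))
}

# Hypothetical helper: list the outer and inner index sets of a nested CV.
nested_cv_splits <- function(n, kfold_outer = 10, kfold_inner = 10) {
  outer <- make_folds(n, kfold_outer)
  lapply(seq_len(kfold_outer), function(i) {
    test  <- which(outer == i)   # held out before any analysis step
    train <- which(outer != i)   # used for scaling and component selection
    inner <- make_folds(length(train), kfold_inner)
    list(
      test  = test,
      train = train,
      # inner validation sets, expressed as indices into the full data set
      inner_validation = lapply(seq_len(kfold_inner),
                                function(j) train[inner == j])
    )
  })
}

set.seed(1)
splits <- nested_cv_splits(n = 150)
length(splits)             # one entry per outer fold
length(splits[[1]]$test)   # 15 of 150 samples held out in each outer fold
```

Every sample appears in exactly one outer test set, and the inner validation sets partition the corresponding training set, so no held-out sample can influence the choice of the number of components.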

Usage

pls.double.cv (Xdata,
               Ydata,
               ncomp=min(5,c(ncol(Xdata),nrow(Xdata))),
               constrain=1:nrow(Xdata),
               scaling = c("centering", "autoscaling","none"), 
               method = c("plssvd", "simpls"),
               svd.method = c("irlba", "dc"),
               perm.test=FALSE,
               times=100,
               runn=10,
               kfold_inner=10, 
               kfold_outer=10)
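
The character-vector defaults in the signature (scaling, method, svd.method) look like R's standard match.arg idiom, in which the first listed value is the default and any unambiguous partial string is accepted. The sketch below shows that idiom with a hypothetical function pick_scaling; it assumes, without confirming it from the package source, that pls.double.cv resolves these arguments in the same way.

```r
# Hypothetical function illustrating the match.arg idiom.
pick_scaling <- function(scaling = c("centering", "autoscaling", "none")) {
  match.arg(scaling)   # first choice when missing; partial matches allowed
}

pick_scaling()         # "centering"
pick_scaling("auto")   # "autoscaling"
pick_scaling("n")      # "none"
```

A string that matches none of the choices raises an error instead of silently falling back to the default.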

Value

A list with the following components:

B: the (p x m x length(ncomp)) array containing the regression coefficients. Each row corresponds to a predictor variable and each column to a response variable. The third dimension of the array B corresponds to the number of PLS components used to compute the regression coefficients. If ncomp has length 1, B is just a (p x m) matrix.
Ypred: the vector containing the predicted values of the response variables obtained by cross-validation.
Yfit: the vector containing the fitted values of the response variables.
P: the (p x max(ncomp)) matrix containing the X-loadings.
Q: the (m x max(ncomp)) matrix containing the Y-loadings.
T: the (ntrain x max(ncomp)) matrix containing the X-scores (latent components).
R: the (p x max(ncomp)) matrix containing the weights used to construct the latent components.
Q2Y: the cross-validated predictive ability of the model for Y.
R2Y: the proportion of variance in Y explained by the model.
R2X: vector containing the variance of X explained by each PLS component.
txtQ2Y: a summary of the Q2Y values.
txtR2Y: a summary of the R2Y values.
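
For a numeric response, R2Y and Q2Y are commonly computed as one minus the ratio of a residual sum of squares to the total sum of squares, using the fitted values for R2Y and the cross-validated predictions for Q2Y. The sketch below follows that common definition with a hypothetical helper r2 and made-up numbers; the package's exact computation may differ.

```r
# Hypothetical helper: 1 - (residual sum of squares / total sum of squares).
r2 <- function(y, yhat) {
  1 - sum((y - yhat)^2) / sum((y - mean(y))^2)
}

y     <- c(1, 2, 3, 4, 5)             # observed response (made-up values)
yfit  <- c(1.1, 1.9, 3.0, 4.1, 4.9)   # fitted values, in the role of Yfit
ypred <- c(1.5, 1.5, 3.0, 4.5, 4.5)   # cross-validated predictions, in the role of Ypred

R2Y <- r2(y, yfit)    # goodness of fit, 0.996 here
Q2Y <- r2(y, ypred)   # predictive ability, 0.9 here
```

Q2Y is lower than R2Y in this example because the predictions come from models that did not see the predicted samples.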

Arguments

Xdata: a matrix.
Ydata: the responses. If Ydata is a numeric vector, a regression analysis will be performed. If Ydata is a factor, a classification analysis will be performed.
ncomp: the number of latent components to be used for classification or regression. A vector of candidate values may be given.
constrain: a vector of nrow(Xdata) elements. Samples sharing the same constrain value are kept together during cross-validation, either all in the training set or all in the test set.
scaling: the scaling method to be used. Choices are "centering", "autoscaling", or "none" (by default = "centering"). A partial string sufficient to uniquely identify the choice is permitted.
method: the algorithm to be used to perform the PLS. Choices are "plssvd" or "simpls" (by default = "plssvd"). A partial string sufficient to uniquely identify the choice is permitted.
svd.method: the SVD method to be used to perform the PLS. Choices are "irlba" or "dc" (by default = "irlba"). A partial string sufficient to uniquely identify the choice is permitted.
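perm.test: logical; if TRUE, a permutation test is performed (by default = FALSE).
times: the number of cross-validations performed with permuted samples.
runn: the number of cross-validation loops.
kfold_inner: the number of folds in the inner cross-validation, which optimizes the number of components (by default = 10).
kfold_outer: the number of folds in the outer cross-validation, which assesses the predictive performance (by default = 10).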
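
The effect of constrain can be illustrated with a small base-R sketch: folds are assigned to the unique constrain values rather than to individual samples, so samples sharing a value always land in the same fold. The helper grouped_folds and the subject identifiers are hypothetical and only illustrate the idea; they are not taken from the package.

```r
# Hypothetical helper: assign folds per group so that samples sharing a
# constrain value are never split between training and test sets.
grouped_folds <- function(constrain, k) {
  groups     <- unique(constrain)
  group_fold <- sample(rep(seq_len(k), length.out = length(groups)))
  group_fold[match(constrain, groups)]   # one fold label per sample
}

set.seed(1)
# Six subjects measured twice each (made-up identifiers)
constrain <- rep(c("s1", "s2", "s3", "s4", "s5", "s6"), each = 2)
folds <- grouped_folds(constrain, k = 3)
table(constrain, folds)   # each subject appears in exactly one fold
```

With the default constrain=1:nrow(Xdata) every sample forms its own group, which reduces to ordinary sample-wise cross-validation.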

Author

Dupe Ojo, Alessia Vignoli, Stefano Cacciatore, Leonardo Tenori

Examples

# \donttest{
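library(KODAMA)                     # package providing pls.double.cv
data(iris)
data=as.matrix(iris[,-5])           # predictors: the four numeric measurements
labels=iris[,5]                     # response: Species (a factor), so classification
pp=pls.double.cv(data,labels,2:4)   # evaluate models with 2 to 4 components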

# }
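
The permutation testing mentioned in the Description can be sketched generically: the response is shuffled a number of times, the performance statistic is recomputed on each shuffle, and the observed statistic is compared with the resulting null distribution. The helper perm_pvalue, the squared correlation used as a stand-in statistic, and the data are all hypothetical; this is an assumption about the general technique, not a description of how pls.double.cv implements it.

```r
# Hypothetical helper: empirical p-value of an observed statistic against
# the distribution obtained by shuffling the response.
perm_pvalue <- function(x, y, stat_fun, times = 100) {
  observed <- stat_fun(x, y)
  permuted <- replicate(times, stat_fun(x, sample(y)))
  # +1 in numerator and denominator counts the observed statistic itself
  (sum(permuted >= observed) + 1) / (times + 1)
}

set.seed(1)
x <- 1:30
y <- x + rnorm(30)   # made-up data with a strong linear relationship
# Squared correlation stands in for a cross-validated statistic such as Q2Y
p <- perm_pvalue(x, y, function(a, b) cor(a, b)^2, times = 100)
```

A small p indicates that the observed performance is unlikely to arise when the link between predictors and response is broken by shuffling.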