The function helps select the dimensionality of latent variable (LV) models (e.g. PLSR) using the "Wold criterion".
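The criterion is the "precision gain ratio" R = 1 - r(a+1) / r(a), where r is an observed error rate quantifying the model performance (e.g. MSEP) and a is the model dimensionality (number of LVs). R is the relative gain in efficiency after a new LV is added to the model; LVs are added until R falls below a threshold alpha.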
In the original article, Wold (1978; see also Bro et al. 2008) used the ratio of cross-validated over training residual sums of squares, i.e. PRESS over SSR. Instead, selwold compares values of consistent nature (the successive values in the input vector r).
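The ratio can be erratic from one LV to the next, which makes the selection difficult. selwold therefore proposes to calculate a smoothing of the ratio (argument smooth); the selection is then done on the smoothed values.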
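The computation described above can be sketched in base R. This is a minimal illustration, not the package's implementation: it assumes the ratio is computed as 1 - r[a+1] / r[a], the values in r are made up, and the names R, Rs, sel_raw, sel_smooth and opt are hypothetical.

```r
# Hypothetical error-rate curve (e.g. MSEP for 0 to 10 LVs); values are made up
r <- c(10, 6, 4, 3.2, 3.0, 2.95, 2.93, 2.94, 2.92, 2.93, 2.95)
nlv <- seq(0, length(r) - 1)
alpha <- 0.05

n <- length(r)

# Relative gain when going from a to a + 1 LVs: 1 - r[a + 1] / r[a]
R <- 1 - r[-1] / r[-n]

# Smoothed version of the ratio (lowess from base R; f is the smoother span)
Rs <- lowess(R, f = 1/3)$y

# First dimensionality at which the gain falls below alpha,
# using the raw ratio and the smoothed ratio respectively
sel_raw <- nlv[which(R < alpha)[1]]
sel_smooth <- nlv[which(Rs < alpha)[1]]

# Dimensionality at which r itself is minimal
opt <- nlv[which.min(r)]
```

In this toy curve the minimum of r is reached late and by a negligible margin, whereas the threshold on the ratio stops much earlier, which is the parsimony argument for the criterion.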
selwold(
    r, indx = seq(length(r)),
    smooth = TRUE, f = 1/3,
    alpha = .05, digits = 3,
    plot = TRUE,
    xlab = "Index", ylab = "Value", main = "r",
    ...
    )
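A matrix with, for each number of LVs, the values of r and of the ratio (raw and smoothed).
The index of the minimum of r.
The index selected by applying the threshold to the ratio (or to the smoothed ratio).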
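r: Vector of values of a given error rate.
indx: Vector of indexes, of the same length as r (e.g. the numbers of LVs).
smooth: Logical. If TRUE (default), the selection is done on the smoothed ratio.
f: Window (smoother span) for the smoothing with function lowess.
alpha: Proportion used as threshold for the ratio.
digits: Number of digits for the reported ratio values.
plot: Logical. If TRUE (default), results are plotted.
xlab: x-axis label of the plot of r.
ylab: y-axis label of the plot of r.
main: Title of the plot of r.
...: Other arguments to pass to function lowess.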
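The effect of the smoothing window f can be explored directly with lowess from base R. The sketch below uses simulated, purely hypothetical ratio values; the helper rough() is an ad hoc roughness measure introduced only for this illustration and is not part of any package.

```r
set.seed(1)

# Hypothetical noisy ratio values: a decaying trend plus noise (made-up data)
R <- 0.5 * exp(-seq(0, 3, length.out = 20)) + rnorm(20, sd = 0.03)

# Same data smoothed with a narrow and a wide window
Rs_narrow <- lowess(R, f = 1/3)$y
Rs_wide <- lowess(R, f = 2/3)$y

# Ad hoc roughness measure: sum of squared second differences
rough <- function(x) sum(diff(x, differences = 2)^2)
roughness <- c(raw = rough(R), narrow = rough(Rs_narrow), wide = rough(Rs_wide))
print(roughness)
```

Larger values of f give a smoother curve, at the cost of possibly blurring the point where the ratio first crosses alpha, so it can be worth checking how stable the selection is over a few values of f.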
Andries, J.P.M., Vander Heyden, Y., Buydens, L.M.C., 2011. Improved variable reduction in partial least squares modelling based on Predictive-Property-Ranked Variables and adaptation of partial least squares complexity. Analytica Chimica Acta 705, 292-305. https://doi.org/10.1016/j.aca.2011.06.037
Bro, R., Kjeldahl, K., Smilde, A.K., Kiers, H.A.L., 2008. Cross-validation of component models: A critical look at current methods. Anal Bioanal Chem 390, 1241-1251. https://doi.org/10.1007/s00216-007-1790-1
Li, B., Morris, J., Martin, E.B., 2002. Model selection for partial least squares regression. Chemometrics and Intelligent Laboratory Systems 64, 79-89. https://doi.org/10.1016/S0169-7439(02)00051-5
Westad, F., Martens, H., 2000. Variable selection in near infrared spectroscopy based on significance testing in partial least squares regression. Journal of Near Infrared Spectroscopy 8, 117-124.
Wold, S., 1978. Cross-validatory estimation of the number of components in factor and principal components models. Technometrics 20, 397-405.
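library(rchemo)

## Training and test sets from the cassav dataset
data(cassav)
Xtrain <- cassav$Xtrain
ytrain <- cassav$ytrain
X <- cassav$Xtest
y <- cassav$ytest

## Test-set MSEP of PLSR models with 0 to nlv LVs
nlv <- 20
res <- gridscorelv(
    Xtrain, ytrain, X, y,
    score = msep, fun = plskern,
    nlv = 0:nlv
    )

## Selection of the number of LVs, with a lowess window f = 2/3
selwold(res$y1, res$nlv, f = 2/3)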