Applies the iPLS algorithm to find the variable intervals most important for prediction
ipls(x, y, glob.ncomp = 10, center = T, scale = F, cv = 10,
exclcols = NULL, exclrows = NULL, int.ncomp = 10, int.num = NULL,
int.width = NULL, int.limits = NULL, int.niter = NULL,
ncomp.selcrit = "min", method = "forward", silent = F)
x: a matrix with predictor values
y: a vector with response values
glob.ncomp: maximum number of components for the global PLS model
center: logical, whether to center the data values
scale: logical, whether to standardize the data values
cv: number of segments for cross-validation (1 for full CV)
exclcols: columns of x to be excluded from calculations (numbers, names, or a vector of logical values)
exclrows: rows to be excluded from calculations (numbers, names, or a vector of logical values)
int.ncomp: maximum number of components for interval PLS models
int.num: number of intervals
int.width: width of intervals
int.limits: a two-column matrix with manually specified intervals
int.niter: maximum number of iterations (if NULL, it equals the number of intervals)
ncomp.selcrit: criterion for selecting the optimal number of components ('min' for the minimum of RMSECV)
method: iPLS method ('forward' or 'backward')
silent: logical, controls whether information about the selection process is shown
An object of class 'ipls' with several fields, including:
var.selected: a vector with indices of selected variables
int.selected: a vector with indices of selected intervals
int.num: total number of intervals
int.width: width of the intervals
int.limits: a matrix with limits for each interval
int.stat: a data frame with statistics for the selection algorithm
glob.stat: a data frame with statistics for the first step (individual intervals)
gm: global PLS model with all variables included
om: optimized PLS model with selected variables
The algorithm splits the predictors into several intervals and tries to find the combination of intervals that gives the best prediction performance. There are two selection methods: "forward", where intervals are successively included in the model, and "backward", where intervals are successively excluded from it. In the first step the algorithm finds the best (forward) or the worst (backward) individual interval. It then tests the remaining intervals to find the one that gives the best model in combination with the interval(s) already selected or excluded. The procedure continues until the maximum number of iterations is reached.
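The forward variant described above can be sketched in base R. This is only an illustration of the greedy search, not the mdatools implementation: `forward_select` and `score_fn` are hypothetical names, and a hold-out error from an ordinary least-squares fit stands in for the RMSECV of a PLS model. The toy data are simulated so that only the third interval carries signal.

```r
# Illustrative sketch of greedy forward interval selection (not mdatools code).
# `limits` is a two-column matrix (first, last variable index) per interval;
# `score_fn` is a hypothetical function returning a prediction error for a set of columns.
forward_select <- function(limits, score_fn, niter = nrow(limits)) {
  selected <- integer(0)
  errors <- numeric(0)
  for (iter in seq_len(niter)) {
    candidates <- setdiff(seq_len(nrow(limits)), selected)
    if (length(candidates) == 0) break
    # error of a model built on the already selected intervals plus each candidate
    cand_err <- vapply(candidates, function(i) {
      ints <- c(selected, i)
      vars <- unlist(lapply(ints, function(k) limits[k, 1]:limits[k, 2]))
      score_fn(vars)
    }, numeric(1))
    best <- which.min(cand_err)
    selected <- c(selected, candidates[best])
    errors <- c(errors, cand_err[best])
  }
  list(int.order = selected, error = errors)
}

# Toy data (assumption): 20 variables, response depends on variables 12 and 13 only
set.seed(1)
X <- matrix(rnorm(50 * 20), 50, 20)
y <- X[, 12] + 0.5 * X[, 13] + rnorm(50, sd = 0.1)

# Stand-in for RMSECV: hold-out RMSE of a least-squares fit
score_fn <- function(vars) {
  train <- 1:35
  test <- 36:50
  fit <- lm.fit(cbind(1, X[train, vars, drop = FALSE]), y[train])
  pred <- cbind(1, X[test, vars, drop = FALSE]) %*% fit$coefficients
  sqrt(mean((y[test] - pred)^2))
}

# Four intervals of five variables each
limits <- cbind(first = c(1, 6, 11, 16), last = c(5, 10, 15, 20))
res <- forward_select(limits, score_fn)

# Keep the intervals up to the step with the lowest error
best_step <- which.min(res$error)
res$int.order[seq_len(best_step)]
```

The backward variant follows the same pattern, starting from all intervals and removing at each step the one whose exclusion gives the lowest error.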
There are several ways to specify the intervals. Either the number of intervals (int.num) or the width of the intervals (int.width) can be provided. Alternatively, the limits (first and last variable number) of each interval can be specified manually with int.limits.
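As a rough illustration of what an interval specification looks like, the sketch below builds two-column limits matrices in base R. `make_limits` is a hypothetical helper, not part of mdatools, and the way it places boundaries is an assumption; the package may split the variables differently. The variable count of 150 and the manual limits are made-up values.

```r
# Hypothetical helper: split `nvar` variables into `int.num` contiguous intervals
make_limits <- function(nvar, int.num) {
  # interval boundaries spread as evenly as possible over 1..nvar
  edges <- round(seq(0, nvar, length.out = int.num + 1))
  cbind(first = edges[-length(edges)] + 1, last = edges[-1])
}

# Ten equal intervals over 150 variables (15 variables each)
limits_equal <- make_limits(nvar = 150, int.num = 10)

# Manual specification: three intervals of unequal width (values are made up)
limits_manual <- rbind(c(1, 40), c(41, 100), c(101, 150))
colnames(limits_manual) <- c("first", "last")

# Width of each manually specified interval
limits_manual[, "last"] - limits_manual[, "first"] + 1
```

A matrix shaped like `limits_manual` is the kind of object the int.limits argument expects, with one row per interval.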
[1] Lars Noergaard et al. Interval partial least-squares regression (iPLS): a comparative chemometric study with an example from near-infrared spectroscopy. Appl. Spectrosc. 2000; 54: 413-419.
library(mdatools)

## forward selection for simdata
data(simdata)
Xc = simdata$spectra.c
yc = simdata$conc.c[, 3, drop = FALSE]

# run iPLS and show results
im = ipls(Xc, yc, int.ncomp = 5, int.num = 10, cv = 4, method = "forward")
summary(im)
plot(im)

# show how RMSECV develops during the algorithm execution
plotRMSE(im)

# plot predictions before (global model) and after (optimized model) selection
par(mfrow = c(1, 2))
plotPredictions(im$gm)
plotPredictions(im$om)

# show selected intervals on the mean spectrum
ind = im$var.selected
mspectrum = apply(Xc, 2, mean)
plot(simdata$wavelength, mspectrum, type = 'l', col = 'lightblue')
points(simdata$wavelength[ind], mspectrum[ind], pch = 16, col = 'blue')