mdatools (version 0.7.0)

ipls: Variable selection with interval PLS

Description

Applies the iPLS algorithm to find the variable intervals most important for prediction

Usage

ipls(x, y, glob.ncomp = 10, center = T, scale = F, cv = 10, int.ncomp = 10, int.num = NULL, int.width = NULL, int.limits = NULL, int.niter = NULL, ncomp.selcrit = "min", method = "forward", silent = F)

Arguments

x
a matrix with predictor values
y
a vector with response values
glob.ncomp
maximum number of components for a global PLS model
center
logical, whether to center the data values
scale
logical, whether to standardize the data values
cv
number of segments for cross-validation (1 for full cross-validation)
int.ncomp
maximum number of components for interval PLS models
int.num
number of intervals
int.width
width of intervals
int.limits
a two column matrix with manual intervals specification
int.niter
maximum number of iterations (if NULL it will be the same as number of intervals)
ncomp.selcrit
criterion for selecting optimal number of components ('min' for minimum of RMSECV)
method
iPLS method ('forward' or 'backward')
silent
logical, controls whether information about the selection process is shown

Value

object of 'ipls' class with several fields, including:
var.selected
a vector with indices of selected variables
int.selected
a vector with indices of selected intervals
int.num
total number of intervals
int.width
width of the intervals
int.limits
a matrix with limits for each interval
int.stat
a data frame with statistics for the selection algorithm
glob.stat
a data frame with statistics for the first step (individual intervals)
gm
global PLS model with all variables included
om
optimized PLS model with selected variables

Details

The algorithm splits the predictors into several intervals and tries to find the combination of intervals that gives the best prediction performance. There are two selection methods: "forward", where intervals are successively included in a model, and "backward", where intervals are successively excluded from it. In the first step the algorithm finds the best (forward) or the worst (backward) individual interval. It then tests the remaining intervals to find the one that gives the best model in combination with the intervals already selected or excluded. The procedure continues until the maximum number of iterations is reached.
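The forward variant of the procedure described above can be sketched in base R. This is only an illustration of the greedy search, not the mdatools implementation: `forward_select()` and `score()` are hypothetical names, and the toy error function stands in for the cross-validated RMSE of a PLS model. Picking the step with the lowest error as the final selection is likewise an assumption.

```r
# Hypothetical sketch of forward interval selection (not mdatools code).
# limits: two-column matrix, first and last variable index of each interval
# score:  function(vars) returning an error measure, lower is better
forward_select <- function(limits, score, niter = nrow(limits)) {
  selected <- integer(0)   # intervals chosen so far, in order of inclusion
  history <- numeric(0)    # error obtained after each inclusion
  for (i in seq_len(niter)) {
    candidates <- setdiff(seq_len(nrow(limits)), selected)
    if (length(candidates) == 0) break
    # error of the model built on the selected intervals plus each candidate
    errs <- sapply(candidates, function(k) {
      ints <- c(selected, k)
      vars <- unlist(lapply(ints, function(j) limits[j, 1]:limits[j, 2]))
      score(vars)
    })
    selected <- c(selected, candidates[which.min(errs)])
    history <- c(history, min(errs))
  }
  list(order = selected, error = history)
}

# Toy setup: 12 variables in 4 intervals, only variables 4:6 and 10:12 matter.
limits <- cbind(first = c(1, 4, 7, 10), last = c(3, 6, 9, 12))
informative <- c(4:6, 10:12)
# Toy error: penalty for each missed informative variable,
# plus a small penalty for each irrelevant variable included.
score <- function(vars) {
  sum(!(informative %in% vars)) + 0.1 * sum(!(vars %in% informative))
}

res <- forward_select(limits, score)
# Assumption: keep the intervals included up to the step with the lowest error.
best <- res$order[seq_len(which.min(res$error))]
```

In this toy case the search includes intervals 2 and 4 first, and the error rises once the uninformative intervals are added. A backward search would follow the same pattern, starting from all intervals and removing one per iteration.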

There are several ways to specify the intervals. Either the number of intervals (int.num) or the width of the intervals (int.width) can be provided. Alternatively, the limits (first and last variable number) of each interval can be specified manually with int.limits.
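To make the two-column layout of interval limits concrete, here is a minimal base R sketch that splits a given number of variables into contiguous intervals of nearly equal width. The helper `make_limits()` is hypothetical and the splitting rule is an assumption. mdatools may compute its limits differently.

```r
# Hypothetical helper (not part of mdatools): build a two-column matrix with
# the first and last variable index of each interval.
make_limits <- function(nvar, int.num) {
  stopifnot(int.num >= 1, int.num <= nvar)
  # integer break points 0 = b0 < b1 < ... < b_int.num = nvar
  breaks <- ((0:int.num) * nvar) %/% int.num
  cbind(first = breaks[-length(breaks)] + 1, last = breaks[-1])
}

# 150 variables split into 10 intervals of 15 variables each
limits <- make_limits(nvar = 150, int.num = 10)
```

A matrix built this way (or written by hand) has the shape described for int.limits: one row per interval, with the first and last variable number in the two columns.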

References

[1] Lars Noergaard et al. Interval partial least-squares regression (iPLS): a comparative chemometric study with an example from near-infrared spectroscopy. Appl. Spectrosc. 2000; 54: 413-419

Examples

library(mdatools)

## forward selection for simdata

data(simdata)
Xc = simdata$spectra.c
yc = simdata$conc.c[, 3, drop = FALSE]

# run iPLS and show results
im = ipls(Xc, yc, int.ncomp = 5, int.num = 10, cv = 4, method = "forward")
summary(im)
plot(im)

# show how RMSECV develops during the algorithm execution
plotRMSE(im)

# plot predictions before and after selection
par(mfrow = c(1, 2))
plotPredictions(im$gm)
plotPredictions(im$om)

# show selected intervals on spectral plot
ind = im$var.selected
mspectrum = apply(Xc, 2, mean)
plot(simdata$wavelength, mspectrum, type = 'l', col = 'lightblue')
points(simdata$wavelength[ind], mspectrum[ind], pch = 16, col = 'blue')
