ipls: Variable selection with interval PLS

Description

Applies the iPLS algorithm to find the variable intervals most important for prediction.

Usage

ipls(x, y, glob.ncomp = 10, center = T, scale = F, cv = 10,
  exclcols = NULL, exclrows = NULL, int.ncomp = 10, int.num = NULL,
  int.width = NULL, int.limits = NULL, int.niter = NULL,
  ncomp.selcrit = "min", method = "forward", silent = F)

Arguments

x

a matrix with predictor values

y

a vector with response values

glob.ncomp

maximum number of components for a global PLS model

center

logical, whether to center the data values

scale

logical, whether to standardize the data values

cv

number of segments for cross-validation (1 for full CV)

exclcols

columns of x to be excluded from calculations (numbers, names or vector with logical values)

exclrows

rows to be excluded from calculations (numbers, names or vector with logical values)

int.ncomp

maximum number of components for interval PLS models

int.num

number of intervals

int.width

width of intervals

int.limits

a two-column matrix with manually specified interval limits (first and last variable number of each interval)

int.niter

maximum number of iterations (if NULL, it is set to the number of intervals)

ncomp.selcrit

criterion for selecting optimal number of components ('min' for minimum of RMSECV)

method

iPLS method ('forward' or 'backward')

silent

logical, whether to suppress information about the selection process

Value

object of 'ipls' class with several fields, including:

var.selected

a vector with indices of selected variables

int.selected

a vector with indices of selected intervals

int.num

total number of intervals

int.width

width of the intervals

int.limits

a matrix with limits for each interval

int.stat

a data frame with statistics for the selection algorithm

glob.stat

a data frame with statistics for the first step (individual intervals)

gm

global PLS model with all variables included

om

optimized PLS model with selected variables

Details

The algorithm splits the predictors into several intervals and tries to find the combination of intervals that gives the best prediction performance. There are two selection methods: "forward", where intervals are successively included in the model, and "backward", where intervals are successively excluded from it. In the first step the algorithm finds the best (forward) or the worst (backward) individual interval. It then tests the remaining intervals to find the one that gives the best model in combination with the intervals already selected or excluded. The procedure continues until the maximum number of iterations is reached.
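The forward variant of the procedure described above can be sketched in base R. This is only an illustration, not the code used by ipls: the helpers cv_rmse and forward_intervals are hypothetical, the data are simulated, and an ordinary least-squares fit stands in for the PLS model so that the sketch needs no extra packages. It also skips the selection of the number of components.

```r
# Hypothetical helper: k-fold cross-validated RMSE of an ordinary least-squares
# fit on the given columns (an assumed stand-in for the PLS model in ipls).
cv_rmse <- function(x, y, cols, nseg = 4) {
  seg <- rep_len(seq_len(nseg), nrow(x))   # systematic segment assignment
  press <- 0
  for (s in seq_len(nseg)) {
    train <- seg != s
    fit <- lm.fit(cbind(1, x[train, cols, drop = FALSE]), y[train])
    b <- fit$coefficients
    b[is.na(b)] <- 0                       # guard against rank deficiency
    pred <- cbind(1, x[!train, cols, drop = FALSE]) %*% b
    press <- press + sum((y[!train] - pred)^2)
  }
  sqrt(press / nrow(x))
}

# Hypothetical helper: greedy forward selection of intervals.
# limits is a two-column matrix (first and last variable index per interval).
forward_intervals <- function(x, y, limits, niter = nrow(limits)) {
  selected <- integer(0)
  rmse <- numeric(0)
  for (i in seq_len(niter)) {
    candidates <- setdiff(seq_len(nrow(limits)), selected)
    if (length(candidates) == 0) break
    # score every remaining interval together with those already selected
    scores <- sapply(candidates, function(j) {
      cols <- unlist(lapply(c(selected, j),
                            function(k) limits[k, 1]:limits[k, 2]))
      cv_rmse(x, y, cols)
    })
    selected <- c(selected, candidates[which.min(scores)])
    rmse <- c(rmse, min(scores))
  }
  data.frame(step = seq_along(selected), interval = selected, rmse = rmse)
}

# Simulated data: only variables 5 and 6 (interval 3) carry signal
set.seed(1)
x <- matrix(rnorm(40 * 12), 40, 12)
y <- x[, 5] + 0.5 * x[, 6] + rnorm(40, sd = 0.1)
limits <- cbind(seq(1, 11, by = 2), seq(2, 12, by = 2))

res <- forward_intervals(x, y, limits, niter = 3)
print(res)
```

In this simulated case the first step picks interval 3, the only one related to the response. A backward variant would start from all intervals and, at each step, drop the one whose removal gives the lowest cross-validated error.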

There are several ways to specify the intervals. Either the number of intervals (int.num) or the width of the intervals (int.width) can be provided. Alternatively, the limits (first and last variable number) of each interval can be specified manually with int.limits.
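To illustrate how the three specifications relate, the sketch below builds a two-column matrix of limits from either a number of intervals or a width. The helper make_limits is hypothetical, and the way it distributes variables across intervals is an assumption; ipls may split the variables differently.

```r
# Hypothetical helper: turn a number of intervals or an interval width
# into a two-column matrix of limits (first and last variable number).
make_limits <- function(nvar, int.num = NULL, int.width = NULL) {
  if (!is.null(int.num)) {
    # spread the variables as evenly as possible over int.num intervals
    edges <- round(seq(0, nvar, length.out = int.num + 1))
    first <- edges[-length(edges)] + 1
    last <- edges[-1]
  } else if (!is.null(int.width)) {
    # fixed width; the last interval may be shorter
    first <- seq(1, nvar, by = int.width)
    last <- pmin(first + int.width - 1, nvar)
  } else {
    stop("provide either int.num or int.width")
  }
  cbind(first = first, last = last)
}

by_num <- make_limits(20, int.num = 4)      # 1-5, 6-10, 11-15, 16-20
by_width <- make_limits(20, int.width = 6)  # 1-6, 7-12, 13-18, 19-20

# Manual specification: intervals of unequal width, one row per interval
manual <- rbind(c(1, 5), c(6, 12), c(13, 20))

print(by_num)
print(by_width)
print(manual)
```

A matrix shaped like manual is the kind of value the int.limits argument expects, which is useful when the intervals should follow known spectral regions rather than an even split.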

References

[1] Lars Noergaard et al. Interval partial least-squares regression (iPLS): a comparative chemometric study with an example from near-infrared spectroscopy. Appl. Spectrosc. 2000; 54: 413-419.

Examples

# NOT RUN {
library(mdatools)

## forward selection for simdata
 
data(simdata)
Xc = simdata$spectra.c
yc = simdata$conc.c[, 3, drop = FALSE]

# run iPLS and show results  
im = ipls(Xc, yc, int.ncomp = 5, int.num = 10, cv = 4, method = "forward")
summary(im)
plot(im)
 
# show how RMSECV develops over the selection steps
plotRMSE(im)
 
# plot predictions before and after selection
par(mfrow = c(1, 2))
plotPredictions(im$gm)
plotPredictions(im$om)
 
# show selected intervals on spectral plot
ind = im$var.selected
mspectrum = apply(Xc, 2, mean)
plot(simdata$wavelength, mspectrum, type = 'l', col = 'lightblue')
points(simdata$wavelength[ind], mspectrum[ind], pch = 16, col = 'blue')
 
# }
