mdatools (version 0.7.0)

pls: Partial Least Squares regression

Description

pls is used to calibrate, validate and apply a partial least squares (PLS) regression model.

Usage

pls(x, y, ncomp = 15, center = T, scale = F, cv = NULL, x.test = NULL, y.test = NULL, method = "simpls", alpha = 0.05, coeffs.ci = NULL, coeffs.alpha = 0.1, info = "", light = F, ncomp.selcrit = "min")

Arguments

x
matrix with predictors.
y
matrix with responses.
ncomp
maximum number of components to calculate.
center
logical, whether to center the predictors and response values.
scale
logical, whether to scale (standardize) the predictors and response values.
cv
number of segments for cross-validation (if cv = 1, full cross-validation will be used).
x.test
matrix with predictors for test set.
y.test
matrix with responses for test set.
method
method for calculating PLS model.
alpha
significance level for calculating statistical limits for residuals.
coeffs.ci
method to calculate p-values and confidence intervals for regression coefficients (so far only jack-knifing is available: 'jk').
coeffs.alpha
significance level for calculating confidence intervals for regression coefficients.
info
short text with information about the model.
light
run the normal or the light (faster) version of PLS, which skips calculating some performance statistics.
ncomp.selcrit
criterion for selecting the optimal number of components ('min' for the first local minimum of RMSECV, 'wold' for Wold's rule).
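
The 'min' criterion above can be illustrated with a minimal base R sketch. The helper first_local_min and the RMSECV values are hypothetical and made up for illustration; this is not the mdatools implementation, only one plausible reading of "first local minimum".

```r
# Hypothetical helper (not part of mdatools): index of the first local
# minimum in a vector of RMSECV values, one value per number of components.
first_local_min <- function(rmse) {
  n <- length(rmse)
  if (n < 2) return(n)
  # positions where the next value is larger than the current one
  rising <- which(diff(rmse) > 0)
  # if the curve never rises, fall back to the last component
  if (length(rising) == 0) n else rising[1]
}

# Made-up RMSECV curve for 6 components
rmsecv <- c(0.90, 0.42, 0.31, 0.33, 0.29, 0.30)
ncomp_min <- first_local_min(rmsecv)
print(ncomp_min)  # 3, although the global minimum is at component 5
```

Stopping at the first local minimum favours a smaller model than the global minimum would, which is the usual motivation for this kind of rule.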

Value

Returns an object of class pls with the following fields:
ncomp
number of components included in the model.
ncomp.selected
selected (optimal) number of components.
xloadings
matrix with loading values for x decomposition.
yloadings
matrix with loading values for y decomposition.
weights
matrix with PLS weights.
selratio
array with selectivity ratio values.
vipscores
matrix with VIP scores values.
coeffs
object of class regcoeffs with regression coefficients calculated for each component.
info
information about the model, provided by the user when building the model.
calres
an object of class plsres with PLS results for the calibration data.
testres
an object of class plsres with PLS results for the test data, if it was provided.
cvres
an object of class plsres with PLS results for cross-validation, if this option was chosen.

Details

So far only the SIMPLS method [1] is available; more are coming soon. The implementation works with both one and multiple response variables.

As in pca, pls sets the number of components (ncomp) to the minimum of the number of objects - 1, the number of x variables, and the default or provided value. Regression coefficients, predictions and other results are calculated for each set of components from 1 to ncomp: 1, 1:2, 1:3, etc. The optimal number of components (ncomp.selected) is found using the criterion given by ncomp.selcrit (first local minimum of RMSECV or Wold's rule), but can be adjusted by the user with the function selectCompNum.pls. The selected optimal number of components is used for all default operations: predictions, plots, etc.
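
The rule for the number of components can be sketched in base R as follows. The helper effective_ncomp is a hypothetical name used only for illustration; it mirrors the description above rather than the package source.

```r
# Hypothetical helper (not part of mdatools): number of components that
# would actually be computed for a predictor matrix x and a requested ncomp.
effective_ncomp <- function(x, ncomp = 15) {
  min(nrow(x) - 1, ncol(x), ncomp)
}

# 10 objects and 50 variables: limited by the number of objects - 1
x_small <- matrix(rnorm(10 * 50), nrow = 10, ncol = 50)
print(effective_ncomp(x_small))      # 9
print(effective_ncomp(x_small, 4))   # 4, the provided value is the smallest
```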

Selectivity ratio [2] and VIP scores [3] are calculated automatically for any PLS model. However, while selectivity ratio values are calculated for all computed components, the VIP scores are computed only for the selected components (to save calculation time) and are recalculated every time selectCompNum() is called for the model.

Calculation of confidence intervals and p-values for regression coefficients is so far available only by jack-knifing. See the help for regcoeffs objects for details.
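
To give a feel for what jack-knifing does, here is a generic leave-one-out jackknife for ordinary least squares coefficients in base R. It is a sketch under stated assumptions: the data are simulated, an lm.fit model stands in for PLS, and the standard jackknife variance formula is used, so it may differ in detail from what mdatools computes for regcoeffs objects.

```r
set.seed(1)
n <- 30
x <- cbind(1, rnorm(n))                    # intercept column + one predictor
y <- 2 + 3 * x[, 2] + rnorm(n, sd = 0.5)   # simulated response

# Refit the model n times, each time leaving one object out;
# the result has one row per left-out object and one column per coefficient
b_jk <- t(sapply(seq_len(n), function(i) {
  lm.fit(x[-i, , drop = FALSE], y[-i])$coefficients
}))

# Standard jackknife estimate of the standard error
b_mean <- colMeans(b_jk)
se <- sqrt((n - 1) / n * colSums(sweep(b_jk, 2, b_mean)^2))

# Two-sided confidence intervals at significance level alpha
alpha <- 0.1
tcrit <- qt(1 - alpha / 2, df = n - 1)
ci <- cbind(lower = b_mean - tcrit * se, upper = b_mean + tcrit * se)
print(ci)
```

In pls itself, the corresponding options are coeffs.ci = 'jk' together with coeffs.alpha, as listed under Arguments.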

References

1. S. de Jong, Chemometrics and Intelligent Laboratory Systems, 18 (1993), 251-263.
2. Tarja Rajalahti et al., Chemometrics and Intelligent Laboratory Systems, 95 (2009), 35-48.
3. Il-Gyo Chong, Chi-Hyuck Jun, Chemometrics and Intelligent Laboratory Systems, 78 (2005), 103-112.

See Also

Methods for pls objects:
print
prints information about a pls object.
summary.pls
shows performance statistics for the model.
plot.pls
shows plot overview of the model.
pls.simpls
implementation of SIMPLS algorithm.
predict.pls
applies PLS model to new data.
selectCompNum.pls
sets the optimal number of components in the model.
plotPredictions.pls
shows predicted vs. measured plot.
plotRegcoeffs.pls
shows regression coefficients plot.
plotXScores.pls
shows scores plot for x decomposition.
plotXYScores.pls
shows scores plot for x and y decomposition.
plotXLoadings.pls
shows loadings plot for x decomposition.
plotXYLoadings.pls
shows loadings plot for x and y decomposition.
plotRMSE.pls
shows RMSE plot.
plotXVariance.pls
shows explained variance plot for x decomposition.
plotYVariance.pls
shows explained variance plot for y decomposition.
plotXCumVariance.pls
shows cumulative explained variance plot for x decomposition.
plotYCumVariance.pls
shows cumulative explained variance plot for y decomposition.
plotXResiduals.pls
shows T2 vs. Q plot for x decomposition.
plotYResiduals.pls
shows residuals plot for y values.
plotSelectivityRatio.pls
shows plot with selectivity ratio values.
plotVIPScores.pls
shows plot with VIP scores values.
getSelectivityRatio.pls
returns vector with selectivity ratio values.
getVIPScores.pls
returns vector with VIP scores values.
getRegcoeffs.pls
returns matrix with regression coefficients.

Most of the methods for plotting data (except loadings and regression coefficients) are also available for PLS results (plsres) objects. There is also a randomization test for PLS-regression (randtest).

Examples

### Examples of using PLS model class
library(mdatools)

## 1. Make a PLS model for concentration of first component
## using full-cross validation and automatic detection of
## optimal number of components and show an overview

data(simdata)
x = simdata$spectra.c
y = simdata$conc.c[, 1]

model = pls(x, y, ncomp = 8, cv = 1)
summary(model)
plot(model)

## 2. Make a PLS model for concentration of first component
## using test set and 10 segment cross-validation and show overview

data(simdata)
x = simdata$spectra.c
y = simdata$conc.c[, 1]
x.t = simdata$spectra.t
y.t = simdata$conc.t[, 1]

model = pls(x, y, ncomp = 8, cv = 10, x.test = x.t, y.test = y.t)
model = selectCompNum(model, 2)
summary(model)
plot(model)

## 3. Make a PLS model for concentration of first component
## using only test set validation and show overview

data(simdata)
x = simdata$spectra.c
y = simdata$conc.c[, 1]
x.t = simdata$spectra.t
y.t = simdata$conc.t[, 1]

model = pls(x, y, ncomp = 6, x.test = x.t, y.test = y.t)
model = selectCompNum(model, 2)
summary(model)
plot(model)

## 4. Show variance and error plots for a PLS model
par(mfrow = c(2, 2))
plotXCumVariance(model, type = 'h')
plotYCumVariance(model, type = 'b', show.labels = TRUE, legend.position = 'bottomright')
plotRMSE(model)
plotRMSE(model, type = 'h', show.labels = TRUE)
par(mfrow = c(1, 1))

## 5. Show scores plots for a PLS model
par(mfrow = c(2, 2))
plotXScores(model)
plotXScores(model, comp = c(1, 3), show.labels = TRUE)
plotXYScores(model)
plotXYScores(model, comp = 2, show.labels = TRUE)
par(mfrow = c(1, 1))

## 6. Show loadings and coefficients plots for a PLS model
par(mfrow = c(2, 2))
plotXLoadings(model)
plotXLoadings(model, comp = c(1, 2), type = 'l')
plotXYLoadings(model, comp = c(1, 2), legend.position = 'topleft')
plotRegcoeffs(model)
par(mfrow = c(1, 1))

## 7. Show predictions and residuals plots for a PLS model
par(mfrow = c(2, 2))
plotXResiduals(model, show.labels = TRUE)
plotYResiduals(model, show.labels = TRUE)
plotPredictions(model)
plotPredictions(model, ncomp = 4, xlab = 'C, reference', ylab = 'C, predictions')
par(mfrow = c(1, 1))

## 8. Selectivity ratio and VIP scores plots
par(mfrow = c(2, 2))
plotSelectivityRatio(model)
plotSelectivityRatio(model, ncomp = 1)
par(mfrow = c(1, 1))

## 9. Variable selection with selectivity ratio
selratio = getSelectivityRatio(model)
selvar = !(selratio < 8)

xsel = x[, selvar]
modelsel = pls(xsel, y, ncomp = 6, cv = 1)
modelsel = selectCompNum(modelsel, 3)

summary(model)
summary(modelsel)

## 10. Calculate average spectrum and show the selected variables
i = 1:ncol(x)
ms = apply(x, 2, mean)

par(mfrow = c(2, 2))

plot(i, ms, type = 'p', pch = 16, col = 'red', main = 'Original variables')
plotPredictions(model)

plot(i, ms, type = 'p', pch = 16, col = 'lightgray', main = 'Selected variables')
points(i[selvar], ms[selvar], col = 'red', pch = 16)
plotPredictions(modelsel)

par(mfrow = c(1, 1))
