mdatools (version 0.9.1)

pls: Partial Least Squares regression

Description

pls is used to calibrate, validate and apply a partial least squares (PLS) regression model.

Usage

pls(x, y, ncomp = 15, center = T, scale = F, cv = NULL,
  exclcols = NULL, exclrows = NULL, x.test = NULL, y.test = NULL,
  method = "simpls", alpha = 0.05, coeffs.ci = NULL,
  coeffs.alpha = 0.05, info = "", light = F, ncomp.selcrit = "min")

Arguments

x

matrix with predictors.

y

matrix with responses.

ncomp

maximum number of components to calculate.

center

logical, whether to center predictors and response values.

scale

logical, whether to scale (standardize) predictors and response values.

cv

number of segments for cross-validation (if cv = 1, full cross-validation will be used).

exclcols

columns of x to be excluded from calculations (numbers, names or vector with logical values)

exclrows

rows to be excluded from calculations (numbers, names or vector with logical values)

x.test

matrix with predictors for test set.

y.test

matrix with responses for test set.

method

algorithm for computing PLS model (only 'simpls' is supported so far)

alpha

significance level for calculating statistical limits for residuals.

coeffs.ci

method to calculate p-values and confidence intervals for regression coefficients (so far only jack-knifing is available: 'jk').

coeffs.alpha

significance level for calculating confidence intervals for regression coefficients.

info

short text with information about the model.

light

run the normal or light (faster) version of PLS; the light version skips calculation of some performance statistics.

ncomp.selcrit

criterion for selecting the optimal number of components ('min' for the first local minimum of RMSECV, 'wold' for Wold's rule).
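
To illustrate the 'min' option of ncomp.selcrit, here is a minimal base R sketch of picking the first local minimum of an RMSECV curve. The helper first_local_min and the rmsecv values are hypothetical, made up for illustration; this is not the package's internal code and it does not cover Wold's rule.

```r
# Hypothetical helper (not part of mdatools): index of the first local
# minimum of a cross-validation error curve, one value per component.
first_local_min <- function(rmsecv) {
  n <- length(rmsecv)
  if (n < 2) return(n)
  # positions where the error goes up when one more component is added
  rising <- which(diff(rmsecv) > 0)
  # if the error never rises, fall back to the last component
  if (length(rising) == 0) n else rising[1]
}

# Made-up RMSECV values for 1..5 components
rmsecv <- c(1.00, 0.60, 0.40, 0.45, 0.30)
ncomp_selected <- first_local_min(rmsecv)
print(ncomp_selected)
```

The global minimum in this made-up curve is at five components, but the first local minimum is at three, which is the more parsimonious choice this criterion aims for.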

Value

Returns an object of class pls with the following fields:

ncomp

number of components included in the model.

ncomp.selected

selected (optimal) number of components.

xloadings

matrix with loading values for x decomposition.

yloadings

matrix with loading values for y decomposition.

weights

matrix with PLS weights.

selratio

array with selectivity ratio values.

vipscores

matrix with VIP scores values.

coeffs

object of class regcoeffs with regression coefficients calculated for each component.

info

information about the model, provided by the user when building the model.

calres

an object of class plsres with PLS results for the calibration data.

testres

an object of class plsres with PLS results for the test data, if it was provided.

cvres

an object of class plsres with PLS results for cross-validation, if this option was chosen.

Details

So far only the SIMPLS method [1] is available; more are coming soon. The implementation works with both one and multiple response variables.
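
For readers unfamiliar with SIMPLS, the following is a minimal base R sketch of the algorithm for a single response, following the general scheme in [1]. It is not the package's pls.simpls function: the name simpls_sketch, the restriction to one response, centering without scaling, and the simulated data are all assumptions made for illustration.

```r
# Hypothetical sketch of SIMPLS for one response (not mdatools' code).
# Returns a p x ncomp matrix: column a holds the regression coefficients
# (for centered data) of the model with components 1:a.
simpls_sketch <- function(x, y, ncomp) {
  x <- scale(x, center = TRUE, scale = FALSE)
  y <- y - mean(y)
  p <- ncol(x)
  R <- matrix(0, p, ncomp)   # weights
  V <- matrix(0, p, ncomp)   # orthonormal basis of the x loadings
  q <- numeric(ncomp)        # y loadings
  s <- crossprod(x, y)       # covariance vector X'y (p x 1)
  for (a in seq_len(ncomp)) {
    r <- s / sqrt(sum(s^2))          # weight vector
    tt <- x %*% r                    # scores
    nt <- sqrt(sum(tt^2))
    tt <- tt / nt                    # unit-length scores
    r <- r / nt                      # rescale weights accordingly
    pa <- crossprod(x, tt)           # x loadings
    q[a] <- sum(y * tt)              # y loading
    v <- pa
    if (a > 1) {
      # make the new basis vector orthogonal to the previous ones
      Vprev <- V[, seq_len(a - 1), drop = FALSE]
      v <- v - Vprev %*% crossprod(Vprev, pa)
    }
    v <- v / sqrt(sum(v^2))
    s <- s - v %*% crossprod(v, s)   # deflate the covariance vector
    R[, a] <- r
    V[, a] <- v
  }
  sapply(seq_len(ncomp), function(a) {
    R[, seq_len(a), drop = FALSE] %*% q[seq_len(a)]
  })
}

# Simulated data, for illustration only
set.seed(1)
x <- matrix(rnorm(60), nrow = 20, ncol = 3)
y <- drop(x %*% c(1, -2, 0.5)) + rnorm(20, sd = 0.1)
B <- simpls_sketch(x, y, ncomp = 3)
```

A useful sanity check for a sketch like this: when the number of components equals the number of (full-rank) predictors, the last column of B should coincide with the ordinary least squares slopes.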

As in pca, pls takes the number of components (ncomp) as the minimum of three values: the number of objects minus 1, the number of x variables, and the default or provided value. Regression coefficients, predictions and other results are calculated for each set of components from 1 to ncomp: 1, 1:2, 1:3, etc. The optimal number of components (ncomp.selected) is found automatically using the criterion set by ncomp.selcrit (the first local minimum of RMSECV or Wold's R criterion), but it can be adjusted by the user with the function selectCompNum.pls. The selected optimal number of components is used for all default operations: predictions, plots, etc.
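
The rule for the number of components can be written out as a short base R sketch. The helper effective_ncomp is hypothetical and only mirrors the description above; it ignores any rows or columns removed through exclrows and exclcols.

```r
# Hypothetical helper (not part of mdatools): number of components
# actually computed, following the rule described above.
effective_ncomp <- function(x, ncomp = 15) {
  min(nrow(x) - 1, ncol(x), ncomp)
}

x_wide   <- matrix(0, nrow = 10, ncol = 200)   # few objects, many variables
x_narrow <- matrix(0, nrow = 100, ncol = 4)    # many objects, few variables

effective_ncomp(x_wide)                # limited by number of objects - 1
effective_ncomp(x_narrow)              # limited by number of x variables
effective_ncomp(x_narrow, ncomp = 2)   # limited by the provided value
```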

Selectivity ratio [2] and VIP scores [3] are calculated automatically for any PLS model. However, while selectivity ratio values are calculated for all computed components, the VIP scores are computed only for the selected components (to save calculation time) and are recalculated every time selectCompNum() is called for the model.
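
As a rough illustration of what VIP scores measure, the sketch below implements the commonly used VIP formula for a single response in base R. The function vip_sketch and its inputs (a weights matrix W, a scores matrix Tm and a vector of y loadings q) are hypothetical, and the package's own computation may differ in detail.

```r
# Hypothetical sketch of VIP scores for one response (not mdatools' code).
# W:  p x A matrix of PLS weights
# Tm: n x A matrix of x scores
# q:  vector of length A with y loadings
vip_sketch <- function(W, Tm, q) {
  p <- nrow(W)
  # amount of y variation captured by each component
  ss <- q^2 * colSums(Tm^2)
  # weights rescaled so that every column has unit length
  Wn <- sweep(W, 2, sqrt(colSums(W^2)), "/")
  # weighted average of squared weights, scaled by the number of variables
  sqrt(p * drop(Wn^2 %*% ss) / sum(ss))
}

# Random inputs, for illustration only
set.seed(42)
W  <- matrix(rnorm(5 * 2), nrow = 5)
Tm <- matrix(rnorm(20 * 2), nrow = 20)
q  <- c(1.5, 0.4)
vip <- vip_sketch(W, Tm, q)
```

With this definition the squared VIP scores always sum to the number of variables, so their mean is 1; this is why values above 1 are often read as "more important than average".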

Calculation of confidence intervals and p-values for regression coefficients is so far available only by jack-knifing. See help for regcoeffs objects for details.
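
The idea behind jack-knifing can be sketched in base R with the standard jackknife variance formula. Everything here is an assumption made for illustration: the helper jk_summary is hypothetical, ordinary least squares stands in for the PLS coefficients, the interval is centered on the mean of the leave-one-out estimates, and n - 1 degrees of freedom are used. The regcoeffs implementation may make different choices.

```r
# Hypothetical sketch of jackknife inference (not mdatools' code).
# b_loo: n x p matrix, row i holds the coefficients estimated with
#        object (or segment) i left out.
jk_summary <- function(b_loo, alpha = 0.05) {
  n <- nrow(b_loo)
  b_mean <- colMeans(b_loo)
  dev <- sweep(b_loo, 2, b_mean)
  # standard jackknife standard error
  se <- sqrt((n - 1) / n * colSums(dev^2))
  tval <- b_mean / se
  half <- qt(1 - alpha / 2, df = n - 1) * se
  data.frame(estimate = b_mean, se = se,
             lower = b_mean - half, upper = b_mean + half,
             p.value = 2 * pt(-abs(tval), df = n - 1))
}

# Simulated data: the first predictor matters, the second does not
set.seed(1)
n <- 30
x <- matrix(rnorm(n * 2), nrow = n)
y <- drop(x %*% c(2, 0)) + rnorm(n)

# Leave-one-out coefficient estimates (OLS used as a stand-in for PLS)
b_loo <- t(sapply(seq_len(n), function(i) {
  lm.fit(x[-i, , drop = FALSE], y[-i])$coefficients
}))
jk_summary(b_loo, alpha = 0.05)
```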

References

1. S. de Jong. Chemometrics and Intelligent Laboratory Systems, 18 (1993), 251-263.
2. Tarja Rajalahti et al. Chemometrics and Intelligent Laboratory Systems, 95 (2009), 35-48.
3. Il-Gyo Chong, Chi-Hyuck Jun. Chemometrics and Intelligent Laboratory Systems, 78 (2005), 103-112.

See Also

Methods for pls objects:

print prints information about a pls object.
summary.pls shows performance statistics for the model.
plot.pls shows plot overview of the model.
pls.simpls implementation of SIMPLS algorithm.
predict.pls applies PLS model to new data.
selectCompNum.pls sets the optimal number of components in the model.
plotPredictions.pls shows predicted vs. measured plot.
plotRegcoeffs.pls shows regression coefficients plot.
plotXScores.pls shows scores plot for x decomposition.
plotXYScores.pls shows scores plot for x and y decomposition.
plotXLoadings.pls shows loadings plot for x decomposition.
plotXYLoadings.pls shows loadings plot for x and y decomposition.
plotRMSE.pls shows RMSE plot.
plotXVariance.pls shows explained variance plot for x decomposition.
plotYVariance.pls shows explained variance plot for y decomposition.
plotXCumVariance.pls shows cumulative explained variance plot for x decomposition.
plotYCumVariance.pls shows cumulative explained variance plot for y decomposition.
plotXResiduals.pls shows T2 vs. Q plot for x decomposition.
plotYResiduals.pls shows residuals plot for y values.
plotSelectivityRatio.pls shows plot with selectivity ratio values.
plotVIPScores.pls shows plot with VIP scores values.
getSelectivityRatio.pls returns vector with selectivity ratio values.
getVIPScores.pls returns vector with VIP scores values.
getRegcoeffs.pls returns matrix with regression coefficients.

Most of the methods for plotting data (except loadings and regression coefficients) are also available for PLS results (plsres) objects. There is also a randomization test for PLS-regression (randtest).

Examples

# NOT RUN {
### Examples of using PLS model class
library(mdatools)   
  
## 1. Make a PLS model for concentration of first component 
## using full-cross validation and automatic detection of 
## optimal number of components and show an overview

data(simdata)
x = simdata$spectra.c
y = simdata$conc.c[, 1]

model = pls(x, y, ncomp = 8, cv = 1)
summary(model)
plot(model)

## 2. Make a PLS model for concentration of first component 
## using test set and 10 segment cross-validation and show overview

data(simdata)
x = simdata$spectra.c
y = simdata$conc.c[, 1]
x.t = simdata$spectra.t
y.t = simdata$conc.t[, 1]

model = pls(x, y, ncomp = 8, cv = 10, x.test = x.t, y.test = y.t)
model = selectCompNum(model, 2)
summary(model)
plot(model)

## 3. Make a PLS model for concentration of first component 
## using only test set validation and show overview

data(simdata)
x = simdata$spectra.c
y = simdata$conc.c[, 1]
x.t = simdata$spectra.t
y.t = simdata$conc.t[, 1]

model = pls(x, y, ncomp = 6, x.test = x.t, y.test = y.t)
model = selectCompNum(model, 2)
summary(model)
plot(model)

## 4. Show variance and error plots for a PLS model
par(mfrow = c(2, 2))
plotXCumVariance(model, type = 'h')
plotYCumVariance(model, type = 'b', show.labels = TRUE, legend.position = 'bottomright')
plotRMSE(model)
plotRMSE(model, type = 'h', show.labels = TRUE)
par(mfrow = c(1, 1))

## 5. Show scores plots for a PLS model
par(mfrow = c(2, 2))
plotXScores(model)
plotXScores(model, comp = c(1, 3), show.labels = TRUE)
plotXYScores(model)
plotXYScores(model, comp = 2, show.labels = TRUE)
par(mfrow = c(1, 1))

## 6. Show loadings and coefficients plots for a PLS model
par(mfrow = c(2, 2))
plotXLoadings(model)
plotXLoadings(model, comp = c(1, 2), type = 'l')
plotXYLoadings(model, comp = c(1, 2), legend.position = 'topleft')
plotRegcoeffs(model)
par(mfrow = c(1, 1))

## 7. Show predictions and residuals plots for a PLS model
par(mfrow = c(2, 2))
plotXResiduals(model, show.labels = TRUE)
plotYResiduals(model, show.labels = TRUE)
plotPredictions(model)
plotPredictions(model, ncomp = 4, xlab = 'C, reference', ylab = 'C, predictions')
par(mfrow = c(1, 1))

## 8. Selectivity ratio and VIP scores plots
par(mfrow = c(2, 2))
plotSelectivityRatio(model)
plotSelectivityRatio(model, ncomp = 1)
par(mfrow = c(1, 1))

## 9. Variable selection with selectivity ratio
selratio = getSelectivityRatio(model)
selvar = !(selratio < 8)

xsel = x[, selvar]
modelsel = pls(xsel, y, ncomp = 6, cv = 1)
modelsel = selectCompNum(modelsel, 3)

summary(model)
summary(modelsel)

## 10. Calculate average spectrum and show the selected variables
i = 1:ncol(x)
ms = apply(x, 2, mean)

par(mfrow = c(2, 2))

plot(i, ms, type = 'p', pch = 16, col = 'red', main = 'Original variables')
plotPredictions(model)

plot(i, ms, type = 'p', pch = 16, col = 'lightgray', main = 'Selected variables')
points(i[selvar], ms[selvar], col = 'red', pch = 16)
plotPredictions(modelsel)

par(mfrow = c(1, 1))

# }
