Partial Least Squares and Principal Component Regression
Functions to perform partial least squares regression (PLSR) or principal component regression (PCR), with a formula interface. Cross-validation can be used. Prediction, model extraction, plot, print and summary methods exist.
```r
mvr(formula, ncomp, data, subset, na.action,
    method = c("kernelpls", "simpls", "oscorespls", "svdpc", "model.frame"),
    scale = FALSE, validation = c("none", "CV", "LOO"),
    model = TRUE, x = FALSE, y = FALSE, ...)
plsr(..., method = c("kernelpls", "simpls", "oscorespls", "model.frame"))
pcr(..., method = c("svdpc", "model.frame"))
```
- formula: a model formula. Most of the lm formula constructs are supported. See below.
- ncomp: the number of components to include in the model (see below).
- data: an optional data frame with the data to fit the model from.
- subset: an optional vector specifying a subset of observations to be used in the fitting process.
- na.action: a function which indicates what should happen when the data contain missing values.
- method: the multivariate regression method to be used. If "model.frame", the model frame is returned.
- scale: numeric vector, or logical. If a numeric vector, $X$ is scaled by dividing each variable by the corresponding element of scale. If scale is TRUE, $X$ is scaled by dividing each variable by its sample standard deviation.
- validation: character. What kind of (internal) validation to use. See below.
- model: a logical. If TRUE, the model frame is returned.
- x: a logical. If TRUE, the model matrix is returned.
- y: a logical. If TRUE, the response is returned.
- ...: additional arguments, passed to the underlying fit functions.
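A minimal base-R sketch of what the two forms of the scale argument do to $X$, as described above. scale_predictors() is a hypothetical helper written for illustration, not a function of the package; like the description, it only divides and does not centre the columns.

```r
# Hypothetical helper illustrating the scale argument (not package code).
scale_predictors <- function(X, scale) {
  if (isTRUE(scale)) {
    # scale = TRUE: divide each column by its sample standard deviation
    scale <- apply(X, 2, sd)
  } else if (isFALSE(scale)) {
    return(X)                         # scale = FALSE: X is left unchanged
  }
  stopifnot(length(scale) == ncol(X))
  # numeric scale: divide column j by scale[j]
  sweep(X, 2, scale, "/")
}

set.seed(1)
# 20 x 3 matrix whose columns have standard deviations of about 1, 5 and 10
X <- matrix(rnorm(20 * 3, sd = c(1, 5, 10)), ncol = 3, byrow = TRUE)
Xs <- scale_predictors(X, TRUE)          # every column now has sd 1
Xv <- scale_predictors(X, c(2, 4, 8))    # columns divided by 2, 4 and 8
apply(Xs, 2, sd)
```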
The functions fit PLSR or PCR models with 1, $\ldots$, ncomp components. Multi-response models are fully supported.
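Fitting all models with 1, $\ldots$, ncomp components in one call is cheap for PCR: a single singular value decomposition of the centred $X$ yields all the scores, and the coefficients for each component count are built from the leading columns. The base-R sketch below does this for a multi-response $Y$. pcr_svd() is hypothetical and not the package's implementation; storing the coefficients as a predictors × responses × components array is an assumption chosen for illustration.

```r
# Hypothetical helper (not package code): PCR via the SVD of the centred X,
# returning coefficients for every model with 1, ..., ncomp components.
pcr_svd <- function(X, Y, ncomp) {
  X <- as.matrix(X); Y <- as.matrix(Y)
  Xmeans <- colMeans(X); Ymeans <- colMeans(Y)
  Xc <- sweep(X, 2, Xmeans)                 # centre predictors
  Yc <- sweep(Y, 2, Ymeans)                 # centre responses
  s <- svd(Xc, nu = ncomp, nv = ncomp)
  d <- s$d[seq_len(ncomp)]
  scores <- s$u %*% diag(d, ncomp)          # n x ncomp, mutually orthogonal
  loadings <- s$v                           # p x ncomp
  # Least-squares Y-loadings; orthogonal scores reduce this to a division.
  Q <- crossprod(Yc, scores) %*% diag(1 / d^2, ncomp)   # q x ncomp
  B <- array(0, dim = c(ncol(X), ncol(Y), ncomp))
  for (a in seq_len(ncomp)) {
    # a-component model: coefficients = loadings[, 1:a] %*% t(Q[, 1:a])
    B[, , a] <- loadings[, 1:a, drop = FALSE] %*% t(Q[, 1:a, drop = FALSE])
  }
  list(coefficients = B, Xmeans = Xmeans, Ymeans = Ymeans)
}

set.seed(2)
X <- matrix(rnorm(30 * 4), 30, 4)
Y <- cbind(X %*% c(1, -1, 0.5, 0) + rnorm(30, sd = 0.1),
           X %*% c(0, 2, 0, 1) + rnorm(30, sd = 0.1))
fit <- pcr_svd(X, Y, ncomp = 4)
dim(fit$coefficients)   # 4 predictors x 2 responses x 4 component counts
```

With as many components as predictors, PCR reproduces ordinary least squares, which gives an easy consistency check against lm.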
Three PLSR algorithms are available: the kernel algorithm, SIMPLS and the classical orthogonal scores algorithm (method = "kernelpls", "simpls" and "oscorespls", respectively). One PCR algorithm is available, based on the singular value decomposition (method = "svdpc"). The type of regression is specified with the method argument. plsr and pcr are wrappers for mvr, with different values for method.
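To make the orthogonal scores algorithm concrete, here is a single-response sketch of it in base R, following the usual textbook steps: compute a weight vector from the covariance of the deflated $X$ with $y$, form the score, regress $X$ and $y$ on that score, and deflate both. pls1_oscores() is a hypothetical helper, not the package's oscorespls code, and it handles only one response.

```r
# Hypothetical helper (not package code): PLS1 by the orthogonal scores
# (NIPALS-style) algorithm, giving coefficients for 1, ..., ncomp components.
pls1_oscores <- function(X, y, ncomp) {
  X <- as.matrix(X); y <- as.numeric(y)
  Xm <- colMeans(X); ym <- mean(y)
  E <- sweep(X, 2, Xm)            # deflated X, starting from the centred X
  f <- y - ym                     # deflated y
  p <- ncol(X)
  W <- P <- matrix(0, p, ncomp)
  q <- numeric(ncomp)
  for (a in seq_len(ncomp)) {
    w <- crossprod(E, f)                  # weight: covariance of E with f
    w <- w / sqrt(sum(w^2))
    tt <- E %*% w                         # score vector (orthogonal to earlier ones)
    tnorm2 <- sum(tt^2)
    P[, a] <- crossprod(E, tt) / tnorm2   # X-loading
    q[a] <- sum(f * tt) / tnorm2          # y-loading
    E <- E - tt %*% t(P[, a])             # deflate X
    f <- f - as.numeric(tt) * q[a]        # deflate y
    W[, a] <- w
  }
  # Coefficients of the a-component model: W_a (P_a' W_a)^{-1} q_a
  B <- sapply(seq_len(ncomp), function(a) {
    Wa <- W[, 1:a, drop = FALSE]
    Wa %*% solve(crossprod(P[, 1:a, drop = FALSE], Wa), q[1:a])
  })
  list(coefficients = matrix(B, nrow = p), Xmeans = Xm, ymean = ym)
}

set.seed(5)
X <- matrix(rnorm(25 * 4), 25, 4)
y <- drop(X %*% c(1, 0.5, -1, 0)) + rnorm(25, sd = 0.2)
fit <- pls1_oscores(X, y, ncomp = 4)   # column a holds the a-component model
```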
The formula argument should be a symbolic formula of the form response ~ terms, where response is the name of the response vector or matrix (for multi-response models) and terms is the name of one or more predictor matrices, usually separated by +, e.g., water ~ FTIR or y ~ X + Z. See lm for a detailed description. The named variables should exist in the supplied data data frame or in the global environment. Note: do not use mvr(mydata$y ~ mydata$X, ...); instead use mvr(y ~ X, data = mydata, ...). Otherwise, predict.mvr will not work properly.
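The warning about mydata$ inside the formula can be reproduced with lm, whose formula handling the text refers to; this sketch assumes predict.mvr resolves formula variables in the same way. The predictor matrix is stored as a single column of the data frame with I(), so the formula can refer to it as X. The data are simulated for illustration.

```r
set.seed(3)
X <- matrix(rnorm(10 * 2), 10, 2)
# Keep the whole predictor matrix as one data frame column via I()
mydata <- data.frame(y = as.numeric(X %*% c(1, 2)) + rnorm(10, sd = 0.1),
                     X = I(X))
newdata <- data.frame(X = I(matrix(rnorm(2 * 2), 2, 2)))

# Recommended: name the variables and pass the data frame via data =
good <- lm(y ~ X, data = mydata)
length(predict(good, newdata))        # one prediction per new row

# Not recommended: mydata$X is looked up in mydata, not in newdata, so
# predict() falls back to the original 10 rows (with a warning).
bad <- lm(mydata$y ~ mydata$X)
length(suppressWarnings(predict(bad, newdata)))
```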
The chapter "Statistical models in R" of the manual "An Introduction to R", distributed with R, is a good reference on formulas in R.
The number of components to fit is specified with the argument ncomp. If this is not supplied, the maximal number of components is used (taking account of any cross-validation).
If validation = "CV", cross-validation is performed. The number and type of cross-validation segments are specified with additional arguments supplied via the ... argument. If validation = "LOO", leave-one-out cross-validation is performed. It is an error to specify the segments when validation = "LOO" is specified.
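A base-R sketch of segmented cross-validation for single-response PCR, computing the root mean squared error of prediction (RMSEP) for 1, $\ldots$, ncomp components. cv_pcr() and make_segments() are hypothetical helpers; random interleaved segments and RMSEP as the summary are illustrative choices, not necessarily what the package does. Leave-one-out cross-validation is the special case with one observation per segment.

```r
# Hypothetical helper: random segments of roughly equal size.
make_segments <- function(n, k) split(sample(n), rep_len(seq_len(k), n))

# Hypothetical helper: cross-validated RMSEP of PCR for 1, ..., ncomp components.
cv_pcr <- function(X, y, ncomp, segments) {
  press <- numeric(ncomp)
  for (test in segments) {
    Xtr <- X[-test, , drop = FALSE]; ytr <- y[-test]
    Xm <- colMeans(Xtr); ym <- mean(ytr)          # training-set means only
    s <- svd(sweep(Xtr, 2, Xm), nu = ncomp, nv = ncomp)
    d <- s$d[seq_len(ncomp)]
    Ttr <- s$u %*% diag(d, ncomp)                 # training scores
    q <- crossprod(Ttr, ytr - ym) / d^2           # y-loadings
    # project the held-out rows onto the training loadings
    Tte <- sweep(X[test, , drop = FALSE], 2, Xm) %*% s$v
    for (a in seq_len(ncomp)) {
      pred <- ym + Tte[, 1:a, drop = FALSE] %*% q[1:a]
      press[a] <- press[a] + sum((y[test] - pred)^2)
    }
  }
  sqrt(press / length(y))
}

set.seed(4)
X <- matrix(rnorm(40 * 5), 40, 5)
y <- drop(X %*% c(2, 1, 0, 0, 0)) + rnorm(40, sd = 0.3)
rmsep_cv  <- cv_pcr(X, y, ncomp = 5, segments = make_segments(40, 10))
rmsep_loo <- cv_pcr(X, y, ncomp = 5, segments = as.list(seq_len(40)))
```

With all five components PCR equals ordinary least squares, so the last leave-one-out value can be checked against the closed-form PRESS residuals of lm.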
Note that the cross-validation is optimised for speed, and some generality has been sacrificed. In particular, the model matrix is calculated only once for the complete cross-validation, so models like y ~ msc(X) will not be properly cross-validated. However, scaling requested by scale = TRUE is properly cross-validated.
For proper cross-validation of models where the model matrix must be
updated/regenerated for each segment, use the separate function
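Proper cross-validation of a data-dependent preprocessing step means re-estimating it within each segment. For scale = TRUE, that amounts to computing the standard deviations from the training rows only and applying them to the held-out rows, as in this base-R sketch. scale_split() is a hypothetical helper; a transformation written into the formula, such as msc(X), is not re-estimated this way by the built-in cross-validation.

```r
# Hypothetical helper: scale one cross-validation split using only the
# training rows, so the held-out rows do not influence the scaling.
scale_split <- function(X, test) {
  sds <- apply(X[-test, , drop = FALSE], 2, sd)   # training-set sds only
  list(train = sweep(X[-test, , drop = FALSE], 2, sds, "/"),
       test  = sweep(X[test, , drop = FALSE], 2, sds, "/"))
}

set.seed(6)
X <- matrix(rnorm(12 * 3, sd = 3), 12, 3)
sp <- scale_split(X, test = 1:3)   # rows 1-3 held out
```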
If method = "model.frame", the model frame is returned. Otherwise, an object of class mvr is returned. The object contains all components returned by the underlying fit function. In addition, it contains the following components:
- validation: if validation was requested, the results of the cross-validation.
- na.action: if observations with missing values were removed, na.action contains a vector with their indices. The class of this vector is used by functions like fitted to decide how to treat the observations.
- ncomp: the number of components of the model.
- method: the method used to fit the model. See the argument method for possible values.
- scale: if scaling was requested (with scale), the scaling used.
- call: the function call.
- terms: the model terms.
- model: if model = TRUE, the model frame.
- x: if x = TRUE, the model matrix.
- y: if y = TRUE, the model response.
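How the class of the na.action vector steers fitted() can be seen with lm, which uses the same convention; this sketch assumes mvr objects behave alike, as the description of the na.action component suggests. With na.exclude the stored vector has class "exclude" and fitted() pads the result with NA; with na.omit it has class "omit" and the dropped observation is left out. The data are made up for illustration.

```r
d <- data.frame(x = 1:10,
                y = c(2.1, 3.9, NA, 8.2, 9.8, 12.1, 14.2, 15.8, 18.1, 20.0))
fit_exclude <- lm(y ~ x, data = d, na.action = na.exclude)
fit_omit    <- lm(y ~ x, data = d, na.action = na.omit)

class(fit_exclude$na.action)   # "exclude"
class(fit_omit$na.action)      # "omit"
length(fitted(fit_exclude))    # 10, with NA in position 3
length(fitted(fit_omit))       # 9
```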
Martens, H., Næs, T. (1989) Multivariate calibration. Chichester: Wiley.
```r
library(pls)
data(NIR)
## Default methods:
NIR.pcr <- pcr(y ~ X, 6, data = NIR, validation = "CV")
NIR.pls <- plsr(y ~ X, 6, data = NIR, validation = "CV")
## Alternative methods:
NIR.oscorespls <- mvr(y ~ X, 6, data = NIR, validation = "CV",
                      method = "oscorespls")
NIR.simpls <- mvr(y ~ X, 6, data = NIR, validation = "CV",
                  method = "simpls")
data(sensory)
## Standardise both matrices before fitting:
Pn <- scale(sensory$Panel)
Ql <- scale(sensory$Quality)
sens.pcr <- pcr(Ql ~ Pn, ncomp = 4)
sens.pls <- plsr(Ql ~ Pn, ncomp = 4)
```