mvr
Partial Least Squares and Principal Component Regression
Functions to perform partial least squares regression (PLSR), canonical powered partial least squares (CPPLS) or principal component regression (PCR), with a formula interface. Crossvalidation can be used. Prediction, model extraction, plot, print and summary methods exist.
 Keywords
 multivariate, regression
Usage
mvr(formula, ncomp, Y.add, data, subset, na.action, method = pls.options()$mvralg, scale = FALSE, validation = c("none", "CV", "LOO"), model = TRUE, x = FALSE, y = FALSE, ...)
plsr(..., method = pls.options()$plsralg)
cppls(..., Y.add, weights, method = pls.options()$cpplsalg)
pcr(..., method = pls.options()$pcralg)
Arguments
 formula
 a model formula. Most of the lm formula constructs are supported. See below.
 ncomp
 the number of components to include in the model (see below).
 Y.add
 a vector or matrix of additional responses containing relevant information about the observations. Only used for cppls.
 data
 an optional data frame with the data to fit the model from.
 subset
 an optional vector specifying a subset of observations to be used in the fitting process.
 na.action
 a function which indicates what should happen when the data contain missing values. The default is set by the na.action setting of options, and is na.fail if that is unset. The ‘factory-fresh’ default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful. See na.omit for other alternatives.
 method
 the multivariate regression method to be used. If "model.frame", the model frame is returned.
 scale
 numeric vector, or logical. If numeric vector, $X$ is scaled by dividing each variable by the corresponding element of scale. If scale is TRUE, $X$ is scaled by dividing each variable by its sample standard deviation. If crossvalidation is selected, scaling by the standard deviation is done for every segment.
 validation
 character. What kind of (internal) validation to use. See below.
 model
 a logical. If TRUE, the model frame is returned.
 x
 a logical. If TRUE, the model matrix is returned.
 y
 a logical. If TRUE, the response is returned.
 weights
 a vector of individual weights for the observations. Only used for cppls. (Optional)
 ...
 additional arguments, passed to the underlying fit functions, and mvrCv.
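The two forms of the scale argument can be compared directly. The following is a minimal sketch, assuming the pls package and its oliveoil data are available; the object names sds, fit.lgl and fit.num are illustrative only. It passes the per-column sample standard deviations as a numeric vector and checks whether that gives the same coefficients as scale = TRUE when no crossvalidation is requested.

```r
library(pls)
data(oliveoil)

# Sample standard deviation of each predictor column
sds <- apply(oliveoil$chemical, 2, sd)

# Scaling requested with a logical ...
fit.lgl <- plsr(sensory ~ chemical, ncomp = 4, scale = TRUE, data = oliveoil)
# ... and with an explicit numeric vector of divisors
fit.num <- plsr(sensory ~ chemical, ncomp = 4, scale = sds, data = oliveoil)

# Compare the regression coefficients of the two fits
all.equal(coef(fit.lgl), coef(fit.num))
```

The difference between the two forms only matters under crossvalidation, where scale = TRUE recomputes the standard deviations for every segment while a numeric vector stays fixed.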
Details
The functions fit PLSR, CPPLS or PCR models with 1, $\ldots$, ncomp components. Multiresponse models are fully supported.
The type of model to fit is specified with the method argument. Four PLSR algorithms are available: the kernel algorithm ("kernelpls"), the wide kernel algorithm ("widekernelpls"), SIMPLS ("simpls") and the classical orthogonal scores algorithm ("oscorespls"). One CPPLS algorithm is available ("cppls"), providing several extensions to PLS. One PCR algorithm is available, using the singular value decomposition ("svdpc").
If method is "model.frame", the model frame is returned.
The functions pcr, plsr and cppls are wrappers for mvr, with different values for method.
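The wrapper relationship can be illustrated with a short sketch, assuming the pls package and its yarn data are available (fit.wrap and fit.mvr are illustrative names). It fits the same model once through plsr and once through mvr with the default PLSR algorithm named explicitly, then compares the coefficients.

```r
library(pls)
data(yarn)

# Fit through the wrapper, using the default PLSR algorithm ...
fit.wrap <- plsr(density ~ NIR, ncomp = 6, data = yarn)
# ... and through mvr() with that same algorithm given explicitly
fit.mvr <- mvr(density ~ NIR, ncomp = 6, data = yarn,
               method = pls.options()$plsralg)

# Which algorithm was actually used, and do the fits agree?
fit.wrap$method
all.equal(coef(fit.wrap), coef(fit.mvr))
```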
The formula argument should be a symbolic formula of the form response ~ terms, where response is the name of the response vector or matrix (for multiresponse models) and terms is the name of one or more predictor matrices, usually separated by +, e.g., water ~ FTIR or y ~ X + Z. See lm for a detailed description. The named variables should exist in the supplied data data frame or in the global environment. Note: Do not use mvr(mydata$y ~ mydata$X, ...), instead use mvr(y ~ X, data = mydata, ...). Otherwise, predict.mvr will not work properly.
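A minimal sketch of the recommended pattern, assuming the pls package and its yarn data (which carries a logical train column) are available; yarn.train, yarn.test, fit and pred are illustrative names. Because the model is fitted with a formula and a data argument, predict can look up NIR by name in the new data frame.

```r
library(pls)
data(yarn)

# Split on the logical 'train' column that ships with the yarn data
yarn.train <- yarn[yarn$train, ]
yarn.test  <- yarn[!yarn$train, ]

# Variables are referred to by name and looked up in 'data'
fit <- plsr(density ~ NIR, ncomp = 4, data = yarn.train)

# The same variable name is then resolved in 'newdata'
pred <- predict(fit, ncomp = 4, newdata = yarn.test)
dim(pred)
```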
The chapter Statistical models in R of the manual An
Introduction to R distributed with R is a good reference on
formulas in R.
The number of components to fit is specified with the argument ncomp. If this is not supplied, the maximal number of components is used (taking account of any crossvalidation).
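To see what the default amounts to on a concrete data set, one can fit without ncomp and inspect the ncomp component of the result. This is only a sketch, assuming the pls package and its yarn data are available; fit.all and fit.loo are illustrative names, and no particular values are claimed here.

```r
library(pls)
data(yarn)

# No ncomp given and no validation: the maximal number of components is used
fit.all <- pcr(density ~ NIR, data = yarn)
fit.all$ncomp

# With crossvalidation, the maximum must also be feasible within each
# training subset, so the number actually used can be smaller
fit.loo <- pcr(density ~ NIR, data = yarn, validation = "LOO")
fit.loo$ncomp
```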
If validation = "CV", crossvalidation is performed. The number and type of crossvalidation segments are specified with the arguments segments and segment.type. See mvrCv for details. If validation = "LOO", leave-one-out crossvalidation is performed. It is an error to specify the segments when validation = "LOO" is specified.
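The two validation modes might be used as follows. This is a sketch assuming the pls package and its yarn data are available; the choice of 7 interleaved segments is arbitrary and fit.cv and fit.loo are illustrative names. The segments and segment.type arguments are passed through to mvrCv.

```r
library(pls)
data(yarn)

# Segmented crossvalidation with 7 interleaved segments
fit.cv <- plsr(density ~ NIR, ncomp = 6, data = yarn, validation = "CV",
               segments = 7, segment.type = "interleaved")

# Leave-one-out crossvalidation; 'segments' must not be given here
fit.loo <- plsr(density ~ NIR, ncomp = 6, data = yarn, validation = "LOO")

# Crossvalidated root mean squared error of prediction per component
RMSEP(fit.cv)
RMSEP(fit.loo)
```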
By default, the crossvalidation will be performed serially. However, it can be done in parallel using functionality in the parallel package by setting the option parallel in pls.options. See pls.options for the different ways to specify the parallelism. See also Examples below.
Note that the crossvalidation is optimised for speed, and some generality has been sacrificed. In particular, the model matrix is calculated only once for the complete crossvalidation, so models like y ~ msc(X) will not be properly crossvalidated. However, scaling requested by scale = TRUE is properly crossvalidated.
For proper crossvalidation of models where the model matrix must be updated/regenerated for each segment, use the separate function crossval.
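For a model with a data-dependent preprocessing term such as msc, the separate crossval step could look like the sketch below. It assumes the pls package and its yarn data are available; the segment settings are arbitrary and fit and fit.cv are illustrative names.

```r
library(pls)
data(yarn)

# Fit without internal validation; msc() is part of the model matrix
fit <- plsr(density ~ msc(NIR), ncomp = 6, data = yarn)

# crossval() refits the model for each segment, so the msc() step is
# regenerated from the training observations of that segment
fit.cv <- crossval(fit, segments = 7, segment.type = "interleaved")

RMSEP(fit.cv)
```

This is slower than validation = "CV" in the fitting call, which is the trade-off the paragraph above describes.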
Value

If method = "model.frame", the model frame is returned. Otherwise, an object of class mvr is returned. The object contains all components returned by the underlying fit function. In addition, it contains the following components:
 validation
 if validation was requested, the results of the crossvalidation. See mvrCv for details.
 fit.time
 the elapsed time for the fit. This is used by crossval to decide whether to turn on tracing.
 na.action
 if observations with missing values were removed, na.action contains a vector with their indices. The class of this vector is used by functions like fitted to decide how to treat the observations.
 ncomp
 the number of components of the model.
 method
 the method used to fit the model. See the argument method for possible values.
 scale
 if scaling was requested (with scale), the scaling used.
 call
 the function call.
 terms
 the model terms.
 model
 if model = TRUE, the model frame.
 x
 if x = TRUE, the model matrix.
 y
 if y = TRUE, the model response.
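The returned object can be inspected with ordinary list access. The following sketch assumes the pls package and its yarn data are available; fit and mf are illustrative names. It requests the model matrix and response with x = TRUE and y = TRUE, and also shows the method = "model.frame" case.

```r
library(pls)
data(yarn)

# Ask for the model matrix and response to be stored in the fit
fit <- plsr(density ~ NIR, ncomp = 3, data = yarn, x = TRUE, y = TRUE)

class(fit)     # class of the fitted object
fit$ncomp      # number of components of the model
fit$method     # algorithm used for the fit
dim(fit$x)     # stored model matrix (only because x = TRUE)

# With method = "model.frame", no model is fitted
mf <- mvr(density ~ NIR, data = yarn, method = "model.frame")
class(mf)
```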
References
Martens, H., Næs, T. (1989) Multivariate calibration. Chichester: Wiley.
See Also
kernelpls.fit, widekernelpls.fit, simpls.fit, oscorespls.fit, cppls.fit, svdpc.fit, mvrCv, crossval, loadings, scores, loading.weights, coef.mvr, predict.mvr, R2, MSEP, RMSEP, plot.mvr
Examples
data(yarn)
## Default methods:
yarn.pcr <- pcr(density ~ NIR, 6, data = yarn, validation = "CV")
yarn.pls <- plsr(density ~ NIR, 6, data = yarn, validation = "CV")
yarn.cppls <- cppls(density ~ NIR, 6, data = yarn, validation = "CV")
## Alternative methods:
yarn.oscorespls <- mvr(density ~ NIR, 6, data = yarn, validation = "CV",
                       method = "oscorespls")
yarn.simpls <- mvr(density ~ NIR, 6, data = yarn, validation = "CV",
                   method = "simpls")
## Not run:
# ## Parallelised crossvalidation, using transient cluster:
# pls.options(parallel = 4) # use mclapply
# pls.options(parallel = quote(makeCluster(4, type = "PSOCK"))) # use parLapply
# ## A new cluster is created and stopped for each crossvalidation:
# yarn.pls <- plsr(density ~ NIR, 6, data = yarn, validation = "CV")
# yarn.pcr <- pcr(density ~ NIR, 6, data = yarn, validation = "CV")
#
# ## Parallelised crossvalidation, using persistent cluster:
# library(parallel)
# ## This creates the cluster:
# pls.options(parallel = makeCluster(4, type = "PSOCK"))
# ## The cluster can be used several times:
# yarn.pls <- plsr(density ~ NIR, 6, data = yarn, validation = "CV")
# yarn.pcr <- pcr(density ~ NIR, 6, data = yarn, validation = "CV")
# ## The cluster should be stopped manually afterwards:
# stopCluster(pls.options()$parallel)
#
# ## Parallelised crossvalidation, using persistent MPI cluster:
# ## This requires the packages snow and Rmpi to be installed
# library(parallel)
# ## This creates the cluster:
# pls.options(parallel = makeCluster(4, type = "MPI"))
# ## The cluster can be used several times:
# yarn.pls <- plsr(density ~ NIR, 6, data = yarn, validation = "CV")
# yarn.pcr <- pcr(density ~ NIR, 6, data = yarn, validation = "CV")
# ## The cluster should be stopped manually afterwards:
# stopCluster(pls.options()$parallel)
# ## It is good practice to call mpi.exit() or mpi.quit() afterwards:
# mpi.exit()
# ## End(Not run)
## Multiresponse models:
data(oliveoil)
sens.pcr <- pcr(sensory ~ chemical, ncomp = 4, scale = TRUE, data = oliveoil)
sens.pls <- plsr(sensory ~ chemical, ncomp = 4, scale = TRUE, data = oliveoil)
## Classification
# A classification example utilizing additional response information
# (Y.add) is found in the cppls.fit manual ('See also' above).