# mvr

##### Partial Least Squares and Principal Component Regression

Functions to perform partial least squares regression (PLSR), canonical powered partial least squares (CPPLS) or principal component regression (PCR), with a formula interface. Cross-validation can be used. Prediction, model extraction, plot, print and summary methods exist.

- Keywords
- multivariate, regression

##### Usage

```
mvr(formula, ncomp, Y.add, data, subset, na.action,
method = pls.options()$mvralg,
scale = FALSE, center = TRUE, validation = c("none", "CV", "LOO"),
model = TRUE, x = FALSE, y = FALSE, …)
plsr(…, method = pls.options()$plsralg)
cppls(…, Y.add, weights, method = pls.options()$cpplsalg)
pcr(…, method = pls.options()$pcralg)
```

##### Arguments

- formula
a model formula. Most of the

`lm`

formula constructs are supported. See below.- ncomp
the number of components to include in the model (see below).

- Y.add
a vector or matrix of additional responses containing relevant information about the observations. Only used for

`cppls`

.- data
an optional data frame with the data to fit the model from.

- subset
an optional vector specifying a subset of observations to be used in the fitting process.

- na.action
a function which indicates what should happen when the data contain missing values. The default is set by the

`na.action`

setting of`options`

, and is`na.fail`

if that is unset. The ‘factory-fresh’ default is`na.omit`

. Another possible value is`NULL`

, no action. Value`na.exclude`

can be useful. See`na.omit`

for other alternatives.- method
the multivariate regression method to be used. If

`"model.frame"`

, the model frame is returned.- scale
numeric vector, or logical. If numeric vector, \(X\) is scaled by dividing each variable with the corresponding element of

`scale`

. If`scale`

is`TRUE`

, \(X\) is scaled by dividing each variable by its sample standard deviation. If cross-validation is selected, scaling by the standard deviation is done for every segment.- center
logical, determines if the \(X\) and \(Y\) matrices are mean centered or not. Default is to perform mean centering.

- validation
character. What kind of (internal) validation to use. See below.

- model
a logical. If

`TRUE`

, the model frame is returned.- x
a logical. If

`TRUE`

, the model matrix is returned.- y
a logical. If

`TRUE`

, the response is returned.- weights
a vector of individual weights for the observations. Only used for

`cppls`

. (Optional)- …
additional optional arguments, passed to the underlying fit functions, and

`mvrCv`

.Currently, the fit functions

`oscorespls.fit`

and`widekernelpls.fit`

implement these extra arguments:- tol:
numeric. Tolerance used for determining convergence.

- maxit:
positive integer. The maximal number of iterations used.

and

`cppls.fit`

implements:- lower:
a vector of lower limits for power optimisation.

- upper:
a vector of upper limits for power optimisation.

- trunc.pow:
logical. Whether to use an experimental alternative power algorithm.

`mvrCv`

implements several arguments; the following are probably the most useful of them:- segments:
the number of segments to use, or a list with segments.

- segment.type:
the type of segments to use.

- length.seg:
Positive integer. The length of the segments to use.

- jackknife:
logical. Whether to perform jackknifing of regression coefficients.

See the functions' documentation for details.

##### Details

The functions fit PLSR, CPPLS or PCR models with 1, \(\ldots\),
`ncomp`

number of components. Multi-response models are fully
supported.

The type of model to fit is specified with the `method`

argument. Four PLSR algorithms are available: the kernel algorithm
(`"kernelpls"`

), the wide kernel algorithm (`"widekernelpls"`

),
SIMPLS (`"simpls"`

) and the classical orthogonal scores algorithm
(`"oscorespls"`

). One CPPLS algorithm is available (`"cppls"`

)
providing several extensions to PLS. One PCR algorithm
is available: using the singular value decomposition (`"svdpc"`

).
If `method`

is `"model.frame"`

, the model frame is returned.
The functions `pcr`

, `plsr`

and `cppls`

are wrappers for `mvr`

, with different values for `method`

.

The `formula`

argument should be a symbolic formula of the form
`response ~ terms`

, where `response`

is the name of the
response vector or matrix (for multi-response models) and `terms`

is the name of one or more predictor matrices, usually separated by
`+`

, e.g., `water ~ FTIR`

or `y ~ X + Z`

. See
`lm`

for a detailed description. The named
variables should exist in the supplied `data`

data frame or in
the global environment. Note: Do not use ```
mvr(mydata$y ~
mydata$X, …)
```

, instead use ```
mvr(y ~ X, data = mydata,
…)
```

. Otherwise, `predict.mvr`

will not work properly.
The chapter `Statistical models in R` of the manual `An
Introduction to R` distributed with R is a good reference on
formulas in R.

The number of components to fit is specified with the argument
`ncomp`

. It this is not supplied, the maximal number of
components is used (taking account of any cross-validation).

All implemented algorithms mean-center both predictor and response
matrices. This can be turned off by specifying `center = FALSE`

.
See Seasholtz and Kowalski for a discussion about centering in PLS
regression.

If `validation = "CV"`

, cross-validation is performed. The number and
type of cross-validation segments are specified with the arguments
`segments`

and `segment.type`

. See `mvrCv`

for
details. If `validation = "LOO"`

, leave-one-out cross-validation
is performed. It is an error to specify the segments when
`validation = "LOO"`

is specified.

By default, the cross-validation will be performed serially. However,
it can be done in parallel using functionality in the
`parallel`

package by setting the option `parallel`

in
`pls.options`

. See `pls.options`

for the
differnt ways to specify the parallelism. See also Examples below.

Note that the cross-validation is optimised for speed, and some
generality has been sacrificed. Especially, the model matrix is
calculated only once for the complete cross-validation, so models like
`y ~ msc(X)`

will not be properly cross-validated. However,
scaling requested by `scale = TRUE`

is properly cross-validated.
For proper cross-validation of models where the model matrix must be
updated/regenerated for each segment, use the separate function
`crossval`

.

##### Value

If `method = "model.frame"`

, the model frame is returned.
Otherwise, an object of class `mvr`

is returned.
The object contains all components returned by the underlying fit
function. In addition, it contains the following components:

if validation was requested, the results of the
cross-validation. See `mvrCv`

for details.

the elapsed time for the fit. This is used by
`crossval`

to decide whether to turn on tracing.

if observations with missing values were removed,
`na.action`

contains a vector with their indices. The
class of this vector is used by functions like `fitted`

to
decide how to treat the observations.

the number of components of the model.

the method used to fit the model. See the argument
`method`

for possible values.

if scaling was requested (with `scale`

), the
scaling used.

the function call.

the model terms.

if `model = TRUE`

, the model frame.

if `x = TRUE`

, the model matrix.

if `y = TRUE`

, the model response.

##### References

Martens, H., N<U+00E6>s, T. (1989) *Multivariate calibration.*
Chichester: Wiley.

Seasholtz, M. B. and Kowalski, B. R. (1992) The effect of mean
centering on prediction in multivariate calibration. *Journal of
Chemometrics*, **6**(2), 103--111.

##### See Also

`kernelpls.fit`

,
`widekernelpls.fit`

,
`simpls.fit`

,
`oscorespls.fit`

,
`cppls.fit`

,
`svdpc.fit`

,
`mvrCv`

,
`crossval`

,
`loadings`

,
`scores`

,
`loading.weights`

,
`coef.mvr`

,
`predict.mvr`

,
`R2`

,
`MSEP`

,
`RMSEP`

,
`plot.mvr`

##### Examples

```
# NOT RUN {
data(yarn)
## Default methods:
yarn.pcr <- pcr(density ~ NIR, 6, data = yarn, validation = "CV")
yarn.pls <- plsr(density ~ NIR, 6, data = yarn, validation = "CV")
yarn.cppls <- cppls(density ~ NIR, 6, data = yarn, validation = "CV")
## Alternative methods:
yarn.oscorespls <- mvr(density ~ NIR, 6, data = yarn, validation = "CV",
method = "oscorespls")
yarn.simpls <- mvr(density ~ NIR, 6, data = yarn, validation = "CV",
method = "simpls")
# }
# NOT RUN {
## Parallelised cross-validation, using transient cluster:
pls.options(parallel = 4) # use mclapply
pls.options(parallel = quote(makeCluster(4, type = "PSOCK"))) # use parLapply
## A new cluster is created and stopped for each cross-validation:
yarn.pls <- plsr(density ~ NIR, 6, data = yarn, validation = "CV")
yarn.pcr <- pcr(density ~ NIR, 6, data = yarn, validation = "CV")
## Parallelised cross-validation, using persistent cluster:
library(parallel)
## This creates the cluster:
pls.options(parallel = makeCluster(4, type = "PSOCK"))
## The cluster can be used several times:
yarn.pls <- plsr(density ~ NIR, 6, data = yarn, validation = "CV")
yarn.pcr <- pcr(density ~ NIR, 6, data = yarn, validation = "CV")
## The cluster should be stopped manually afterwards:
stopCluster(pls.options()$parallel)
## Parallelised cross-validation, using persistent MPI cluster:
## This requires the packages snow and Rmpi to be installed
library(parallel)
## This creates the cluster:
pls.options(parallel = makeCluster(4, type = "MPI"))
## The cluster can be used several times:
yarn.pls <- plsr(density ~ NIR, 6, data = yarn, validation = "CV")
yarn.pcr <- pcr(density ~ NIR, 6, data = yarn, validation = "CV")
## The cluster should be stopped manually afterwards:
stopCluster(pls.options()$parallel)
## It is good practice to call mpi.exit() or mpi.quit() afterwards:
mpi.exit()
# }
# NOT RUN {
## Multi-response models:
data(oliveoil)
sens.pcr <- pcr(sensory ~ chemical, ncomp = 4, scale = TRUE, data = oliveoil)
sens.pls <- plsr(sensory ~ chemical, ncomp = 4, scale = TRUE, data = oliveoil)
## Classification
# A classification example utilizing additional response information
# (Y.add) is found in the cppls.fit manual ('See also' above).
# }
```

*Documentation reproduced from package pls, version 2.7-2, License: GPL-2*