dbplsr is a variety of partial least squares regression
where explanatory information is coded as distances between individuals.
These distances can either be computed from observed explanatory variables
or directly input as a squared distances matrix.
Since distances can be computed from a mixture of continuous and
qualitative explanatory variables or, in fact, from more general
quantities, dbplsr is a proper extension of plsr.
Notation convention: in distance-based methods we must distinguish observed explanatory variables, which we denote by Z or z, from Euclidean coordinates, which we denote by X or x. For an explanation of both terms, see the references below.
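As a minimal sketch of the mixed-variable case described above, the following computes Gower dissimilarities from numeric and categorical predictors with daisy from the cluster package. The choice of the iris data and of the response column is purely illustrative. The final call assumes that the dbstats package is installed and that its dist method accepts the object as shown in the usage; it is skipped otherwise.

```r
library(cluster)  # provides daisy()

# Illustrative mixed predictors Z: three numeric columns plus a factor
Z <- iris[, c("Sepal.Width", "Petal.Length", "Petal.Width", "Species")]
y <- iris$Sepal.Length

# Gower dissimilarities combine numeric and categorical columns
d <- daisy(Z, metric = "gower")

# Assumed follow-up: fit through the dist method, only if dbstats is available
if (requireNamespace("dbstats", quietly = TRUE)) {
  fit <- dbstats::dbplsr(d, y, ncomp = 3)
}
```

The object returned by daisy carries both the dissimilarity and dist classes, which is why it is a candidate for the dist method.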
# S3 method for formula
dbplsr(formula,data,...,metric="euclidean",
method="ncomp",weights,ncomp)
# S3 method for dist
dbplsr(distance,y,...,weights,ncomp=ncomp,method="ncomp")
# S3 method for D2
dbplsr(D2,y,...,weights,ncomp=ncomp,method="ncomp")
# S3 method for Gram
dbplsr(G,y,...,weights,ncomp=ncomp,method="ncomp")
A list of class dbplsr containing the following components:
a list containing the residuals (response minus fitted values) for each iteration.
a list containing the fitted values for each iteration.
a list containing the scores for each iteration.
regression coefficients. fitted.values = fk*bk
orthogonal projector onto the one-dimensional linear space spanned by fk.
number of components included in the model.
optimum number of components according to the selected method.
the specified weights.
the method used.
the response used to fit the model.
the hat matrix projector.
initial weighted centered inner products matrix of the squared distance matrix.
weighted centered inner products matrix in last iteration.
total weighted geometric variability.
the diagonal entries in G0.
geometric variability for each iteration.
the ordinary cross-validation estimate of the prediction error.
the generalized cross-validation estimate of the prediction error.
the value of the Akaike Information Criterion (AIC) for the model.
the value of the Bayesian Information Criterion (BIC) for the model.
an object of class formula. A formula of the form y~Z.
This argument is a remnant of the plsr function, kept for compatibility.
an optional data frame containing the variables in the model (both response and explanatory variables, either the observed ones, Z, or a Euclidean configuration X).
(required if no formula is given as the principal argument). Response (dependent variable) must be numeric, matrix or data.frame.
a dist or dissimilarity class object. See functions dist in the package stats and daisy in the package cluster.
a D2 class object. Squared distances matrix between individuals.
a Gram class object. Weighted centered inner products matrix of the
squared distances matrix D2. See details in dblm.
metric function to be used when computing distances from observed
explanatory variables.
One of "euclidean" (default), "manhattan", or "gower".
sets the method used to decide how many components are needed to fit
the best model for new predictions.
There are five methods: "AIC", "BIC", "OCV", "GCV" and "ncomp" (default).
OCV and GCV find the number of components that minimizes
the cross-validation coefficient (ocv or gcv).
AIC and BIC find the number of components that minimizes
the Akaike or Bayesian Information Criterion (see AIC for more details).
an optional numeric vector of weights to be used in the fitting process. By default all individuals have the same weight.
the number of components to include in the model.
arguments passed to or from other methods to the low level.
Boj, Eva <evaboj@ub.edu>, Caballe, Adria <adria.caballe@upc.edu>, Delicado, Pedro <pedro.delicado@upc.edu> and Fortiana, Josep <fortiana@ub.edu>
Partial least squares (PLS) is a method for constructing
predictive models when the factors (Z) are many and highly collinear.
A PLS model will try to find the multidimensional direction
in the Z space that explains the maximum multidimensional variance direction
in the Y space. dbplsr is particularly suited when the matrix of
predictors has more variables than observations.
By contrast, standard regression (dblm) will fail in these cases.
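To illustrate why more variables than observations is a problem for ordinary regression, the sketch below uses base R's lm as a stand-in (it is not dblm, and the simulated data are hypothetical). With 20 predictors and 10 observations, the design matrix is rank deficient and lm cannot estimate all coefficients, reporting the excess ones as NA.

```r
set.seed(1)

n <- 10   # observations
p <- 20   # predictors, deliberately more than observations

Z <- matrix(rnorm(n * p), nrow = n, ncol = p)  # simulated predictors
y <- rnorm(n)                                  # simulated response

# Ordinary least squares: the design has at most n linearly independent columns
fit <- lm(y ~ Z)

# Count the coefficients that lm could not estimate
n_missing <- sum(is.na(coef(fit)))
n_missing
```

A component-based method such as PLS sidesteps this by regressing on a small number of derived components instead of on all predictors at once.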
The various possible ways for inputting the model explanatory
information through distances, or their squares, etc., are the
same as in dblm.
The number of components to fit is specified with the argument ncomp.
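The relation between the distance, squared-distance and inner-product inputs can be sketched in base R. The code below applies the standard weighted double centering, G = -1/2 J D2 J' with J = I - 1w', to a squared Euclidean distance matrix and checks that the result equals the inner products of the weighted-centered coordinates. This is an assumption about what "weighted centered inner products matrix" means here; whether dbstats applies any further scaling internally is not confirmed by this page.

```r
set.seed(1)

n <- 6
X <- matrix(rnorm(n * 2), nrow = n, ncol = 2)  # hypothetical Euclidean coordinates
w <- rep(1 / n, n)                             # uniform weights summing to 1

# Squared distances matrix between individuals
D2 <- as.matrix(dist(X))^2

# Weighted centering matrix J = I - 1 w'
J <- diag(n) - tcrossprod(rep(1, n), w)

# Weighted double centering of the squared distances
G <- -0.5 * J %*% D2 %*% t(J)

# Inner products of the coordinates after subtracting the weighted column means
Xc <- sweep(X, 2, colSums(w * X))
same <- isTRUE(all.equal(G, tcrossprod(Xc), check.attributes = FALSE))
same
```

For Euclidean distances the two matrices agree, which is why the same model can be specified through any of the three inputs.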
Boj E, Delicado P, Fortiana J (2010). Distance-based local linear regression for functional predictors. Computational Statistics and Data Analysis 54, 429-437.
Boj E, Grane A, Fortiana J, Claramunt MM (2007). Implementing PLS for distance-based regression: computational issues. Computational Statistics 22, 237-248.
Boj E, Grane A, Fortiana J, Claramunt MM (2007). Selection of predictors in distance-based regression. Communications in Statistics B - Simulation and Computation 36, 87-98.
Cuadras CM, Arenas C, Fortiana J (1996). Some computational aspects of a distance-based model for prediction. Communications in Statistics B - Simulation and Computation 25, 593-609.
Cuadras C, Arenas C (1990). A distance-based regression model for prediction with mixed data. Communications in Statistics A - Theory and Methods 19, 2261-2279.
Cuadras CM (1989). Distance analysis in discrimination and classification using both continuous and categorical variables. In: Y. Dodge (ed.), Statistical Data Analysis and Inference. Amsterdam, The Netherlands: North-Holland Publishing Co., pp. 459-473.
summary.dbplsr for summary.
plot.dbplsr for plots.
predict.dbplsr for predictions.
#require(pls)
library(pls)
data(yarn)
## Default methods:
yarn.dbplsr <- dbplsr(density ~ NIR, data = yarn, ncomp=6, method="GCV")