This is a wrapper that integrates the different dissimilarity functions offered by the package. It computes the dissimilarities between observations in numerical matrices using a specified dissimilarity measure.
dissimilarity(Xr, Xu = NULL,
diss_method = c("pca", "pca.nipals", "pls", "mpls",
"cor", "euclid", "cosine", "sid"),
Yr = NULL, gh = FALSE, pc_selection = list("var", 0.01),
return_projection = FALSE, ws = NULL,
center = TRUE, scale = FALSE, documentation = character(),
...)
A list with the following components:
dissimilarity
: the resulting dissimilarity matrix.
projection
: an ortho_projection object. Only output if return_projection = TRUE and if diss_method = "pca", diss_method = "pca.nipals", diss_method = "pls" or diss_method = "mpls". This object contains the projection used to compute the dissimilarity matrix. In case of local dissimilarity matrices, the projection corresponds to the global projection used to select the neighborhoods (see the ortho_diss function for further details).
gh
: a list containing the GH distances as well as the
pls projection used to compute the GH.
Xr
: a matrix containing n observations/rows and p variables/columns.
Xu
: an optional matrix containing data of a second set of observations with p variables/columns.
diss_method
: a character string indicating the method to be used to compute the dissimilarities between observations. Options are:
"pca"
: Mahalanobis distance computed on the matrix of scores of a Principal Component (PC) projection of Xr (and Xu if provided). PC projection is done using the singular value decomposition (SVD) algorithm. See the ortho_diss function.
"pca.nipals"
: Mahalanobis distance computed on the matrix of scores of a Principal Component (PC) projection of Xr (and Xu if provided). PC projection is done using the non-linear iterative partial least squares (nipals) algorithm. See the ortho_diss function.
"pls"
: Mahalanobis distance computed on the matrix of scores of a partial least squares projection of Xr (and Xu if provided). In this case, Yr is always required. See the ortho_diss function.
"mpls"
: Mahalanobis distance computed on the matrix of scores of a modified partial least squares projection (Shenk and Westerhaus, 1991; Westerhaus, 2014) of Xr (and Xu if provided). In this case, Yr is always required. See the ortho_diss function.
"cor"
: based on the correlation coefficient between observations. See the cor_diss function.
"euclid"
: Euclidean distance between observations. See the f_diss function.
"cosine"
: Cosine distance between observations. See the f_diss function.
"sid"
: spectral information divergence between observations. See the sid function.
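The idea behind the "pca" option (Mahalanobis distance in a principal component score space) can be sketched with base R alone. This is only an illustration under stated assumptions: the toy data, the fixed choice of two components and the variable names are made up, and the package functions may scale or normalise the final distances differently.

```r
# Toy data (made up): 6 reference and 3 additional observations, 5 variables
set.seed(1)
Xr <- matrix(rnorm(30), nrow = 6, ncol = 5)
Xu <- matrix(rnorm(15), nrow = 3, ncol = 5)

# Centre both sets on the column means of the pooled data
X <- rbind(Xr, Xu)
Xc <- sweep(X, 2, colMeans(X), "-")

# PC projection through the SVD; the number of components is fixed by hand
n_comp <- 2
sv <- svd(Xc)
scores <- Xc %*% sv$v[, 1:n_comp]

# PC scores are uncorrelated, so dividing each score column by its standard
# deviation turns the Euclidean distance into the Mahalanobis distance
z <- sweep(scores, 2, apply(scores, 2, sd), "/")
d_all <- as.matrix(dist(z))

# Rows: observations in Xr, columns: observations in Xu
d_ru <- d_all[1:nrow(Xr), nrow(Xr) + 1:nrow(Xu)]
```

Here `d_ru` has one row per observation in `Xr` and one column per observation in `Xu`, which is the natural layout when both sets are supplied.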
Yr
: a numeric matrix of n observations used as side information of Xr for the ortho_diss methods (i.e. pca, pca.nipals or pls). It is required when:
diss_method = "pls".
diss_method = "pca" with "opc" used as the method in the pc_selection argument (see ortho_diss).
gh = TRUE.
gh
: a logical indicating whether the Mahalanobis distance (in the pls score space) between each observation and the pls centre/mean must be computed.
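A distance of this kind (each observation against the centre of a score space) can be sketched with base R. The example below is an assumption-laden illustration: it uses principal component scores as a stand-in for pls scores, the data are simulated, and it does not claim to reproduce how the package scales or reports the GH values.

```r
set.seed(2)
X <- matrix(rnorm(40), nrow = 8, ncol = 5)

# Stand-in score matrix: the first two principal component scores
# (the argument described above refers to pls scores instead)
scores <- prcomp(X, center = TRUE, scale. = FALSE)$x[, 1:2]

# mahalanobis() returns squared distances to the given centre
gh_sq <- mahalanobis(scores, center = colMeans(scores), cov = cov(scores))
gh_dist <- sqrt(gh_sq)
```

Observations with large values in `gh_dist` lie far from the bulk of the data in the score space, which is why this type of distance is often used to flag atypical observations.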
pc_selection
: a list of length 2 to be passed onto the ortho_diss methods. It is required if the method selected in diss_method is any of "pca", "pca.nipals" or "pls" or if gh = TRUE. This argument is used for optimizing the number of components (principal components or pls factors) to be retained. This list must contain two elements in the following order: method (a character indicating the method for selecting the number of components) and value (a numerical value that complements the selected method). The methods available are:
"opc"
: optimized principal component selection based on Ramirez-Lopez et al. (2013a, 2013b). The optimal number of components (of a set of observations) is the one for which its distance matrix minimizes the differences between the Yr value of each observation and the Yr value of its closest observation. In this case value must be a value (larger than 0 and below the minimum dimension of Xr or Xr and Xu combined) indicating the maximum number of principal components to be tested. See the ortho_projection function for more details.
"cumvar"
: selection of the principal components based on a given cumulative amount of explained variance. In this case, value must be a value (larger than 0 and below or equal to 1) indicating the minimum amount of cumulative variance that the combination of retained components should explain.
"var"
: selection of the principal components based on a given amount of explained variance. In this case, value must be a value (larger than 0 and below or equal to 1) indicating the minimum amount of variance that a single component should explain in order to be retained.
"manual"
: for manually specifying a fixed number of principal components. In this case, value must be a value (larger than 0 and below the minimum dimension of Xr or Xr and Xu combined) indicating the number of components to be retained.
The default is list(method = "var", value = 0.01).
Optionally, the pc_selection argument admits "opc", "cumvar", "var" or "manual" as a single character string. In such a case the default "value" when either "opc" or "manual" is used is 40. When "cumvar" is used the default "value" is set to 0.99 and when "var" is used, the default "value" is set to 0.01.
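The "var" and "cumvar" rules described above can be illustrated with base R. This is a rough sketch on simulated data using `prcomp`, not the package's own selection code; in particular, the handling of values that fall exactly on the threshold is an assumption.

```r
set.seed(3)
X <- matrix(rnorm(200), nrow = 20, ncol = 10)

pca <- prcomp(X, center = TRUE, scale. = FALSE)
explained <- pca$sdev^2 / sum(pca$sdev^2)  # share of variance per component

# "var"-style rule: keep the components that each explain at least `value`
value_var <- 0.01
n_var <- sum(explained >= value_var)

# "cumvar"-style rule: smallest number of components whose cumulative
# explained variance reaches `value`
value_cumvar <- 0.99
n_cumvar <- which(cumsum(explained) >= value_cumvar)[1]
```

The two rules answer different questions: the first discards individually weak components, while the second guarantees a minimum total of retained variance.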
return_projection
: a logical indicating if the projection(s) must be returned. Projections are used if the ortho_diss methods are called (i.e. diss_method = "pca", diss_method = "pca.nipals" or diss_method = "pls") or when gh = TRUE. In case gh = TRUE and an ortho_diss method is used (in the diss_method argument), both projections are returned.
ws
: an odd integer value which specifies the window size, when diss_method = "cor" (cor_diss method), for moving correlation dissimilarity. If ws = NULL (default), then the window size will be equal to the number of variables (columns), i.e. instead of moving correlation, the normal correlation will be used. See the cor_diss function.
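The difference between the normal and the moving correlation can be sketched in base R. The helper names `cor_d` and `moving_cor_d` are hypothetical, and the (1 - r) / 2 scaling and the plain averaging over windows are assumptions chosen for illustration; the exact formula used by cor_diss may differ.

```r
# Correlation dissimilarity between two vectors, taken here as (1 - r) / 2
# so that it lies in [0, 1] (an assumed convention)
cor_d <- function(a, b) (1 - cor(a, b)) / 2

# Moving-window version: average the dissimilarity over every window of
# `ws` consecutive variables
moving_cor_d <- function(a, b, ws) {
  starts <- seq_len(length(a) - ws + 1)
  mean(vapply(starts, function(s) {
    idx <- s:(s + ws - 1)
    cor_d(a[idx], b[idx])
  }, numeric(1)))
}

set.seed(4)
x1 <- cumsum(rnorm(25))
x2 <- x1 + rnorm(25, sd = 0.5)

d_full <- cor_d(x1, x2)                    # one window over all variables
d_moving <- moving_cor_d(x1, x2, ws = 5)   # windows of 5 variables
```

When the window covers all variables there is a single window, so the moving version reduces to the normal correlation dissimilarity, which mirrors the behaviour described for ws = NULL.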
center
: a logical indicating if Xr (and Xu if provided) must be centered. If Xu is provided the data is centered around the mean of the pooled Xr and Xu matrices (Xr ∪ Xu). For dissimilarity computations based on diss_method = "pls", the data is always centered.
scale
: a logical indicating if Xr (and Xu if provided) must be scaled. If Xu is provided the data is scaled based on the standard deviation of the pooled Xr and Xu matrices (Xr ∪ Xu). If center = TRUE, scaling is applied after centering.
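Centering and scaling with pooled statistics can be sketched in base R as follows. The data are simulated and the helper `prep` is hypothetical; the use of the sample standard deviation (n - 1 denominator) is an assumption.

```r
set.seed(5)
Xr <- matrix(rnorm(30, mean = 10), nrow = 6, ncol = 5)
Xu <- matrix(rnorm(15, mean = 12), nrow = 3, ncol = 5)

# Column means and standard deviations of the pooled matrices
pooled <- rbind(Xr, Xu)
mu <- colMeans(pooled)
sdev <- apply(pooled, 2, sd)

# Centre first, then scale, using the pooled statistics for both sets
prep <- function(X) sweep(sweep(X, 2, mu, "-"), 2, sdev, "/")
Xr_cs <- prep(Xr)
Xu_cs <- prep(Xu)
```

Because the same statistics are applied to both sets, neither `Xr_cs` nor `Xu_cs` is centered on zero by itself; only their combination is.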
documentation
: an optional character string that can be used to describe anything related to the mbl call (e.g. description of the input data). Default: character(). NOTE: this is an experimental argument.
...
: other arguments passed to the dissimilarity functions (ortho_diss, cor_diss, f_diss or sid).
This function is a wrapper for ortho_diss, cor_diss, f_diss and sid. Check the documentation of these functions for further details.
Shenk, J., Westerhaus, M., and Berzaghi, P. 1997. Investigation of a LOCAL calibration procedure for near infrared instruments. Journal of Near Infrared Spectroscopy, 5, 223-232.
Westerhaus, M. 2014. Eastern Analytical Symposium Award for outstanding achievements in near infrared spectroscopy: my contributions to near infrared spectroscopy. NIR news, 25(8), 16-20.
ortho_diss
cor_diss
f_diss
sid
.
library(prospectr)
library(resemble) # provides dissimilarity()
data(NIRsoil)
# Filter the data using the first derivative with Savitzky and Golay
# smoothing filter and a window size of 15 spectral variables and a
# polynomial order of 4
sg <- savitzkyGolay(NIRsoil$spc, m = 1, p = 4, w = 15)
# Replace the original spectra with the filtered ones
NIRsoil$spc <- sg
Xu <- NIRsoil$spc[!as.logical(NIRsoil$train), ]
Yu <- NIRsoil$CEC[!as.logical(NIRsoil$train)]
Yr <- NIRsoil$CEC[as.logical(NIRsoil$train)]
Xr <- NIRsoil$spc[as.logical(NIRsoil$train), ]
Xu <- Xu[!is.na(Yu), ]
Xr <- Xr[!is.na(Yr), ]
Yu <- Yu[!is.na(Yu)]
Yr <- Yr[!is.na(Yr)]
dsm_pca <- dissimilarity(
  Xr = Xr, Xu = Xu,
  diss_method = c("pca"),
  Yr = Yr, gh = TRUE,
  pc_selection = list("opc", 30),
  return_projection = TRUE
)