resemble (version 1.2.2)

orthoProjection: Orthogonal projections using partial least squares and principal component analysis

Description

Functions to perform orthogonal projections of high-dimensional data matrices using partial least squares (pls) and principal component analysis (pca).

Usage

orthoProjection(Xr, X2 = NULL, 
                Yr = NULL, 
                method = "pca", pcSelection = list("cumvar", 0.99), 
                center = TRUE, scaled = FALSE, cores = 1, ...)
                
pcProjection(Xr, X2 = NULL, Yr = NULL, 
             pcSelection = list("cumvar", 0.99), 
             center = TRUE, scaled = FALSE, 
             method = "pca",
             tol = 1e-6, max.iter = 1000, 
             cores = 1, ...)  
              
plsProjection(Xr, X2 = NULL, Yr, 
              pcSelection = list("opc", 40), 
              scaled = FALSE, 
              tol = 1e-6, max.iter = 1000, 
              cores = 1, ...) 
              
## S3 method for class 'orthoProjection':
predict(object, newdata, ...)

Arguments

Xr
a matrix (or data.frame) containing the (reference) data.
X2
an optional matrix (or data.frame) containing data of a second set of observations (samples).
Yr
if the method used in the pcSelection argument is "opc" or if the sm argument is either "pls" or "loc.pls", then it must be a vector containing the side information correspondin
method
the method for projecting the data. Options are: "pca" (principal component analysis using the singular value decomposition algorithm), "pca.nipals" (principal component analysis using the non-linear iterative partial least squares algorithm) and "pls" (p
pcSelection
a list which specifies the method to be used for identifying the number of principal components to be retained for computing the Mahalanobis distance of each sample in sm = "Xu" to the centre of sm = "Xr". It also specifies the n

Value

  • orthoProjection, pcProjection and plsProjection return a list of class orthoProjection with the following components:
    • scores: a matrix of scores corresponding to the samples in Xr and X2 (if it applies). The number of components that the scores represent is given by the number of components chosen in the function.
    • X.loadings: a matrix of loadings corresponding to the explanatory variables. The number of components that these loadings represent is given by the number of components chosen in the function.
    • Y.loadings: a matrix of partial least squares loadings corresponding to Yr. The number of components that these loadings represent is given by the number of components chosen in the function. This object is only returned if the partial least squares algorithm was used.
    • weigths: a matrix of partial least squares ("pls") weights. This object is only returned if the "pls" algorithm was used.
    • projectionM: a matrix that can be used to project new data onto a "pls" space. This object is only returned if the "pls" algorithm was used.
    • variance: a matrix indicating the standard deviation of each component (sdv), the cumulative explained variance (cumExplVar) and the variance explained by each single component (explVar). These values are computed based on the data used to create the projection matrices. For example, if the "pls" method was used, these values are computed based only on the data that contains information on Yr (i.e. the Xr data). If the principal component method was used, these values are computed on the basis of Xr and X2 (if it applies), since both matrices are employed in the computation of the projection matrix (loadings in this case).
    • svd: the standard deviation of the retrieved scores.
    • n.components: the number of components (either principal components or partial least squares components) used for computing the global distances.
    • opcEval: a data.frame containing the statistics computed for optimizing the number of principal components based on the variable(s) specified in the Yr argument. If Yr was a continuous vector or matrix, this object indicates the root mean square of differences (rmse) for each number of components. If Yr was a categorical variable, this object indicates the kappa values for each number of components. This object is returned only if "opc" was used within the pcSelection argument. See the simEval function for more details.
    • method: the orthoProjection method used.
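
To make the opcEval statistics more concrete, the following base R sketch computes a root mean square of differences for each number of retained components on simulated data. It is an illustration only and not the package's implementation: the simulated X and y, the use of Euclidean distances on standardized scores as the dissimilarity, and the names opc_eval, max_comp and best are all assumptions made for this example.

```r
# Simulated data (assumption): 60 samples, 8 variables driven by one latent factor
set.seed(42)
n <- 60
latent <- rnorm(n)
X <- latent %o% seq(1, 2, length.out = 8) + matrix(rnorm(n * 8, sd = 0.3), n, 8)
y <- 2 * latent + rnorm(n, sd = 0.2)  # hypothetical continuous side information

# Principal component scores from the SVD of the centred data
Xc <- scale(X, center = TRUE, scale = FALSE)
sv <- svd(Xc)
scores <- sv$u %*% diag(sv$d)

max_comp <- 5  # hypothetical upper limit of components to test
rmsd <- numeric(max_comp)
for (k in seq_len(max_comp)) {
  # Dissimilarity matrix retaining k components (standardized scores, assumption)
  sc <- scale(scores[, seq_len(k), drop = FALSE])
  d <- as.matrix(dist(sc))
  diag(d) <- Inf                 # a sample cannot be its own neighbour
  nn <- apply(d, 1, which.min)   # index of the most similar sample
  # Compare each sample's side information with that of its closest sample
  rmsd[k] <- sqrt(mean((y - y[nn])^2))
}

opc_eval <- data.frame(n.components = seq_len(max_comp), rmsd = rmsd)
best <- opc_eval$n.components[which.min(opc_eval$rmsd)]
```

In this sketch, best plays the role of n.components and opc_eval mirrors the layout described for opcEval with continuous side information; for categorical side information the kappa index would be maximized instead.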

Details

When method = "pca", the algorithm used is the singular value decomposition, in which a data matrix $X$ is factorized as follows: $$X = UDV^{\mathrm{T}}$$ where $U$ and $V$ are orthogonal matrices, $U$ contains the left singular vectors of $X$, $D$ is a diagonal matrix containing the singular values of $X$, and $V$ contains the right singular vectors of $X$. The matrix of principal component scores is obtained by multiplying $U$ by $D$, and the matrix of principal component loadings is equivalent to the matrix $V$.

When method = "pca.nipals", the algorithm used for principal component analysis is the non-linear iterative partial least squares (nipals). In the case of the partial least squares projection (a.k.a. projection to latent structures), the nipals regression algorithm is used. Details on the "nipals" algorithm are presented in Martens (1991).

When the "opc" method is used in the pcSelection argument, the selection of the components is carried out by an iterative method based on the side information concept (Ramirez-Lopez et al. 2013a, 2013b). First, let $P$ be a sequence of retained components (so that $P = 1, 2, ..., k$). At each iteration, the function computes a dissimilarity matrix retaining $p_i$ components. The values of the side information of the samples are compared against the side information values of their most spectrally similar samples. The optimal number of components retrieved by the function is the one that minimizes the root mean squared differences (RMSD) in the case of continuous variables, or maximizes the kappa index in the case of categorical variables. In this process the simEval function is used. Note that for the "opc" method it is necessary to specify Yr (the side information of the samples).

Multi-threading for the computation of dissimilarities (see the cores parameter) is based on OpenMP and hence works only on Windows and Linux.
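
The factorization used for method = "pca" can be reproduced with base R's svd(). The sketch below is a minimal illustration on simulated data, not the package's code: the data, the helper nipals_pc1 and the variable names are assumptions, and the 99 % rule only mimics what a pcSelection of list("cumvar", 0.99) is expected to do. A single-component NIPALS iteration is included to show that it approaches the same first component as the SVD.

```r
# Simulated data (assumption): 50 samples, 10 variables with one dominant factor
set.seed(1)
latent <- rnorm(50)
X <- latent %o% seq(1, 2, length.out = 10) + matrix(rnorm(500, sd = 0.5), 50, 10)

# Centre the columns (analogous to center = TRUE)
Xc <- scale(X, center = TRUE, scale = FALSE)

# Singular value decomposition: Xc = U D V^T
sv <- svd(Xc)
scores   <- sv$u %*% diag(sv$d)  # principal component scores, U D
loadings <- sv$v                 # principal component loadings, V

# Variance explained by each component and its cumulative sum
explVar    <- sv$d^2 / sum(sv$d^2)
cumExplVar <- cumsum(explVar)

# "cumvar"-style rule: smallest number of components reaching 99 %
nComp <- which(cumExplVar >= 0.99)[1]

# Hypothetical helper: NIPALS estimate of the first principal component
nipals_pc1 <- function(X, tol = 1e-6, max.iter = 1000) {
  t_old <- X[, which.max(colSums(X^2))]      # start from the most variable column
  for (i in seq_len(max.iter)) {
    p <- crossprod(X, t_old) / sum(t_old^2)  # loadings: X^T t / (t^T t)
    p <- p / sqrt(sum(p^2))                  # normalise loadings to unit length
    t_new <- X %*% p                         # scores: X p
    if (sum((t_new - t_old)^2) < tol * sum(t_new^2)) break
    t_old <- t_new
  }
  list(scores = drop(t_new), loadings = drop(p))
}
pc1 <- nipals_pc1(Xc)
```

The sign of a component is arbitrary, so pc1$loadings should be compared with sv$v[, 1] up to a factor of -1.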

References

Martens, H. (1991). Multivariate calibration. John Wiley & Sons.

Ramirez-Lopez, L., Behrens, T., Schmidt, K., Stevens, A., Dematte, J.A.M., Scholten, T. (2013a). The spectrum-based learner: A new local approach for modeling soil vis-NIR spectra of complex datasets. Geoderma 195-196, 268-279.

Ramirez-Lopez, L., Behrens, T., Schmidt, K., Viscarra Rossel, R., Dematte, J.A.M., Scholten, T. (2013b). Distance and similarity-search metrics for use with soil vis-NIR spectra. Geoderma 199, 43-53.

See Also

orthoDiss, simEval, mbl

Examples

library(resemble)   # provides orthoProjection()
library(prospectr)  # provides the NIRsoil data set

data(NIRsoil)

Xu <- NIRsoil$spc[!as.logical(NIRsoil$train),]
Yu <- NIRsoil$CEC[!as.logical(NIRsoil$train)]
Yr <- NIRsoil$CEC[as.logical(NIRsoil$train)]
Xr <- NIRsoil$spc[as.logical(NIRsoil$train),]

Xu <- Xu[!is.na(Yu),]
Yu <- Yu[!is.na(Yu)]

Xr <- Xr[!is.na(Yr),]
Yr <- Yr[!is.na(Yr)] 

# A partial least squares projection using the "opc" method
# for the selection of the optimal number of components
plsProj <- orthoProjection(Xr = Xr, Yr = Yr, X2 = Xu, 
                           method = "pls", 
                           pcSelection = list("opc", 40))
                           
# A principal components projection using the "opc" method
# for the selection of the optimal number of components
pcProj <- orthoProjection(Xr = Xr, Yr = Yr, X2 = Xu, 
                          method = "pca", 
                          pcSelection = list("opc", 40))
                           
# A partial least squares projection using the "cumvar" method
# for the selection of the optimal number of components
plsProj2 <- orthoProjection(Xr = Xr, Yr = Yr, X2 = Xu, 
                            method = "pls", 
                            pcSelection = list("cumvar", 0.99))
