fDiss: Euclidean, Mahalanobis and cosine dissimilarity measurements

Description

This function computes the dissimilarity between observations using the Euclidean distance, the Mahalanobis distance or the cosine dissimilarity (also known as the spectral angle mapper).

Usage

fDiss(Xr, X2 = NULL, method = "euclid", 
      center = TRUE, scaled = TRUE)

Arguments

Xr

a matrix (or data.frame) containing the (reference) data.

X2

an optional matrix (or data.frame) containing data of a second set of observations (samples).

method

the method for computing the dissimilarity matrix. Options are "euclid" (Euclidean distance), "mahalanobis" (Mahalanobis distance) and "cosine" (cosine dissimilarity, also known as the spectral angle mapper).

center

a logical indicating if the spectral data Xr (and X2 if specified) must be centered. If X2 is specified the data is centered on the basis of $Xr \cup X2$.

scaled

a logical indicating if Xr (and X2 if specified) must be scaled. If X2 is specified the data is scaled on the basis of $Xr \cup X2$.
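As a rough illustration of what centering and scaling "on the basis of $Xr \cup X2$" can mean, the base-R sketch below pools both sets, standardizes the pooled matrix and splits it again. The toy matrices and the use of the column standard deviation for scaling are assumptions made for illustration; the function's internal preprocessing may differ.

```r
# Toy data standing in for Xr and X2 (assumed values, not from the package)
set.seed(42)
Xr <- matrix(rnorm(30, mean = 5), nrow = 6, ncol = 5)
X2 <- matrix(rnorm(20, mean = 7), nrow = 4, ncol = 5)

# Pool both sets so that the column means and standard deviations
# are estimated from the union of the observations
pooled <- rbind(Xr, X2)
pooled_z <- scale(pooled, center = TRUE, scale = TRUE)

# Split the standardized matrix back into the two sets
Xr_z <- pooled_z[seq_len(nrow(Xr)), , drop = FALSE]
X2_z <- pooled_z[nrow(Xr) + seq_len(nrow(X2)), , drop = FALSE]
```

With this approach both sets share the same column offsets and scales, so distances between a row of Xr and a row of X2 are measured in a common space.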

Value

a matrix of the computed dissimilarities.

Details

For both the Euclidean and Mahalanobis distances, the dissimilarity matrix $D$ between samples in a given matrix $X$ is computed as follows:

$$D(x_i, x_j) = \sqrt{(x_i - x_j)M^{-1}(x_i - x_j)^{\mathrm{T}}}$$

where $M$ is the identity matrix in the case of the Euclidean distance and the variance-covariance matrix of $X$ in the case of the Mahalanobis distance.

The Mahalanobis distance can also be viewed as the Euclidean distance after applying a linear transformation to the original variables. This transformation is carried out by factorizing the inverse covariance matrix as $M^{-1} = W^{\mathrm{T}}W$, where $W$ is merely the square root of $M^{-1}$, which can be found by using a singular value decomposition.

Note that when the Mahalanobis distance is computed on a dataset with highly correlated variables (e.g. spectral variables), the variance-covariance matrix may be singular. A singular matrix cannot be inverted, so the distance cannot be computed. This is also the case when the number of samples in the dataset is smaller than the number of variables. The function computes the Mahalanobis distance using the transformation method described above.

The cosine dissimilarity $S$ between two observations $x_i$ and $x_j$ is computed as follows:

$$S(x_i, x_j) = \cos^{-1}\left(\frac{\sum_{k=1}^{p}x_{i,k} x_{j,k}}{\sqrt{\sum_{k=1}^{p} x_{i,k}^{2}} \sqrt{\sum_{k=1}^{p} x_{j,k}^{2}}}\right)$$

where $p$ is the number of variables of the observations.

The function does not accept input data containing missing values.
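The formulas above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's internal code: gen_dist() and cos_diss() are hypothetical helper names, the data are random, and an eigendecomposition is used for the factorization (for a symmetric positive-definite $M$ it coincides with the singular value decomposition). Values returned by fDiss may differ, for example because of centering, scaling or other normalization.

```r
# Generalized distance D for a given matrix M (identity => Euclidean,
# covariance of X => Mahalanobis). Hypothetical helper, for illustration.
gen_dist <- function(X, M = diag(ncol(X))) {
  e <- eigen(M, symmetric = TRUE)
  # W = V diag(1 / sqrt(lambda)) V^T, so that t(W) %*% W equals solve(M)
  W <- e$vectors %*%
    diag(1 / sqrt(e$values), nrow = length(e$values)) %*%
    t(e$vectors)
  # Euclidean distances between the transformed rows equal D(x_i, x_j)
  as.matrix(dist(X %*% t(W)))
}

# Cosine dissimilarity S (angle in radians) between all rows of X.
# Hypothetical helper, for illustration.
cos_diss <- function(X) {
  norms <- sqrt(rowSums(X^2))
  cosines <- tcrossprod(X) / tcrossprod(norms)
  # Clamp to [-1, 1] to guard against rounding error before acos()
  acos(pmin(pmax(cosines, -1), 1))
}

# Toy data: 10 observations, 4 variables (more rows than columns,
# so the covariance matrix is invertible)
set.seed(1)
X <- matrix(rnorm(40), nrow = 10, ncol = 4)

d_euclid <- gen_dist(X)              # M is the identity matrix
d_mahal  <- gen_dist(X, M = cov(X))  # M is the variance-covariance matrix
s_cos    <- cos_diss(X)
```

If cov(X) is singular, as can happen with highly correlated variables or fewer samples than variables, some eigenvalues are zero (or numerically close to it) and the factorization in gen_dist() breaks down, which mirrors the limitation described above.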

Examples

library(prospectr)  # provides the NIRsoil dataset
library(resemble)   # provides fDiss()

data(NIRsoil)

Xu <- NIRsoil$spc[!as.logical(NIRsoil$train),]
Xr <- NIRsoil$spc[as.logical(NIRsoil$train),]

# Euclidean distances between all the samples in Xr
ed <- fDiss(Xr = Xr, method = "euclid", 
            center = TRUE, scaled = TRUE)

# Euclidean distances between samples in Xr and samples in Xu
ed.xr.xu <- fDiss(Xr = Xr, X2 = Xu, method = "euclid", 
                  center = TRUE, scaled = TRUE)

# Mahalanobis distance computed on the first 20 spectral variables
md.xr.xu <- fDiss(Xr = Xr[,1:20], X2 = Xu[,1:20], 
                 method = "mahalanobis", 
                 center = TRUE, scaled = TRUE)

# Cosine dissimilarity matrix
cdiss.xr.xu <- fDiss(Xr = Xr, X2 = Xu, 
                     method = "cosine", 
                     center = TRUE, scaled = TRUE)
