sid(Xr, X2 = NULL,
mode = "density",
center = FALSE, scaled = TRUE,
kernel = "gaussian",
n = if(mode == "density") round(0.5 * ncol(Xr)),
bw = "nrd0",
reg = 1e-04,
...)
Xr: a matrix (or data.frame) containing the spectral (reference) data.
X2: a matrix (or data.frame) containing the spectral data of a second set of samples. Default is NULL.
mode: the method used to compute the spectral information divergence. Options are "density" (default), for computing the divergence values on the density distributions of the spectral observations, and "feature", for computing the divergence values directly on the spectral variables.
center: a logical indicating whether the computations are carried out on the centered X and X2 (if specified) matrices. If mode = "feature", centering is not carried out, since this option does not accept the negative values that centering produces. Default is FALSE.
scaled: a logical indicating whether the computations are carried out on the scaled X and X2 (if specified) matrices. Default is TRUE.
kernel: if mode = "density", a character string indicating the smoothing kernel to be used. Options include "gaussian" (default), "rectangular", "triangular", "epanechnikov" and "biweight".
n: if mode = "density", a numerical value indicating the number of equally spaced points at which the density is to be estimated. See the density function of the stats package for details. Default is round(0.5 * ncol(Xr)).
bw: if mode = "density", a numerical value indicating the smoothing kernel bandwidth to be used. Optionally, the character string "nrd0" can be used, in which case the bandwidth is computed using the bw.nrd function.
reg: a regularization parameter used to replace 0 values for numerical stability. Default is 1e-04.
...: additional arguments passed to the density function of the stats package.
The function returns a list with the following components:
sid: if only "X" is specified (i.e. X2 = NULL), a square symmetric matrix of SID distances between all the components in "X". If both "X" and "X2" are specified, a matrix of SID distances between the components in "X" and the components in "X2", where the rows represent the objects in "X" and the columns represent the objects in "X2".
Xr: the X matrix.
X2: the X2 matrix.
densityDisXr: if mode = "density", the computed density distributions of Xr.
densityDisX2: if mode = "density", the computed density distributions of X2.
When mode = "density", the function first computes the probability distribution of each spectrum, which results in a matrix of density distribution estimates. The density distributions of all the samples in the datasets are then compared using the Kullback-Leibler divergence.
When mode = "feature", the Kullback-Leibler divergence between all the samples is computed directly on the spectral variables.
The spectral information divergence (SID) algorithm (Chang, 2000) uses the Kullback-Leibler divergence ($KL$) or relative entropy (Kullback and Leibler, 1951) to account for the vis-NIR information provided by each spectrum. The SID between two spectra ($x_{i}$ and $x_{j}$) is computed as follows:
$$SID(x_{i},x_{j}) = KL(x_{i} \,\|\, x_{j}) + KL(x_{j} \,\|\, x_{i})$$
$$SID(x_{i},x_{j}) = \sum_{l=1}^{k} p_l \log\left(\frac{p_l}{q_l}\right) + \sum_{l=1}^{k} q_l \log\left(\frac{q_l}{p_l}\right)$$
where $k$ represents the number of variables or spectral features, $p$ and $q$ are the probability vectors of $x_{i}$ and $x_{j}$ respectively which are calculated as:
$$p = \frac{x_i}{\sum_{l=1}^{k} x_{i,l}}$$
$$q = \frac{x_j}{\sum_{l=1}^{k} x_{j,l}}$$
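The two equations above can be sketched in a few lines of base R. This is only an illustration for a single pair of strictly positive spectra; `sid_pair` and the toy vectors are hypothetical names introduced here, not part of the package.

```r
# Sketch of the SID between two spectra (hypothetical helper, not the
# package's internal code). Assumes strictly positive inputs.
sid_pair <- function(x, y) {
  stopifnot(length(x) == length(y), all(x > 0), all(y > 0))
  p <- x / sum(x)               # probability vector of the first spectrum
  q <- y / sum(y)               # probability vector of the second spectrum
  kl_pq <- sum(p * log(p / q))  # KL(x || y)
  kl_qp <- sum(q * log(q / p))  # KL(y || x)
  kl_pq + kl_qp
}

# Toy spectra with k = 4 variables (made-up values)
x_i <- c(0.2, 0.4, 0.6, 0.8)
x_j <- c(0.3, 0.3, 0.5, 0.9)

d_ij <- sid_pair(x_i, x_j)
d_ji <- sid_pair(x_j, x_i)  # same value: SID is symmetric
d_ii <- sid_pair(x_i, x_i)  # a spectrum has zero divergence from itself
```

Because each spectrum is divided by its own sum, multiplying a spectrum by a positive constant leaves its probability vector, and therefore the SID, essentially unchanged.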
From the above equations it can be seen that the original SID algorithm assumes that all the components in the data matrices are nonnegative. Therefore centering cannot be applied when mode = "feature". If a data matrix with negative values is provided and mode = "feature", the sid function automatically scales the matrix as follows:
$$X_s = \frac{X-\min(X)}{\max(X)-\min(X)}$$
or
$$X_{s} = \frac{X-\min(X, X2)}{\max(X, X2)-\min(X, X2)}$$
$$X2_{s} = \frac{X2-\min(X, X2)}{\max(X, X2)-\min(X, X2)}$$
if X2 is specified. The 0 values are replaced by a regularization parameter (reg argument) for numerical stability.
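The scaling and zero replacement described above might look roughly like the following in base R. `scale_nonneg` is a hypothetical helper written for this illustration; it always rescales (the package is described as doing so only when negative values are present), and the exact order of operations inside `sid` is an assumption.

```r
# Hypothetical helper: joint min-max scaling followed by zero replacement.
scale_nonneg <- function(X, X2 = NULL, reg = 1e-04) {
  lo <- min(X, X2)  # min() ignores NULL, so this also covers X2 = NULL
  hi <- max(X, X2)
  rescale <- function(M) {
    Ms <- (M - lo) / (hi - lo)  # map the joint range onto [0, 1]
    Ms[Ms == 0] <- reg          # replace exact zeros for numerical stability
    Ms
  }
  list(X = rescale(X), X2 = if (!is.null(X2)) rescale(X2))
}

# Toy matrices containing negative values (made-up numbers)
X  <- matrix(c(-1, 0, 2, 3), nrow = 2)
X2 <- matrix(c(-2, 1, 4, 0.5), nrow = 2)

out <- scale_nonneg(X, X2)
```

Using the joint minimum and maximum keeps both matrices on a common scale, so the probability vectors derived from them remain comparable.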
The default of the sid function is to compute the SID based on the density distributions of the spectra (mode = "density"). For each spectrum in X the density distribution is computed using the density function of the stats package.
The 0 values of the estimated density distributions of the spectra are replaced by a regularization parameter (reg argument) for numerical stability. Finally, the divergence between the computed spectral histograms is computed using the SID algorithm. Note that if mode = "density", the sid function accepts negative values and matrix centering is possible.
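The density-based procedure can be sketched with stats::density for a single pair of spectra. `sid_density` is a hypothetical helper, and evaluating both densities on a shared grid (via `from` and `to`) is an assumption of this sketch; the source does not say how the package aligns the estimates.

```r
# Hypothetical helper: density-based SID between two spectra.
sid_density <- function(x, y, n = 50, bw = "nrd0", kernel = "gaussian",
                        reg = 1e-04) {
  # Shared evaluation grid so both density vectors are comparable
  # (an assumption of this sketch).
  rng <- range(x, y)
  dx <- stats::density(x, bw = bw, kernel = kernel, n = n,
                       from = rng[1], to = rng[2])$y
  dy <- stats::density(y, bw = bw, kernel = kernel, n = n,
                       from = rng[1], to = rng[2])$y
  dx[dx == 0] <- reg  # replace zero density estimates
  dy[dy == 0] <- reg
  p <- dx / sum(dx)   # probability vectors built from the densities
  q <- dy / sum(dy)
  sum(p * log(p / q)) + sum(q * log(q / p))
}

set.seed(1)
x_i <- rnorm(200)            # simulated values; negatives are allowed here
x_j <- rnorm(200, mean = 1)

d_ij <- sid_density(x_i, x_j)
```

Since the divergence is computed on density estimates, which are never negative, the input values themselves may be negative, which is why centering is compatible with this mode.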
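library(resemble)   # provides sid()
library(prospectr)  # provides the NIRsoil dataset
data(NIRsoil)
# Split the spectra (spc) and CEC values using the train indicator
Xu <- NIRsoil$spc[!as.logical(NIRsoil$train),]
Yu <- NIRsoil$CEC[!as.logical(NIRsoil$train)]
Yr <- NIRsoil$CEC[as.logical(NIRsoil$train)]
Xr <- NIRsoil$spc[as.logical(NIRsoil$train),]
# Drop the samples with missing CEC values
Xu <- Xu[!is.na(Yu),]
Xr <- Xr[!is.na(Yr),]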
# Example 1
# Compute the SID distance between all the samples in Xr
xr.sid <- sid(Xr = Xr)
xr.sid
# Example 2
# Compute the SID distance between the samples in Xr and the samples
# in Xu
xru.sid <- sid(Xr = Xr, X2 = Xu)
xru.sid
# Example 3
# Compute the SID distance between the samples in Xr and the samples
# in Xu directly on the spectral variables (mode = "feature")
xru.sid.hist <- sid(Xr = Xr, X2 = Xu, mode = "feature")
xru.sid.hist