prospectr (version 0.1.3)

shenkWest: SELECT algorithm for calibration sampling

Description

Select calibration samples from a large multivariate data set using the SELECT algorithm described in Shenk and Westerhaus (1991).

Usage

shenkWest(X, d.min = 0.6, pc = 0.95, rm.outlier = FALSE, .center = TRUE, .scale = FALSE)

Arguments

X
numeric data.frame or matrix
d.min
minimum distance threshold: observations closer than this to a selected observation are treated as its neighbours (default = 0.6)
pc
number of principal components retained when computing distances in the standardized principal component space (Mahalanobis distance). If pc < 1, enough components are kept to explain at least (pc * 100) percent of the total variance (default = 0.95).
rm.outlier
logical value. If TRUE, remove observations whose standardized Mahalanobis distance to the center of the data is greater than 3 (default = FALSE).
.center
logical value indicating whether the input matrix should be centered before Principal Component Analysis. Default set to TRUE.
.scale
logical value indicating whether the input matrix should be scaled before Principal Component Analysis. Default set to FALSE.
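
The rule for pc < 1 can be illustrated with base R alone. The following is a minimal sketch on simulated data, not the package's internal code; the variable names (expl, n_pc) are made up for illustration, and the exact tie-breaking used by shenkWest is an assumption.

```r
# Simulated data with strongly unequal variances across columns (illustration only)
set.seed(7)
X <- matrix(rnorm(50 * 6), ncol = 6) %*% diag(c(5, 3, 1, 0.5, 0.2, 0.1))

# PCA with the same centering/scaling defaults as in Usage
pca <- prcomp(X, center = TRUE, scale. = FALSE)

# Cumulative proportion of total variance explained by the leading components
expl <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Assumption: keep the smallest number of components reaching the requested share
pc <- 0.95
n_pc <- which(expl >= pc)[1]
n_pc
```

A value of pc greater than or equal to 1 would instead be read directly as the number of components to keep.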

Value

a list with components:
  • 'model' numeric vector giving the row indices of the input data selected for calibration
  • 'test' numeric vector giving the row indices of the remaining observations
  • 'pc' a numeric matrix of the scaled pc scores

Details

The SELECT algorithm is an iterative procedure based on the standardized Mahalanobis distance between observations. First, the observation with the highest number of neighbours within a given minimum distance (d.min) is selected, and its neighbours are discarded. The procedure is then repeated on the remaining observations until none are left.
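
The iteration described above can be sketched in base R as follows. This is a hedged illustration, not the implementation used by shenkWest: select_sketch is a hypothetical helper, and the standardization (squared Euclidean distance between unit-variance scores, divided by the number of components) is an assumption about what "standardized Mahalanobis distance" means here.

```r
# Hypothetical helper: greedy neighbourhood selection in the spirit of SELECT
select_sketch <- function(scores, d.min = 0.6) {
  scores <- as.matrix(scores)
  k <- ncol(scores)
  # Assumption: squared distance in unit-variance score space, divided by k
  D <- as.matrix(dist(scores))^2 / k
  remaining <- seq_len(nrow(scores))
  model <- integer(0)
  while (length(remaining) > 0) {
    # TRUE where two remaining observations are closer than d.min
    near <- D[remaining, remaining, drop = FALSE] < d.min
    # Pick the observation with the most neighbours (it also counts itself)
    best <- which.max(colSums(near))
    model <- c(model, remaining[best])
    # Discard the selected observation together with its neighbours
    remaining <- remaining[!near[, best]]
  }
  model
}

# Simulated data (illustration only)
set.seed(1)
X <- matrix(rnorm(200 * 5), ncol = 5)
pca <- prcomp(X, center = TRUE, scale. = FALSE)
# Divide each score column by its standard deviation to get unit variance
scores <- sweep(pca$x[, 1:3], 2, pca$sdev[1:3], "/")
sel <- select_sketch(scores, d.min = 0.6)
length(sel)
```

Because every neighbour of a selected observation is discarded, no two selected observations end up closer than d.min, so a larger d.min yields a smaller calibration set.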

If the rm.outlier argument is set to TRUE, outliers are removed before running the SELECT algorithm, using the CENTER algorithm of Shenk and Westerhaus (1991): samples with a standardized Mahalanobis distance to the center of the data greater than 3 are removed.
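
The outlier rule can be sketched in the same way. Again this is only an illustration under stated assumptions: center_filter is a hypothetical helper, and the distance is assumed to be the squared distance to the center in unit-variance score space divided by the number of components, which may differ from the package's exact computation.

```r
# Hypothetical helper: return the row indices that pass the distance cutoff
center_filter <- function(scores, cutoff = 3) {
  scores <- as.matrix(scores)
  k <- ncol(scores)
  ctr <- colMeans(scores)
  # Assumption: squared distance to the center, divided by the number of components
  h <- rowSums(sweep(scores, 2, ctr)^2) / k
  which(h <= cutoff)
}

# Simulated data with one deliberately extreme observation in row 1
set.seed(42)
X <- matrix(rnorm(100 * 4), ncol = 4)
X[1, ] <- X[1, ] + 25
pca <- prcomp(X, center = TRUE, scale. = FALSE)
# Unit-variance scores on all four components
scores <- sweep(pca$x, 2, pca$sdev, "/")
keep <- center_filter(scores, cutoff = 3)
1 %in% keep  # row 1 is not expected to survive the filter
```

Only the rows in keep would then be passed on to the selection step.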

References

Shenk, J.S., and Westerhaus, M.O., 1991. Population Definition, Sample Selection, and Calibration Procedures for Near Infrared Reflectance Spectroscopy. Crop Science 31, 469-474.

See Also

kenStone, duplex, puchwein

Examples

library(prospectr)
data(NIRsoil)

# Select calibration samples, keeping all observations
sel <- shenkWest(NIRsoil$spc, pc = .99, d.min = .3, rm.outlier = FALSE)
plot(sel$pc[, 1:2], xlab = 'PC1', ylab = 'PC2')
points(sel$pc[sel$model, 1:2], pch = 19, col = 2)  # points selected for calibration

# Same selection, with outliers removed first
sel <- shenkWest(NIRsoil$spc, pc = .99, d.min = .3, rm.outlier = TRUE)
plot(sel$pc[, 1:2], xlab = 'PC1', ylab = 'PC2')
points(sel$pc[sel$model, 1:2], pch = 15, col = 3)  # points selected for calibration