ken.sto: Sample selection based on the Kennard-Stone algorithm

Description

The function selects the most representative samples based on the Euclidean distance. One can (i) select a number or a percentage of a sample set or (ii) divide a sample set into a calibration set and a representative validation set.

Usage

ken.sto(inp, per = "T", per.n = 0.3, num, va = "F", sav = "T", path = "", out = "Sel")

Arguments

inp

a numerical matrix or data.frame containing the input spectra

per

a logical value indicating whether the selected samples should be a percentage (given in per.n) or a set number (given in num) of inp. The default "T" takes a percentage.

per.n

a numerical value between 0 and 1 giving the fraction of inp to select; used when per is "T".

num

a numerical value between 1 and the sample number minus 1 giving the number of samples to select; used when per is "F".

va

a logical value indicating whether to select samples out of inp ("F") or to divide them into a calibration and a validation set ("T").

sav

a logical value indicating whether the function output shall be saved.

path

a character giving the path name where the function output shall be saved.

out

a character giving the function output name, in case sav is "T".

Value

ken.sto returns a list with class "ken.sto" containing the following components:

Calibration and validation set

the logical object va.

Number important PC

integer giving the number of chosen important components - important for choosing the starting samples.

PC space important PC

score value matrix of important principal components.

Chosen samples names

chosen sample names when va equal to "F"

Chosen row number

chosen row numbers when va equal to "F"

Chosen calibration sample names

chosen calibration sample names when va equal to "T"

Chosen calibration row number

chosen calibration row numbers when va equal to "T"

Chosen validation sample names

chosen validation sample names when va equal to "T"

Chosen validation row number

chosen validation row numbers when va equal to "T"

Details

Sample selection follows a procedure adapted from Kennard & Stone (1969). It is a stepwise procedure that maximizes the Euclidean distance, computed on the important principal components, to the objects already chosen. The number of important principal components is selected so that the increase in cumulative explained variance within the next three components is lower than 4 percent. The starting samples are the two extreme samples (most negative and most positive ones) of the important principal components.
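
The procedure can be illustrated with a minimal base R sketch. This is not the code of ken.sto: the helper names n_important_pc and kennard_stone are hypothetical, the 4 percent rule is read as "the next three components together add less than 0.04 to the cumulative explained variance", and the start uses the classic Kennard-Stone choice (the two samples farthest apart) instead of the extreme-score start described above. The spectra are simulated.

```r
# Hypothetical helper: number of "important" PCs under one reading of the rule
# (the next `window` components add less than `threshold` cumulative variance).
n_important_pc <- function(pca, window = 3, threshold = 0.04) {
  cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
  n_pc <- length(cum_var)
  for (k in seq_len(n_pc)) {
    upper <- min(k + window, n_pc)
    if (cum_var[upper] - cum_var[k] < threshold) return(k)
  }
  n_pc
}

# Hypothetical helper: Kennard-Stone selection of n_sel rows of a score matrix.
kennard_stone <- function(scores, n_sel) {
  stopifnot(n_sel >= 2, n_sel <= nrow(scores))
  d <- as.matrix(dist(scores))  # pairwise Euclidean distances
  # Classic start (an assumption here): the two samples that are farthest apart.
  chosen <- as.integer(which(d == max(d), arr.ind = TRUE)[1, ])
  while (length(chosen) < n_sel) {
    remaining <- setdiff(seq_len(nrow(scores)), chosen)
    # Distance from each remaining sample to its nearest already chosen sample
    min_d <- apply(d[remaining, chosen, drop = FALSE], 1, min)
    # Add the sample whose nearest chosen neighbour is farthest away
    chosen <- c(chosen, remaining[which.max(min_d)])
  }
  chosen
}

set.seed(1)
spectra <- matrix(rnorm(40 * 10), nrow = 40)  # simulated stand-in for inp
pca <- prcomp(spectra, center = TRUE)
k <- n_important_pc(pca)
sel <- kennard_stone(pca$x[, seq_len(k), drop = FALSE], n_sel = 12)
```

Here sel holds the row numbers of the selected samples in the order in which they were picked.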

With per.n equal to 0.4 and va equal to "F", 40 percent of the sample set is chosen. When va is equal to "T", the validation set comprises 40 percent of the sample set.
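
As a small worked example of these proportions, assuming a hypothetical sample set of 50 rows (how ken.sto rounds non-integer counts is not documented here, so round() is an assumption):

```r
n_samples <- 50                       # hypothetical size of the sample set
per.n <- 0.4
n_picked <- round(per.n * n_samples)  # rounding rule is an assumption

# va = "F": n_picked samples are chosen.
# va = "T": n_picked samples form the validation set, the rest the calibration set.
n_validation <- n_picked
n_calibration <- n_samples - n_picked
```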

A graph is returned showing the selected samples in the principal component space (only the important PCs). This is the same graphic as the one generated by plot.ken.sto.

References

Kennard, R. W. and Stone, L. A. (1969) Computer aided design of experiments. Technometrics 11(1), 137-148.

Examples

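# inp: numerical matrix or data.frame of spectra, to be supplied by the user
sel <- ken.sto(inp, per = "T", per.n = 0.3, va = "F", sav = "T", path = "", out = "Sel")
plot(sel)  # dispatches to plot.ken.sto for objects of class "ken.sto"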
