
Note: a newer version (2.2.3) of this package is available.

Regression and Similarity Evaluation for Memory-Based Learning in Spectral Chemometrics

Leo Ramirez-Lopez & Antoine Stevens


Installing the package is very simple:

install.packages('resemble')

If you do not have the following packages installed, it is sometimes better to install them first:

install.packages('Rcpp')
install.packages('RcppArmadillo')
install.packages('foreach')
install.packages('iterators')

Note: Apart from these packages, we strongly recommend downloading and installing Rtools (available from CRAN at https://cran.r-project.org/bin/windows/Rtools/). Rtools provides the C++ toolchain on Windows that you might need for using resemble.

Then, install resemble from the downloaded file (adjust the path to where the file is saved):

install.packages('C:/MyFolder/resemble-1.2.2.zip', repos = NULL)

The development version can be obtained from the package website.

After installing resemble, you should also be able to run the following lines:

require(resemble)

help(mbl)

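# install.packages('prospectr')  # prospectr provides the NIRsoil dataset
require(prospectr)

data(NIRsoil)

# Split NIRsoil into prediction (train == 0) and reference (train == 1) sets
Xu <- NIRsoil$spc[!as.logical(NIRsoil$train),]
Yu <- NIRsoil$CEC[!as.logical(NIRsoil$train)]
Yr <- NIRsoil$CEC[as.logical(NIRsoil$train)]
Xr <- NIRsoil$spc[as.logical(NIRsoil$train),]

# Remove samples with missing CEC values (spectra first, then responses)
Xu <- Xu[!is.na(Yu),]
Xr <- Xr[!is.na(Yr),]

Yu <- Yu[!is.na(Yu)]
Yr <- Yr[!is.na(Yr)]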

# Example of the mbl function
# An mbl approach (the spectrum-based learner) as implemented in Ramirez-Lopez et al. (2013)
# An example where Yu is assumed to be unknown, but Xu (the spectral variables) is known
ctrl <- mblControl(sm = 'pc', pcSelection = list('opc', 40),
                   valMethod = 'NNv', center = TRUE)

sbl.u <- mbl(Yr = Yr, Xr = Xr, Yu = NULL, Xu = Xu,
             mblCtrl = ctrl,
             dissUsage = 'predictors',
             k = seq(40, 150, by = 10),
             method = 'gpr')

getPredictions(sbl.u)

resemble implements a function dedicated to the non-linear modelling of complex visible and infrared spectral data based on memory-based learning (MBL, a.k.a. instance-based learning or local modelling in the chemometrics literature). The package also includes functions for computing and evaluating spectral similarity/dissimilarity matrices, projecting the spectra onto low-dimensional orthogonal variables, removing irrelevant spectra from a reference set, etc.

The functions for computing and evaluating spectral similarity/dissimilarity matrices can be summarized as follows:

fDiss: Euclidean and Mahalanobis distances as well as the cosine dissimilarity (a.k.a. spectral angle mapper)
corDiss: correlation and moving window correlation dissimilarity
sid: spectral information divergence between spectra or between the probability distributions of spectra
orthoDiss: principal components and partial least squares dissimilarity (including several options)
simEval: evaluates a given similarity/dissimilarity matrix based on the concept of side information
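The dissimilarity measures listed above can be illustrated with a few lines of base R. This is a minimal sketch using textbook definitions, not the resemble implementations: the helper names (`euclid_diss`, `cosine_diss`, `cor_diss`, `sid_diss`, `nn_rmsd`) are hypothetical, the correlation dissimilarity is assumed to be (1 - r)/2, and the package functions may scale or centre the data differently. The last helper mimics the idea behind simEval for a continuous side variable: if a dissimilarity measure is meaningful, each sample's nearest neighbour should have a similar response value, so a smaller root mean squared difference indicates a better measure.

```r
# Pairwise dissimilarities between the rows of X (textbook definitions;
# the resemble functions may use different scaling or centring).

# Euclidean distance between all pairs of rows
euclid_diss <- function(X) {
  as.matrix(dist(X, method = "euclidean"))
}

# Cosine dissimilarity expressed as the spectral angle (in radians)
cosine_diss <- function(X) {
  norms <- sqrt(rowSums(X^2))
  cos_sim <- tcrossprod(X) / tcrossprod(norms)
  # clamp to [-1, 1] so rounding errors cannot produce NaN in acos()
  acos(pmin(pmax(cos_sim, -1), 1))
}

# Correlation dissimilarity between rows, assumed here to be (1 - r) / 2
cor_diss <- function(X) {
  (1 - cor(t(X))) / 2
}

# Spectral information divergence between two spectra
# (values must be strictly positive because of the logarithms)
sid_diss <- function(x, y) {
  p <- x / sum(x)   # treat each spectrum as a probability distribution
  q <- y / sum(y)
  sum(p * log(p / q)) + sum(q * log(q / p))
}

# Side-information check in the spirit of simEval: RMSD between each
# sample's response and the response of its nearest neighbour
nn_rmsd <- function(D, y) {
  diag(D) <- Inf                  # a sample cannot be its own neighbour
  nn <- apply(D, 1, which.min)    # index of the nearest neighbour per row
  sqrt(mean((y - y[nn])^2))
}

# Toy example with three short "spectra"
X <- rbind(c(0.2, 0.4, 0.6), c(0.3, 0.5, 0.7), c(0.6, 0.4, 0.2))
D_euc <- euclid_diss(X)
D_cos <- cosine_diss(X)
```

Note that the spectral angle and the SID ignore overall intensity (a spectrum and a scaled copy of it have zero dissimilarity), whereas the Euclidean distance does not.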

The functions for projecting the spectra onto low dimensional orthogonal variables are:

pcProjection: projects the spectra onto a principal component space
plsProjection: projects the spectra onto a partial least squares component space (a.k.a. projection to latent structures)
orthoProjection: performs either of the projections implemented by pcProjection or plsProjection

The projection functions also offer different options for optimizing/selecting the number of components involved in the projection.
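A principal component projection followed by a PC-based dissimilarity can be sketched with base R's prcomp(). This is not the pcProjection/orthoDiss code: the helpers `pc_project` and `pc_diss`, the cumulative-variance threshold `cumvar`, and the simulated data are assumptions made for illustration. Here the number of components is the smallest number whose cumulative explained variance reaches the threshold, which is only one of several possible selection criteria (the mbl example above uses a different one, 'opc'). Because PC scores are uncorrelated, the Mahalanobis distance in the score space reduces to the Euclidean distance between scores divided by their standard deviations.

```r
# Project the rows of X onto principal components and keep the smallest
# number of components whose cumulative explained variance reaches cumvar
pc_project <- function(X, cumvar = 0.99) {
  pca <- prcomp(X, center = TRUE, scale. = FALSE)
  explained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
  n_comp <- which(explained >= cumvar)[1]
  list(scores = pca$x[, seq_len(n_comp), drop = FALSE],
       n_comp = n_comp,
       explained = explained[n_comp])
}

# Mahalanobis distance in the score space: since PC scores are
# uncorrelated, standardize each score and take Euclidean distances
pc_diss <- function(scores) {
  z <- scale(scores, center = FALSE, scale = apply(scores, 2, sd))
  as.matrix(dist(z))
}

# Simulated "spectra" driven by three latent factors plus a little noise
set.seed(1)
latent <- matrix(rnorm(50 * 3), 50, 3)
loadings <- matrix(rnorm(3 * 20), 3, 20)
X <- latent %*% loadings + matrix(rnorm(50 * 20, sd = 0.01), 50, 20)

proj <- pc_project(X, cumvar = 0.99)
D_pc <- pc_diss(proj$scores)
```

Computing distances in a few standardized components, rather than on hundreds of correlated wavelengths, is what makes a Mahalanobis-type distance feasible for spectra, whose covariance matrix is usually singular.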

The functions for modelling the spectra using memory-based learning are:

mblControl: controls some modelling aspects of the mbl function
mbl: models the spectra by memory-based learning

Some additional miscellaneous functions are:

print.mbl: prints a summary of the results obtained by the mbl function
plot.mbl: plots a summary of the results obtained by the mbl function
print.localOrthoDiss: prints local distance matrices generated with the orthoDiss function

To explain the mbl function in more detail, let's first define the basic input datasets:

  • Reference (training) set: Dataset with n reference samples (e.g. a spectral library) to be used in the calibration of spectral models. Xr represents the matrix of samples (containing the spectral predictor variables) and Yr represents a given response variable corresponding to Xr.

  • Prediction set: Dataset with m samples whose response variable (Yu) is unknown. However, it can be predicted by applying a spectral model (calibrated using Xr and Yr) to the spectra of these samples (Xu).

To predict each value in Yu, the mbl function takes each sample in Xu and searches Xr for its k nearest neighbours (the most spectrally similar samples). A (local) model is then calibrated with these (reference) neighbours and used to predict the corresponding value in Yu from Xu. In the function, the k-nearest-neighbour search is performed by computing spectral similarity/dissimilarity matrices between samples. The mbl function offers the following regression options for calibrating the (local) models:

'gpr': Gaussian process with linear kernel
'pls': Partial least squares
'wapls1': Weighted average partial least squares 1
'wapls2': Weighted average partial least squares 2 (no longer supported)
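The neighbourhood search and local fitting described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the mbl implementation: the helper names `gpr_linear_predict` and `mbl_sketch` and the `noise` value are hypothetical, neighbours are found with a plain Euclidean distance on the raw variables (mbl offers other dissimilarities, such as PC-based ones), and the local model is a Gaussian process with a linear kernel. With centred data, the predictive mean of such a model is equivalent to ridge regression with penalty `noise`.

```r
# Gaussian process regression with a linear (dot product) kernel, fitted
# on one neighbourhood and used to predict a single sample xu.
# `noise` is a hypothetical regularization value chosen for illustration.
gpr_linear_predict <- function(Xk, yk, xu, noise = 1e-6) {
  mu_x <- colMeans(Xk)
  mu_y <- mean(yk)
  Xc <- sweep(Xk, 2, mu_x)              # centre the neighbours
  K <- tcrossprod(Xc)                   # linear kernel: Xc %*% t(Xc)
  alpha <- solve(K + noise * diag(nrow(K)), yk - mu_y)
  k_star <- Xc %*% (xu - mu_x)          # kernel between neighbours and xu
  drop(crossprod(k_star, alpha)) + mu_y
}

# Memory-based learning loop: for each sample in Xu, find its k nearest
# neighbours in Xr and fit a local model on those neighbours only
mbl_sketch <- function(Xr, Yr, Xu, k) {
  apply(Xu, 1, function(xu) {
    d <- sqrt(colSums((t(Xr) - xu)^2))  # distance from xu to every row of Xr
    nn <- order(d)[seq_len(k)]          # indices of the k nearest neighbours
    gpr_linear_predict(Xr[nn, , drop = FALSE], Yr[nn], xu)
  })
}

# Simulated data with an exactly linear response
set.seed(42)
Xr <- matrix(rnorm(100 * 10), 100, 10)
b <- rnorm(10)
Yr <- drop(Xr %*% b)
Xu <- matrix(rnorm(5 * 10), 5, 10)

pred <- mbl_sketch(Xr, Yr, Xu, k = 30)
```

Because the simulated response is an exact linear function of the predictors, each local model recovers it almost exactly. With real spectra the relationship changes across the spectral space, which is why local models help and why the choice of k matters (the mbl example above evaluates several neighbourhood sizes).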

Keywords

  • Infrared spectroscopy
  • Chemometrics
  • Local modelling
  • Spectral library
  • Lazy learning
  • Soil spectroscopy


Bug report and development version

You can send an e-mail to the package maintainer (ramirez.lopez.leo@gmail.com) or create an issue on GitHub.


Version

1.2.2

License

GPL (>= 3)

Maintainer

Leonardo Ramirez-Lopez

Last Published

March 3rd, 2016

Functions in resemble (1.2.2)

minDissV: A function to compute indices of minimum values of a distance vector
colSds: Function for computing the standard deviation of each column in a matrix
print.localOrthoDiss: Print method for an object of class orthoDiss
cSds: Standard deviation of columns
pgpcv_cpp: Internal Cpp function for performing leave-group-out cross validations for pls regression
getPredictions: Extract predictions from an object of class mbl
fDiss: Euclidean, Mahalanobis and cosine dissimilarity measurements
sqrtSm: Square root of (square) symmetric matrices
corDiss: Correlation and moving correlation dissimilarity measurements (corDiss)
predopls: Prediction function for the opls and opls2 functions
projectpls: Projection function for the opls function
neigCleaning: A function for identifying samples that do not belong to any of the neighbourhoods of a given set of samples (neigCleaning)
isubset: Iterator that reorganizes and subsets a matrix based on a dissimilarity matrix (used in mbl)
gpr.dp: Gaussian process regression with dot product covariance
e2m: A function for transforming a matrix from its Euclidean space to its Mahalanobis space
orthoProjection: Orthogonal projections using partial least squares and principal component analysis
gprdp: Gaussian process regression with linear kernel (gprdp)
pplscv_cpp: Internal Cpp function for performing leave-group-out cross validations for pls regression
locFitnpred: Local multivariate regression
gprCv: Cross validation for Gaussian process regression
fopls: Fast orthogonal scores algorithm of partial least squares (opls)
orthoDiss: A function for computing dissimilarity matrices from orthogonal projections (orthoDiss)
mbl: A function for memory-based learning (mbl)
waplswCpp: Internal Cpp function for computing the weights of the PLS components necessary for weighted average PLS
plsCv: Cross validation for PLS regression
which_minV: A function to compute indices of minimum values of a distance vector
fastDistVV: A fast (parallel for Linux) algorithm of (squared) Euclidean cross-distance for vectors written in C++
smpl: A simple random sampling function
fastDist: A fast distance algorithm for two matrices written in C++
plot.orthoProjection: Plot method for an object of class orthoProjection
print.orthoProjection: Print method for an object of class orthoProjection
movcorDist: Moving/rolling correlation distance of two matrices
plot.mbl: Plot method for an object of class mbl
simEval: A function for evaluating similarity/dissimilarity matrices (simEval)
resemble-package: Overview of the functions in the resemble package
sid: A function for computing the spectral information divergence between spectra (sid)
fastDistVVL: A fast (serial) algorithm of (squared) Euclidean cross-distance for vectors written in C++
mblControl: A function that controls some aspects of the memory-based learning process in the mbl function
pred.gpr.dp: Prediction function for "gpr.dp" (Gaussian process regression with dot product covariance)
which_min: A function to compute the row-wise index of minimum values of a square distance matrix
cms: Function for computing the mean of each column in a matrix
predgprdp: Prediction function for the gprdp function (Gaussian process regression with dot product covariance)
opls: Orthogonal scores algorithm of partial least squares (opls)
wcolSds: Function for identifying the column in a matrix with the largest standard deviation
wapls.weights: Internal function for computing the weights of the PLS components necessary for weighted average PLS
print.mbl: Print method for an object of class mbl