resemble (version 1.2.2)

mblControl: A function that controls some aspects of the memory-based learning process in the mbl function

Description

This function is used to specify various aspects of the memory-based learning process in the mbl function.

Usage

mblControl(sm = "pc", pcSelection = list("opc", 40), pcMethod = "svd", ws = if(sm == "movcor") 41, k0, returnDiss = FALSE, center = TRUE, scaled = TRUE, valMethod = c("NNv", "loc_crossval"), localOptimization = TRUE, resampling = 10, p = 0.75, range.pred.lim = TRUE, progress = TRUE, cores = 1, allowParallel = TRUE)

Arguments

sm
a character string indicating the spectral dissimilarity metric to be used in the selection of the nearest neighbours of each observation for which a prediction is required (see mbl). Options are:
  • "euclid": Euclidean dissimilarity.
  • "cosine": Cosine dissimilarity.
  • "sidF": Spectral information divergence computed on the spectral variables.
  • "sidD": Spectral information divergence computed on the density distributions of the spectra.
  • "cor": Correlation dissimilarity.
  • "movcor": Moving window correlation dissimilarity.
  • "pc": Principal components dissimilarity: Mahalanobis dissimilarity computed on the principal components space.
  • "loc.pc": Dissimilarity estimation based on local principal components.
  • "pls": Partial least squares dissimilarity: Mahalanobis dissimilarity computed on the partial least squares space.
  • "loc.pls": Dissimilarity estimation based on local partial least squares.

The "pc" spectral dissimilarity metric is the default. If "sidD" is chosen, the default parameters of the sid function are used; however, they can be modified by specifying them as additional arguments in the mbl function.

This argument can also be set to "none". In such a case, a dissimilarity matrix must be specified in the dissimilarityM argument of the mbl function.

pcSelection
a list which specifies the method to be used for identifying the number of principal components to be retained for computing the Mahalanobis dissimilarity of each sample in Xu to the centre of Xr. It also specifies the number of components in any of the following cases: sm = "pc", sm = "loc.pc", sm = "pls" and sm = "loc.pls". This list must contain two objects in the following order:
  • method: the method for selecting the number of components. Possible options are: "opc" (optimized pc selection based on Ramirez-Lopez et al. (2013a, 2013b); see the orthoProjection function for more details); "cumvar" (for selecting the number of principal components based on a given cumulative amount of explained variance); "var" (for selecting the number of principal components based on a given amount of explained variance); and "manual" (for specifying manually the desired number of principal components).
  • value: a numerical value that complements the selected method. If "opc" is chosen, it must be a value indicating the maximal number of principal components to be tested (see Ramirez-Lopez et al., 2013a, 2013b). If "cumvar" is chosen, it must be a value (higher than 0 and lower than 1) indicating the maximum amount of cumulative variance that the retained components should explain. If "var" is chosen, it must be a value (higher than 0 and lower than 1) indicating that components that explain (individually) a variance lower than this threshold must be excluded. If "manual" is chosen, it must be a value specifying the desired number of principal components to retain.

The default method for the pcSelection argument is "opc" and the maximal number of principal components to be tested is set to 40. Optionally, the pcSelection argument admits "opc", "cumvar", "var" or "manual" as a single character string. In such a case, the default "value" is 40 when either "opc" or "manual" is used, 0.99 when "cumvar" is used, and 0.01 when "var" is used.
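
As a rough illustration of how the "cumvar" and "var" rules can be read, the base R sketch below applies them to a made-up vector of explained variance proportions. The vector explained and the helpers n_cumvar and n_var are hypothetical names, and the comparison operators reflect one reading of the description above rather than the package's internal code.

```r
# Hypothetical proportions of variance explained by five components
explained <- c(0.70, 0.20, 0.06, 0.03, 0.01)

# "cumvar": keep leading components while the cumulative variance
# stays at or below the threshold (assumed reading of the rule)
n_cumvar <- function(explained, value) {
  sum(cumsum(explained) <= value)
}

# "var": keep components that individually explain at least `value`
# (assumed reading of the rule)
n_var <- function(explained, value) {
  sum(explained >= value)
}

n_cumvar(explained, 0.95)  # 2
n_var(explained, 0.05)     # 3
```

With "manual", no rule is applied: the value is simply the number of components to retain.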

pcMethod
a character string indicating the principal component analysis algorithm to be used. Options are: "svd" (default) and "nipals". See orthoDiss.
ws
an odd integer value which specifies the window size when the moving window correlation dissimilarity is used (i.e. sm = "movcor"). The default is 41.
k0
an integer value used when any of the local dissimilarity methods is selected (i.e. either sm = "loc.pc" or sm = "loc.pls"). This argument controls the number of initial neighbours ($k0$) to retain in order to compute the local principal components (at each neighbourhood).
returnDiss
a logical indicating if the dissimilarity matrices must be returned.
center
a logical indicating whether or not the predictor variables must be centered at each local segment (before regression).
scaled
a logical indicating whether or not the predictor variables must be scaled at each local segment (before regression).
valMethod
a character vector which indicates the (internal) validation method(s) to be used for assessing the global performance of the local models. Possible options are: "NNv" and "loc_crossval". Alternatively, "none" can be used when cross-validation is not required (see details below).
localOptimization
a logical. If valMethod = "loc_crossval", it optimizes the parameters of the local pls models (i.e. pls factors for pls and minimum and maximum pls factors for wapls1).
resampling
a value indicating the number of resampling iterations at each local segment when "loc_crossval" is selected in the valMethod argument. Default is 10.
p
a value indicating the percentage of samples to be retained in each resampling iteration at each local segment when "loc_crossval" is selected in the valMethod argument. Default is 0.75 (i.e. 75%).
range.pred.lim
a logical value. It indicates whether the prediction limits at each local regression are determined by the range of the response variable values employed at each local regression. If FALSE, no prediction limits are imposed. Default is TRUE.
progress
a logical indicating whether or not to print a progress bar for each sample to be predicted. Default is TRUE. Note: In case multicore processing is used, this progress bar will not be printed.
cores
the number of cores used for the computation of dissimilarities when the method in pcSelection is "opc" (which can be computationally intensive). Default is 1. See details.
allowParallel
a logical indicating whether to allow parallel execution of the sample loop. Default is TRUE.

Value

mblControl returns a list of class mbl with the specified parameters.

Details

The validation methods available for assessing the predictive performance of the memory-based learning method used are described as follows:
  • Leave-nearest-neighbour-out cross validation ("NNv"): From the group of neighbours of each sample to be predicted, the nearest sample (i.e. the most similar sample) is excluded and then a local model is fitted using the remaining neighbours. This model is then used to predict the value of the target response variable of the nearest sample. These predicted values are finally cross validated with the actual values (see Ramirez-Lopez et al. (2013a) for additional details). This method is faster than "loc_crossval".
  • Local leave-group-out cross validation ("loc_crossval"): The group of neighbours of each sample to be predicted is partitioned into different equal size subsets. Each partition is selected based on a stratified random sampling which takes into account the values of the response variable of the corresponding set of neighbours. The selected local subset is used as local validation subset and the remaining samples are used for fitting a model. This model is used to predict the target response variable values of the local validation subset and the local root mean square error is computed. This process is repeated $m$ times and the final local error is computed as the average of the local root mean square error of all the $m$ iterations. In the mbl function $m$ is controlled by the resampling argument and the size of the subsets is controlled by the p argument which indicates the percentage of samples to be selected from the subset of nearest neighbours. The global error of the predictions is computed as the average of the local root mean square errors.
  • No validation ("none"): No validation is carried out. If "none" is selected along with "NNv" and/or "loc_crossval", then it will be ignored and the respective validation(s) will be carried out.
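
The "loc_crossval" procedure can be sketched in base R for a single neighbourhood, as below. This is only an illustration under stated assumptions, not the package's implementation: lm() stands in for the local pls or wapls1 model, the strata are quartile bins of the response, p is taken as the fraction of neighbours kept for fitting, and the function name local_crossval and the simulated x and y are hypothetical.

```r
# Sketch of local leave-group-out cross-validation for ONE neighbourhood.
# Assumptions (not taken from the package source): lm() replaces the
# local pls/wapls1 model, strata are quartile bins of the response, and
# p is the fraction of neighbours kept for fitting in each iteration.
local_crossval <- function(x, y, resampling = 10, p = 0.75) {
  # Bin the response so that every draw covers its whole range
  breaks <- unique(quantile(y, probs = seq(0, 1, by = 0.25)))
  strata <- cut(y, breaks = breaks, include.lowest = TRUE)
  rmse <- numeric(resampling)
  for (i in seq_len(resampling)) {
    # Stratified random draw of the fitting subset
    fit_idx <- unlist(lapply(split(seq_along(y), strata), function(idx) {
      idx[sample.int(length(idx), size = round(p * length(idx)))]
    }))
    fit_data <- data.frame(y = y[fit_idx], x = x[fit_idx])
    val_data <- data.frame(x = x[-fit_idx])
    model <- lm(y ~ x, data = fit_data)
    pred <- predict(model, newdata = val_data)
    # Root mean square error on the local validation subset
    rmse[i] <- sqrt(mean((y[-fit_idx] - pred)^2))
  }
  mean(rmse)  # local error: average RMSE over the m iterations
}

set.seed(1)
x <- runif(40)                     # hypothetical single predictor
y <- 2 * x + rnorm(40, sd = 0.1)   # hypothetical response
local_rmse <- local_crossval(x, y, resampling = 10, p = 0.75)
local_rmse
```

Repeating this for the neighbourhood of every sample to be predicted and averaging the resulting local errors gives the global error described above.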

Multi-threading for the computation of dissimilarities is based on OpenMP and hence works only on Windows and Linux. However, the loop used to iterate over the Xu samples in mbl uses the %dopar% operator of the foreach package, which can be used to parallelize this internal loop. The last example given in the documentation of the mbl function illustrates how to parallelize the mbl function.
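
A minimal sketch of registering a backend for %dopar% is given below. It assumes that the doParallel package is installed (any foreach backend should serve the same purpose) and that resemble 1.2.2 is attached; the number of workers is arbitrary and the mbl call itself is left out.

```r
# Sketch only: assumes doParallel is installed and resemble 1.2.2 is
# attached; the number of workers (2) is an arbitrary choice.
library(resemble)
library(doParallel)

cl <- parallel::makeCluster(2)   # start two worker processes
registerDoParallel(cl)           # register them as the %dopar% backend

ctrl <- mblControl(allowParallel = TRUE)
# ... call mbl() here with this control list ...

parallel::stopCluster(cl)        # release the workers when done
```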

References

Ramirez-Lopez, L., Behrens, T., Schmidt, K., Stevens, A., Dematte, J. A. M., Scholten, T. 2013a. The spectrum-based learner: A new local approach for modeling soil vis-NIR spectra of complex datasets. Geoderma 195-196, 268-279.

Ramirez-Lopez, L., Behrens, T., Schmidt, K., Viscarra Rossel, R., Dematte, J. A. M., Scholten, T. 2013b. Distance and similarity-search metrics for use with soil vis-NIR spectra. Geoderma 199, 43-53.

See Also

fDiss, corDiss, sid, orthoDiss, mbl

Examples

#A control list with the default parameters
mblControl()

#A control list which specifies the moving correlation 
#dissimilarity metric with a moving window of 31
mblControl(sm = "movcor", ws = 31)
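
A few further control lists, built only from the arguments documented above, might look as follows. The specific values (200 neighbours, 20 iterations, 0.8) are illustrative choices rather than recommendations, and the calls assume that resemble 1.2.2 is attached.

```r
# Illustrative values only; assumes resemble 1.2.2 is attached

# Local pc dissimilarity with 200 initial neighbours and
# components selected by cumulative explained variance
mblControl(sm = "loc.pc", k0 = 200, pcSelection = list("cumvar", 0.99))

# Local leave-group-out cross-validation with 20 iterations,
# retaining 80% of the neighbours in each iteration
mblControl(valMethod = "loc_crossval", resampling = 20, p = 0.8)

# No internal validation
mblControl(valMethod = "none")
```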
