SaveModel: Save spectral prediction model and model performance statistics

Description

Saves the spectral prediction model and its performance statistics to model.save.folder as model.name.Rds and model.name_stats.csv, respectively.

Usage

SaveModel(df, save.model = TRUE, autoselect.preprocessing = TRUE,
  preprocessing.method = NULL, model.save.folder = NULL,
  model.name = "PredictionModel", best.model.metric = "RMSE",
  tune.length = 50, model.method = "pls", num.iterations = 10,
  wavelengths = 740:1070, stratified.sampling = TRUE,
  cv.scheme = NULL, trial1 = NULL, trial2 = NULL, trial3 = NULL,
  verbose = TRUE)

Arguments

df

data.frame object. The first column contains unique identifiers and the second contains reference values, followed by the spectral columns. Include no other columns to the right of the spectra! Column names of spectra must start with "X" and the reference column must be named "reference".
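
The layout required for df can be sketched in base R. This is only an illustration: the sample identifiers, reference values, and the five-wavelength range are made up, and the name of the first column (sample.id) is an assumption, since only the "reference" and "X"-prefixed names are prescribed.

```r
# Hypothetical toy data: 5 samples measured at 740-744 nm (all values are made up)
set.seed(1)
wavelengths <- 740:744
spectra <- matrix(runif(5 * length(wavelengths)), nrow = 5)
colnames(spectra) <- paste0("X", wavelengths)  # spectral column names must start with "X"

df <- data.frame(
  sample.id = paste0("sample_", 1:5),            # column 1: unique identifiers
  reference = c(31.2, 35.8, 28.4, 40.1, 33.7),   # column 2: must be named "reference"
  spectra                                        # spectral columns last, nothing after them
)

# Quick checks of the layout described above
stopifnot(
  !anyDuplicated(df$sample.id),
  names(df)[2] == "reference",
  all(startsWith(names(df)[-(1:2)], "X"))
)
```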

save.model

If TRUE, the trained model will be saved in .Rds format to the location specified by model.save.folder. If FALSE, model will be output by function but will not save to file. Default is TRUE.

autoselect.preprocessing

Boolean that, if TRUE, will choose the preprocessing method for the saved model using the best.model.metric. If FALSE, the user must supply the preprocessing method of the saved model (1-13, see DoPreprocessing() documentation for more information). Default is TRUE.

preprocessing.method

Number or list of numbers 1:13 corresponding to desired pretreatment method(s):

1 = raw data (default)
2 = standard normal variate (SNV)
3 = SNV and first derivative
4 = SNV and second derivative
5 = first derivative
6 = second derivative
7 = Savitzky–Golay filter (SG)
8 = SNV and SG
9 = gap segment derivative (window size = 11)
10 = SG and first derivative (window size = 5)
11 = SG and first derivative (window size = 11)
12 = SG and second derivative (window size = 5)
13 = SG and second derivative (window size = 11)
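
As an illustration of what one of these pretreatments does, standard normal variate (method 2) centers and scales each spectrum by its own mean and standard deviation. The following is a minimal base R sketch with made-up spectra; the helper name snv is hypothetical, and this is not necessarily how DoPreprocessing() implements the method.

```r
# Hypothetical spectra: 3 samples (rows) x 4 wavelengths (columns), values are made up
spectra <- matrix(c(0.10, 0.20, 0.30, 0.40,
                    0.50, 0.55, 0.60, 0.65,
                    1.00, 0.80, 0.60, 0.40),
                  nrow = 3, byrow = TRUE)

# SNV: for each row, subtract the row mean and divide by the row standard deviation.
# apply() returns the processed rows as columns, so the result is transposed back.
snv <- function(x) {
  t(apply(x, 1, function(row) (row - mean(row)) / sd(row)))
}

spectra.snv <- snv(spectra)
rowMeans(spectra.snv)       # approximately 0 for every sample
apply(spectra.snv, 1, sd)   # approximately 1 for every sample
```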

model.save.folder

Path to folder where model will be saved. If not provided, will save to working directory.

model.name

Name that model will be saved as in model.save.folder. Default is "PredictionModel".
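
The output file names given in the Description can be assembled from model.save.folder and model.name as sketched below. The use of file.path(), the temporary folder, and the placeholder object are assumptions for illustration; only the naming pattern (model.name.Rds and model.name_stats.csv) comes from this page.

```r
model.save.folder <- tempdir()     # stand-in for a real output folder
model.name <- "PredictionModel"    # the documented default

# File names follow the pattern given in the Description
model.path <- file.path(model.save.folder, paste0(model.name, ".Rds"))
stats.path <- file.path(model.save.folder, paste0(model.name, "_stats.csv"))

# Placeholder object; a real call would store the trained model here
toy.model <- list(note = "placeholder for a trained model")
saveRDS(toy.model, model.path)

# A saved .Rds file can be read back into the session with readRDS()
reloaded <- readRDS(model.path)
identical(toy.model, reloaded)
```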

best.model.metric

Metric used to decide which model is best. Must be either "RMSE" or "Rsquared". Default is "RMSE".

tune.length

Number delineating search space for tuning of the PLSR hyperparameter ncomp. Default is 50.

model.method

Model type to use for training. Valid options include:

"pls": Partial least squares regression (Default)
"rf": Random forest
"svmLinear": Support vector machine with linear kernel
"svmRadial": Support vector machine with radial kernel

num.iterations

Number of training iterations to perform. Default is 10.

wavelengths

List of wavelengths represented by the spectral columns in df. Default is 740:1070.

stratified.sampling

If TRUE, training and test sets will be selected using stratified random sampling. This term is only used if test.data == NULL. Default is TRUE.
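
The idea behind stratified random sampling of a continuous reference trait can be sketched in base R: bin the reference values, then sample within each bin so that the training set spans the full range of values. The quartile bins, the 70/30 split, and the simulated reference values below are all assumptions for illustration and may differ from what the package does internally.

```r
set.seed(42)
reference <- rnorm(100, mean = 35, sd = 5)   # hypothetical reference values

# Bin the reference values into quartile-based strata
strata <- cut(reference,
              breaks = quantile(reference, probs = seq(0, 1, 0.25)),
              include.lowest = TRUE, labels = FALSE)

# Draw 70% of the rows from each stratum for training (the split ratio is an assumption)
train.idx <- unlist(lapply(split(seq_along(reference), strata), function(idx) {
  idx[sample.int(length(idx), size = floor(0.7 * length(idx)))]
}))
test.idx <- setdiff(seq_along(reference), train.idx)

table(strata[train.idx])  # every stratum contributes rows to the training set
```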

cv.scheme

A cross validation (CV) scheme from Jarquín et al., 2017. Options for cv.scheme include:

"CV1": untested lines in tested environments
"CV2": tested lines in tested environments
"CV0": tested lines in untested environments
"CV00": untested lines in untested environments

trial1

data.frame object that is for use only when cv.scheme is provided. Contains the trial to be tested in subsequent model training functions. The first column contains unique identifiers, second contains genotypes, third contains reference values, followed by spectral columns. Include no other columns to right of spectra! Column names of spectra must start with "X", reference column must be named "reference", and genotype column must be named "genotype".

trial2

data.frame object that is for use only when cv.scheme is provided. This data.frame contains a trial with genotypes that overlap with those in trial1 but were grown in a different site/year (different environment). Formatting must be consistent with trial1.

trial3

data.frame object that is for use only when cv.scheme is provided. This data.frame contains a trial that may or may not contain genotypes that overlap with trial1. Formatting must be consistent with trial1.

verbose

If TRUE, the number of rows removed through filtering will be printed to the console. Default is TRUE.

Value

List containing the model statistics (as a data.frame) and the trained model object. Both are also saved to model.save.folder unless save.model is FALSE. To use the optimally trained model for predictions, use the tuned parameters from $bestTune.

Details

Wrapper that uses DoPreprocessing, FormatCV, and TrainSpectralModel functions.

Examples

# NOT RUN {
library(magrittr)
test.model <- ikeogu.2017 %>%
  dplyr::filter(study.name == "C16Mcal") %>%
  dplyr::rename(reference = DMC.oven) %>%
  dplyr::select(sample.id, reference, dplyr::starts_with("X")) %>%
  na.omit() %>%
  SaveModel(df = ., save.model = FALSE,
            autoselect.preprocessing = TRUE,
            model.name = "my_prediction_model",
            tune.length = 50, num.iterations = 10,
            wavelengths = 350:2500)
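# Extract list elements with [[ ]] so that summary() and printing
# act on the objects themselves rather than on one-element lists
summary(test.model[[1]])
test.model[[2]]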
# }
