MLSP (version 0.1.0)

soil_preprocess: Soil and Spectral Data Preprocessing for Model Training

Description

These functions fit predictive models for soil properties using visible and near-infrared (VNIR) spectral data. Each function applies a specific machine learning method:

  • pcr_preprocess() – Principal Component Regression (PCR)

  • plsr_preprocess() – Partial Least Squares Regression (PLSR)

  • lasso_preprocess() – LASSO regression

  • rf_preprocess() – Random Forest regression

  • cubist_preprocess() – Cubist regression

results() computes the mean performance metrics across multiple calibration and validation sets. It is typically used to summarize the output of the model functions pcr_preprocess(), plsr_preprocess(), lasso_preprocess(), rf_preprocess(), or cubist_preprocess().

Usage

pcr_preprocess(soil, vnir.matrix, j, preprocess, type_of_soil)

plsr_preprocess(soil, vnir.matrix, j, preprocess, type_of_soil)

lasso_preprocess(soil, vnir.matrix, j, preprocess, type_of_soil)

rf_preprocess(soil, vnir.matrix, j, preprocess, type_of_soil)

cubist_preprocess(soil, vnir.matrix, j, preprocess, type_of_soil)

results(metric.list, soil_type)

Value

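The model functions (pcr_preprocess(), plsr_preprocess(), lasso_preprocess(), rf_preprocess(), and cubist_preprocess()) return a list of MSD metric objects for the calibration and validation sets, specific to the fitted model.

results() returns a named numeric vector of mean performance metrics across all splits: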

LV

Latent variable / model index

cv-r2

Cross-validated R-squared for calibration set

cv-bias

Bias in cross-validation for calibration set

cv-rmse

Root mean squared error in cross-validation for calibration set

cal-mse

Mean squared error for calibration set

cal-rpiq

Ratio of performance to interquartile distance for calibration set

val-r2

R-squared for validation set

val-bias

Bias for validation set

val-rmse

Root mean squared error for validation set

val-mse

Mean squared error for validation set

val-rpiq

Ratio of performance to interquartile distance for validation set
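
The averaging described for results() can be sketched in base R. This is a minimal illustration under assumptions, not the package's implementation: each split is represented by a hypothetical named numeric vector with made-up values, and the real MSD metric objects may be structured differently.

```r
# Hypothetical per-split metrics (made-up values for illustration only).
# The real MSD metric objects returned by the model functions may differ.
split_metrics <- list(
  c("val-r2" = 0.80, "val-bias" = 0.10, "val-rmse" = 1.20),
  c("val-r2" = 0.70, "val-bias" = -0.10, "val-rmse" = 1.40),
  c("val-r2" = 0.90, "val-bias" = 0.00, "val-rmse" = 1.00)
)

# Stack the splits as rows so that each metric becomes a column.
metric_matrix <- do.call(rbind, split_metrics)

# Average each metric over the splits; names are carried over from the columns.
mean_metrics <- colMeans(metric_matrix)
print(mean_metrics)
```

Keeping the metric names on the vector means individual values can be read by name, for example mean_metrics["val-rmse"].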

Arguments

soil

A data frame of soil properties. Must include the target soil variable.

vnir.matrix

A numeric matrix of VNIR spectral data.

j

A list of index vectors specifying calibration sample sets (e.g., from merge_of_lab_and_spectrum).

preprocess

A preprocessing function to apply to the spectral data (e.g., smoothing, normalization).

type_of_soil

An integer index selecting which soil property column to model.

metric.list

A list of MSD metric objects returned by one of the preprocessing/model functions. Each element corresponds to a model fit on a calibration/validation split.

soil_type

Optional. An integer or string indicating which soil property was modeled. It is currently not used internally and is kept for consistency.
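
To make the expected shapes of these arguments concrete, the following sketch builds synthetic inputs in base R. Everything here is an assumption for illustration: the column names, the sample counts, the 70% calibration fraction, and the premise that rows of soil and vnir.matrix refer to the same samples in the same order. In practice, j would more likely come from merge_of_lab_and_spectrum.

```r
set.seed(1)
n_samples <- 30
n_bands <- 50

# Hypothetical soil table: an ID column plus two soil properties.
soil <- data.frame(
  id = seq_len(n_samples),
  clay = runif(n_samples, 5, 60),
  soc = runif(n_samples, 0.2, 4)
)

# Hypothetical spectra: one row per sample, one column per wavelength.
vnir.matrix <- matrix(runif(n_samples * n_bands), nrow = n_samples)
colnames(vnir.matrix) <- paste0("w", seq(400, by = 10, length.out = n_bands))

# Three random calibration sets, each holding 21 of the 30 samples (70%).
j <- replicate(3, sort(sample(n_samples, 21)), simplify = FALSE)

# Any function that maps a numeric matrix to a matrix of the same shape.
preprocess <- scale

# Column 2 of soil ("clay") is the property that would be modeled.
type_of_soil <- 2
```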

Details

All functions use the same workflow:

  1. Combine the selected soil property with preprocessed spectra.

  2. Split data into calibration and validation sets (using sample indices).

  3. Fit the chosen model across multiple calibration/validation partitions.

  4. Generate predictions and compute performance metrics (MSD-based).
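
The four steps can be sketched end to end in base R. This is an illustration under stated assumptions, not the package's code: the data are synthetic, PCR is approximated with prcomp() followed by lm() on three components, R-squared is taken as the squared correlation, bias as the mean of predicted minus observed, and the helper fit_metrics() is hypothetical.

```r
set.seed(42)
n <- 60
p <- 20

# Synthetic spectra driven by three latent factors, plus a target property.
latent <- matrix(rnorm(n * 3), nrow = n)
loadings <- matrix(rnorm(3 * p), nrow = 3)
spectra <- latent %*% loadings + matrix(rnorm(n * p, sd = 0.1), nrow = n)
colnames(spectra) <- paste0("w", seq_len(p))
soil <- data.frame(
  id = seq_len(n),
  clay = 30 + 5 * latent[, 1] - 3 * latent[, 2] + rnorm(n, sd = 0.5)
)

# Step 1: combine the selected soil property with preprocessed spectra.
x <- scale(spectra)
y <- soil[[2]]

# Step 2: calibration index sets; the remaining samples form validation.
j <- replicate(5, sort(sample(n, 40)), simplify = FALSE)

# Hypothetical helper computing the metrics named in the Value section.
fit_metrics <- function(obs, pred) {
  err <- pred - obs
  mse <- mean(err^2)
  c(r2 = cor(obs, pred)^2, bias = mean(err), rmse = sqrt(mse),
    mse = mse, rpiq = IQR(obs) / sqrt(mse))
}

# Steps 3 and 4: fit on each calibration set, predict on validation.
per_split <- lapply(j, function(cal) {
  ncomp <- 3
  pca <- prcomp(x[cal, ], center = TRUE, scale. = FALSE)
  cal_data <- data.frame(y = y[cal], pca$x[, seq_len(ncomp)])
  model <- lm(y ~ ., data = cal_data)
  val_scores <- as.data.frame(predict(pca, x[-cal, ])[, seq_len(ncomp)])
  pred <- predict(model, newdata = val_scores)
  fit_metrics(y[-cal], pred)
})

# Mean validation metrics across the five splits.
mean_metrics <- colMeans(do.call(rbind, per_split))
print(round(mean_metrics, 3))
```

Fitting the principal components on the calibration rows only, then projecting the validation rows with predict(), keeps the validation samples out of the model fit.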

See Also

merge_of_lab_and_spectrum, ml_f

Examples

# \donttest{
# Assumes soil, vnir.matrix, and j are already defined in the session.
# Example with PCR
results_pcr <- pcr_preprocess(soil, vnir.matrix, j, preprocess = scale, type_of_soil = 2)

# Example with Random Forest
results_rf <- rf_preprocess(soil, vnir.matrix, j, preprocess = scale, type_of_soil = 2)
# }

# \donttest{
# Fit PCR on each calibration/validation split
msd_list <- pcr_preprocess(soil, vnir.matrix, j, preprocess = scale, type_of_soil = 2)
# Average the performance metrics across all splits
results_summary <- results(msd_list)
# }