plsmod offers a function to fit ordinary, sparse, and discriminant analysis PLS models.
For regression, let’s use the Tecator data in the modeldata package:
library(tidymodels)
library(plsmod)
tidymodels_prefer()
theme_set(theme_bw())
data(meats, package = "modeldata")
Note that calling tidymodels_prefer() means that a bare call to pls()
resolves to parsnip::pls() rather than mixOmics::pls().
Although plsmod can fit multivariate models, we’ll concentrate on a univariate model that predicts the percentage of protein in the samples.
meats <- meats %>% select(-water, -fat)
We define a sparse PLS model by setting the predictor_prop argument to
a value less than one. This allows the model fitting process to set
certain loadings to zero via regularization.
sparse_pls_spec <-
pls(num_comp = 10, predictor_prop = 1/3) %>%
set_engine("mixOmics") %>%
set_mode("regression")
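As a rough illustration of what predictor_prop = 1/3 means for these data: once water and fat are dropped, the Tecator data has 100 spectral predictors, and the fitted model printed further down reports 34 retained predictors per component. That count is consistent with rounding 100/3 up to a whole predictor. The rounding rule in this sketch is an assumption inferred from that printed output, not taken from the plsmod source, and the variable names are only illustrative.

```r
# Number of spectral predictors left after removing water and fat
num_predictors <- 100
predictor_prop <- 1/3

# Assumed rule (inferred from the printed fit, not from plsmod itself):
# round the retained count up to a whole number of predictors
num_retained <- ceiling(num_predictors * predictor_prop)
num_retained
```

Under that assumption the result is 34, matching the "Selection of [34] ..." line in the model printout.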
The model is fit either with a formula or by passing the predictors and outcomes separately:
form_fit <-
sparse_pls_spec %>%
fit(protein ~ ., data = meats)
form_fit
## parsnip model object
##
##
## Call:
## mixOmics::spls(X = x, Y = y, ncomp = ncomp, keepX = keepX)
##
## sPLS with a 'regression' mode with 10 sPLS components.
## You entered data X of dimensions: 215 100
## You entered data Y of dimensions: 215 1
##
## Selection of [34] [34] [34] [34] [34] [34] [34] [34] [34] [34] variables on each of the sPLS components on the X data set.
## Selection of [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] variables on each of the sPLS components on the Y data set.
##
## Main numerical outputs:
## --------------------
## loading vectors: see object$loadings
## variates: see object$variates
## variable names: see object$names
##
## Functions to visualise samples:
## --------------------
## plotIndiv, plotArrow
##
## Functions to visualise variables:
## --------------------
## plotVar, plotLoadings, network, cim
# or
sparse_pls_spec %>%
  fit_xy(x = meats %>% select(-protein), y = meats$protein)
## parsnip model object
##
##
## Call:
## mixOmics::spls(X = x, Y = y, ncomp = ncomp, keepX = keepX)
##
## sPLS with a 'regression' mode with 10 sPLS components.
## You entered data X of dimensions: 215 100
## You entered data Y of dimensions: 215 1
##
## Selection of [34] [34] [34] [34] [34] [34] [34] [34] [34] [34] variables on each of the sPLS components on the X data set.
## Selection of [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] variables on each of the sPLS components on the Y data set.
##
## Main numerical outputs:
## --------------------
## loading vectors: see object$loadings
## variates: see object$variates
## variable names: see object$names
##
## Functions to visualise samples:
## --------------------
## plotIndiv, plotArrow
##
## Functions to visualise variables:
## --------------------
## plotVar, plotLoadings, network, cim
The pls() function can also be used with categorical outcomes.
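A minimal sketch of a classification fit is shown below. The choice of the penguins data from modeldata, the restriction to its numeric columns, and the object names (penguins_num, plsda_spec, plsda_fit) are assumptions made for illustration and do not come from the original text. Only the mode changes relative to the regression specification.

```r
library(tidymodels)
library(plsmod)
tidymodels_prefer()

# Illustrative data set (an assumption, not part of the original example)
data(penguins, package = "modeldata")

# Keep the factor outcome and the numeric predictors; drop incomplete rows
penguins_num <- penguins %>%
  select(species, bill_length_mm, bill_depth_mm,
         flipper_length_mm, body_mass_g) %>%
  na.omit()

# Same model function and engine as before, with a classification mode
plsda_spec <-
  pls(num_comp = 2) %>%
  set_engine("mixOmics") %>%
  set_mode("classification")

plsda_fit <-
  plsda_spec %>%
  fit(species ~ ., data = penguins_num)

# Hard class predictions for the first few rows
predict(plsda_fit, new_data = head(penguins_num))
```

Setting predictor_prop to a value less than one in this specification would request the sparse version of the discriminant model in the same way as for regression.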
Maintainer: Max Kuhn max@rstudio.com (ORCID)
Other contributors:
RStudio [copyright holder]
The model function works with the tidymodels infrastructure so that the model can be resampled, tuned, tidied, etc.
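As one hedged example of that integration, the sketch below tunes num_comp and predictor_prop over a small hand-written grid with 5-fold cross-validation. The grid values, the number of folds, the seed, and the object names (tune_spec, folds, pls_grid, tune_res) are all assumptions chosen to keep the example short. They are not recommendations.

```r
library(tidymodels)
library(plsmod)
tidymodels_prefer()

data(meats, package = "modeldata")
meats <- meats %>% select(-water, -fat)

# Mark both main arguments for tuning
tune_spec <-
  pls(num_comp = tune(), predictor_prop = tune()) %>%
  set_engine("mixOmics") %>%
  set_mode("regression")

# Small resampling scheme and candidate grid (illustrative values)
set.seed(1)
folds <- vfold_cv(meats, v = 5)
pls_grid <- expand.grid(num_comp = c(2L, 5L), predictor_prop = c(0.5, 1))

# Evaluate every grid combination on every fold
tune_res <- tune_grid(tune_spec, protein ~ ., resamples = folds, grid = pls_grid)

# Rank the candidates by cross-validated RMSE
show_best(tune_res, metric = "rmse")
```

An explicit grid is used here so that every candidate value of predictor_prop is known to be strictly positive.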