
SVEMnet (version 3.2.0)

SVEMnet-package: SVEMnet: Self-Validated Ensemble Models with Relaxed Lasso and Elastic-Net Regression

Description

The SVEMnet package implements Self-Validated Ensemble Models (SVEM) using Elastic Net (including lasso and ridge) regression via glmnet. SVEM averages predictions from multiple models fitted to fractionally weighted bootstraps of the data, tuned with anti-correlated validation weights. The package supports multi-response optimization with uncertainty-aware candidate generation for iterative formulation and process development.

Functions

Core modeling and summaries

SVEMnet

Fit an SVEMnet model using Elastic Net regression (including relaxed elastic net) on fractionally weighted bootstraps.

predict.svem_model

Predict method for SVEM models (ensemble-mean aggregation by default, optional debiasing, and percentile prediction intervals when available).

coef.svem_model

Averaged (optionally debiased) coefficients from an SVEM model.

svem_nonzero

Bootstrap nonzero percentages for each coefficient, with an optional quick plot.

plot.svem_model

Quick actual-versus-predicted plot for a fitted model (with optional group colorings).
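A minimal fitting sketch of the core workflow above, using simulated data. The formula/data interface and the predict/coef/svem_nonzero/plot methods are as documented here; everything else (the toy data, default settings) is illustrative, so consult ?SVEMnet for the full signature:

```r
library(SVEMnet)

# Illustrative data (not from the package)
set.seed(1)
dat <- data.frame(x1 = runif(30), x2 = runif(30))
dat$y <- 2 + 3 * dat$x1 - dat$x2 + rnorm(30, sd = 0.2)

fit <- SVEMnet(y ~ x1 + x2 + x1:x2, data = dat)

preds <- predict(fit, dat)  # ensemble-mean predictions
cf    <- coef(fit)          # averaged (optionally debiased) coefficients
svem_nonzero(fit)           # bootstrap nonzero percentages per coefficient
plot(fit)                   # quick actual-versus-predicted plot
```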

Deterministic wide expansions (bigexp helpers)

The bigexp_* helpers build and reuse a locked polynomial/interaction expansion across multiple responses and datasets:

bigexp_terms

Build a deterministic expanded RHS (polynomials, interactions, optional partial-cubic terms) with locked factor levels and numeric ranges.

bigexp_prepare

Coerce new data to match a stored bigexp_spec, including factor levels and numeric types.

bigexp_formula

Reuse a locked expansion for another response to ensure an identical factor space across models.

with_bigexp_contrasts

Temporarily restore the contrast options used when a bigexp_spec was built.

bigexp_train

Convenience wrapper that builds a bigexp_spec and prepares training data in one call.
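A sketch of how the bigexp helpers fit together across two responses. The helper names and their roles are as documented above; the exact argument names (e.g., how the second response is supplied to bigexp_formula) are assumptions, so check the individual help pages:

```r
library(SVEMnet)

# Illustrative training and follow-up data
set.seed(2)
train_df <- data.frame(x1 = runif(20), x2 = runif(20),
                       f = factor(sample(c("A", "B"), 20, TRUE)))
train_df$y1 <- rnorm(20); train_df$y2 <- rnorm(20)
new_df <- train_df[1:5, ]

# Build a locked expansion (polynomials, interactions) from training data
spec <- bigexp_terms(y1 ~ x1 + x2 + f, data = train_df)

# Reuse the locked expansion for a second response ("y2" assumed syntax)
form2 <- bigexp_formula(spec, "y2")

# Coerce new data to the stored factor levels and numeric types
new_ready <- bigexp_prepare(spec, new_df)
```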

Random tables, optimization, and candidate generation

svem_random_table_multi

Generate one shared random predictor table (with optional mixture constraints) from cached factor-space information and obtain predictions from multiple SVEM models at those points. Supports both Gaussian and binomial models; binomial predictions are returned on the probability scale. This is the lower-level sampler used by svem_score_random.

svem_score_random

Random-search scoring for multiple responses with Derringer–Suich desirabilities, user weights, optional whole-model-test (WMT) reweighting, percentile CI-based uncertainty, and (optionally) scoring of existing experimental data. Returns a scored random-search table and, when data is supplied, an augmented copy of the original data with <resp>_pred, desirabilities, scores, and an uncertainty_measure.

svem_select_from_score_table

Given a scored table (typically svem_score_random()$score_table), select one "best" row under a chosen objective and a small, diverse set of medoid candidates via PAM clustering on predictors.

svem_export_candidates_csv

Concatenate one or more selection objects from svem_select_from_score_table and export candidate tables (with metadata, predictions, and optional design-only trimming) to CSV or return them in-memory for inspection.
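A hypothetical scoring round chaining the three functions above. Argument names (models, n, file) are assumptions about the interface described here, and fit_yield/fit_purity stand for previously fitted SVEM models:

```r
library(SVEMnet)

# Score random points in the cached factor space for two responses
scored <- svem_score_random(
  models = list(yield = fit_yield, purity = fit_purity),
  n = 5000
)

# Pick one "best" row plus diverse medoid candidates (PAM on predictors)
sel <- svem_select_from_score_table(scored$score_table)

# Export the candidate table for the next experimental round
svem_export_candidates_csv(sel, file = "candidates.csv")
```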

Whole-model testing and plotting

svem_significance_test_parallel

Parallel whole-model significance test (using foreach + doParallel) with support for mixture-constrained sampling and reuse of a locked bigexp_spec. Designed for continuous (Gaussian) responses.

svem_wmt_multi

Helper to run svem_significance_test_parallel across multiple responses and construct whole-model p-values and reweighting multipliers for use in svem_score_random.

plot.svem_significance_test

Plot helper for visualizing multiple significance-test outputs (observed vs permutation distances, fitted null, and p-values).
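A sketch of the whole-model testing step for a continuous response. A formula/data interface is assumed here; parallel-backend setup (doParallel) and any permutation-count arguments are left at their defaults:

```r
library(SVEMnet)

# Illustrative Gaussian data
set.seed(3)
dat <- data.frame(x1 = runif(30), x2 = runif(30))
dat$y <- 1 + 2 * dat$x1 + rnorm(30, sd = 0.3)

# Parallel whole-model significance test (Gaussian responses only)
wmt <- svem_significance_test_parallel(y ~ x1 + x2, data = dat)

# Observed vs. permutation distances, fitted null, and p-value
plot(wmt)
```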

Auxiliary utilities and data

glmnet_with_cv

Convenience wrapper around repeated cv.glmnet() selection for robust lambda (and optional alpha) choice.

lipid_screen

Example dataset for multi-response modeling, whole-model testing, and mixture-constrained optimization demonstrations.

Families

SVEMnet currently supports:

  • Gaussian responses (family = "gaussian") with identity link and optional debiasing / percentile prediction intervals.

  • Binomial responses (family = "binomial") with logit link. The response must be 0/1 numeric or a two-level factor (first level treated as 0). Use predict(..., type = "response") for event probabilities or type = "class" for 0/1 labels (threshold = 0.5 by default).
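A short binomial sketch using the documented family and predict types; the simulated data are illustrative:

```r
library(SVEMnet)

# Illustrative 0/1 response
set.seed(4)
dat <- data.frame(x1 = runif(40), x2 = runif(40))
dat$hit <- rbinom(40, 1, plogis(-1 + 4 * dat$x1))

fit_b <- SVEMnet(hit ~ x1 + x2, data = dat, family = "binomial")

p  <- predict(fit_b, dat, type = "response")  # event probabilities
cl <- predict(fit_b, dat, type = "class")     # 0/1 labels (threshold 0.5)
```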

Some higher-level utilities place additional constraints:

  • svem_significance_test_parallel is designed and interpreted for continuous (Gaussian) responses.

  • svem_score_random supports mixed Gaussian + binomial response sets, treating binomial predictions and CIs on the probability scale, but WMT-based goal reweighting (via svem_wmt_multi and the wmt argument) is only allowed when all responses are Gaussian.

Acknowledgments

OpenAI's GPT models (o1-preview through GPT-5 Pro) were used to assist with coding and roxygen documentation; all content was reviewed and finalized by the author.

Author

Maintainer: Andrew T. Karl akarl@asu.edu (ORCID)

Details

A typical workflow is:

  1. Build a wide, deterministic factor expansion (optionally via bigexp_terms) and reuse it across responses with bigexp_formula.

  2. Fit one or more SVEM models with SVEMnet.

  3. Optionally run whole-model testing via svem_significance_test_parallel (and svem_wmt_multi) to assess factor relationships or reweight response goals.

  4. Call svem_score_random to draw random points in the factor space, compute multi-response Derringer–Suich scores, optional WMT-reweighted scores, and an uncertainty measure; then use svem_select_from_score_table to pick a single "best" row and diverse medoid candidates, and svem_export_candidates_csv to export candidate tables for the next experimental round.

  5. Run new experiments at the suggested candidates, append the data, refit the models, and repeat as needed (closed-loop optimization).

References

Gotwalt, C., & Ramsey, P. (2018). Model Validation Strategies for Designed Experiments Using Bootstrapping Techniques With Applications to Biopharmaceuticals. JMP Discovery Conference. https://community.jmp.com/t5/Abstracts/Model-Validation-Strategies-for-Designed-Experiments-Using/ev-p/849873/redirect_from_archived_page/true

Karl, A. T. (2024). A randomized permutation whole-model test heuristic for Self-Validated Ensemble Models (SVEM). Chemometrics and Intelligent Laboratory Systems, 249, 105122. doi:10.1016/j.chemolab.2024.105122

Karl, A., Wisnowski, J., & Rushing, H. (2022). JMP Pro 17 Remedies for Practical Struggles with Mixture Experiments. JMP Discovery Conference. doi:10.13140/RG.2.2.34598.40003/1

Lemkus, T., Gotwalt, C., Ramsey, P., & Weese, M. L. (2021). Self-Validated Ensemble Models for Design of Experiments. Chemometrics and Intelligent Laboratory Systems, 219, 104439. doi:10.1016/j.chemolab.2021.104439

Xu, L., Gotwalt, C., Hong, Y., King, C. B., & Meeker, W. Q. (2020). Applications of the Fractional-Random-Weight Bootstrap. The American Statistician, 74(4), 345–358. doi:10.1080/00031305.2020.1731599

Ramsey, P., Gaudard, M., & Levin, W. (2021). Accelerating Innovation with Space Filling Mixture Designs, Neural Networks and SVEM. JMP Discovery Conference. https://community.jmp.com/t5/Abstracts/Accelerating-Innovation-with-Space-Filling-Mixture-Designs/ev-p/756841

Ramsey, P., & Gotwalt, C. (2018). Model Validation Strategies for Designed Experiments Using Bootstrapping Techniques With Applications to Biopharmaceuticals. JMP Discovery Conference - Europe. https://community.jmp.com/t5/Abstracts/Model-Validation-Strategies-for-Designed-Experiments-Using/ev-p/849647/redirect_from_archived_page/true

Ramsey, P., Levin, W., Lemkus, T., & Gotwalt, C. (2021). SVEM: A Paradigm Shift in Design and Analysis of Experiments. JMP Discovery Conference - Europe. https://community.jmp.com/t5/Abstracts/SVEM-A-Paradigm-Shift-in-Design-and-Analysis-of-Experiments-2021/ev-p/756634

Ramsey, P., & McNeill, P. (2023). CMC, SVEM, Neural Networks, DOE, and Complexity: It's All About Prediction. JMP Discovery Conference.

Friedman, J. H., Hastie, T., & Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1–22.

Meinshausen, N. (2007). Relaxed Lasso. Computational Statistics & Data Analysis, 52(1), 374–393.

Kish, L. (1965). Survey Sampling. Wiley.

Lumley, T. (2004). Analysis of complex survey samples. Journal of Statistical Software, 9(1), 1–19.

Lumley, T., & Scott, A. (2015). AIC and BIC for modelling with complex survey data. Journal of Survey Statistics and Methodology, 3(1), 1–18.

See Also