svem_score_random: Random-search scoring for SVEM models

Description

Draw random points from the SVEM sampling schema, compute multi-response desirability scores and (optionally) whole-model-test (WMT) reweighted scores, and attach a scalar uncertainty measure based on percentile CI widths. This function does not choose candidates; see svem_select_from_score_table for selection and clustering.

Predictions used inside this scorer are always generated with debiasing disabled (i.e., debias = FALSE) regardless of whether the underlying SVEM fits support calibration.

When specs is supplied, the function also attempts to append mean-level "in spec" probabilities and related joint indicators using the SVEM bootstrap ensemble via svem_append_design_space_cols. These quantities reflect uncertainty on the process mean at each sampled setting under the fitted SVEM models, not unit-level predictive probabilities. If any error occurs in this spec-limit augmentation, it is caught; a message may be issued when verbose = TRUE, and the affected table(s) are returned without the spec-related columns.

Usage

svem_score_random(
  objects,
  goals,
  data = NULL,
  n = 50000,
  mixture_groups = NULL,
  level = 0.95,
  combine = c("geom", "mean"),
  numeric_sampler = c("random", "uniform"),
  wmt = NULL,
  verbose = TRUE,
  specs = NULL
)

Value

A list with components:

score_table: Data frame with predictors, predicted responses (columns <resp>_pred for each resp in names(objects)), per-response desirabilities, score, optional wmt_score, and uncertainty_measure. For each response r in names(objects), additional columns r_lwr, r_upr (percentile CI bounds at level level) and r_ciw_w (weighted, normalized CI width contribution to uncertainty_measure) are appended. When specs is supplied and the spec-limit augmentation succeeds, additional columns <resp>_p_in_spec_mean, <resp>_in_spec_point, p_joint_mean, and joint_in_spec_point are appended.
original_data_scored: If data is supplied, that data augmented with prediction columns <resp>_pred, per-response desirabilities, score, optional wmt_score, and uncertainty_measure; otherwise NULL. When specs is supplied and the spec-limit augmentation succeeds, the same mean-level spec columns as in score_table are appended to original_data_scored as well.
weights_original: User-supplied response weights, normalized to sum to one.
weights_final: Final weights after WMT, if wmt is supplied; otherwise equal to weights_original.
wmt_p_values: Named vector of per-response whole-model p-values when wmt is supplied and contains p_values; otherwise NULL.
wmt_multipliers: Named vector of per-response WMT multipliers when wmt is supplied; otherwise NULL.

Arguments

objects

List of svem_model objects (from SVEMnet). When unnamed, svem_score_random() attempts to infer response names from the left-hand sides of the model formulas. Names (when present) are treated as response identifiers and should typically match the model response names. All models must share a common sampling schema (predictor set, factor levels, numeric ranges) compatible with svem_random_table_multi.

goals

List of per-response goal specifications. Either:

a named list, where names may be either names(objects) or the left-hand-side response names from the fitted models; or
an unnamed list with the same length as objects, in which case entries are matched to models by position.

Each goals[[response]] must be a list with at least:

goal: one of "max", "min", "target";
weight: nonnegative numeric weight.

For goal = "target", also provide target. Optional Derringer–Suich controls:

For "max" or "min": lower_acceptable, upper_acceptable, shape.
For "target": tol (symmetric), or tol_left / tol_right, and shape_left / shape_right.

When anchors/tolerances are not supplied, robust defaults are inferred from the sampled table using the span between the 2nd and 98th percentiles (q0.02–q0.98) of each response.
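
As an illustration of the structure described above, a goals list that mixes all three goal types might look like the following sketch. The response names are borrowed from the Examples section; every numeric anchor, tolerance, and shape value is a made-up placeholder rather than a recommendation.

```r
# Hypothetical goals list; all numeric values are illustrative placeholders.
goals_example <- list(
  Potency = list(goal = "max", weight = 0.5,
                 lower_acceptable = 70, upper_acceptable = 90, shape = 1),
  Size    = list(goal = "target", weight = 0.3,
                 target = 80, tol_left = 15, tol_right = 25,
                 shape_left = 1, shape_right = 2),
  PDI     = list(goal = "min", weight = 0.2)  # anchors left to the inferred defaults
)

# Structural checks mirroring the documented minimum requirements
ok_goal <- vapply(goals_example,
                  function(g) g$goal %in% c("max", "min", "target"),
                  logical(1))
ok_weight <- vapply(goals_example,
                    function(g) is.numeric(g$weight) && g$weight >= 0,
                    logical(1))
stopifnot(all(ok_goal), all(ok_weight))
```

The weights here happen to sum to one, but that is not required; they are normalized internally.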

data

Optional data frame. When supplied (regardless of whether wmt is used), it is scored and returned as original_data_scored, with predictions (in <resp>_pred columns), per-response desirabilities, score (and wmt_score if applicable), and uncertainty_measure appended. When specs is supplied and the spec-limit augmentation succeeds, the same mean-level spec columns as in score_table (per-response <resp>_p_in_spec_mean, <resp>_in_spec_point, and joint p_joint_mean, joint_in_spec_point) are appended as well.

n

Number of random samples to draw in the predictor space. This is the number of rows in the sampled table used for scoring.

mixture_groups

Optional mixture and simplex constraints passed to svem_random_table_multi. Each group typically specifies mixture variable names, bounds, and a total.

level

Confidence level for percentile intervals used in the CI width and uncertainty calculations. Default 0.95.

combine

How to combine per-response desirabilities into a scalar score. One of:

"geom": weighted geometric mean (default);
"mean": weighted arithmetic mean.

numeric_sampler

Character string controlling how numeric predictors are sampled inside svem_random_table_multi. One of:

"random": Latin hypercube sampling when the lhs package is available, otherwise independent uniforms;
"uniform": independent uniforms over stored numeric ranges.

wmt

Optional object returned by svem_wmt_multi. When non-NULL, its multipliers (and p_values, if present) are aligned to names(objects) and used to define the WMT weights and wmt_score. When NULL, only user weights are used and no WMT reweighting is applied.

verbose

Logical; if TRUE, print a compact summary of the run (and any WMT diagnostics from upstream) to the console.

specs

Optional named list of specification objects, one per response in objects for which you want to define a mean-level spec constraint. Each entry should be either NULL (no specs for that response) or a list with components:

lower: numeric lower limit (may be -Inf, NA, or NULL for a one-sided upper spec);
upper: numeric upper limit (may be Inf, NA, or NULL for a one-sided lower spec).

Names of specs, when provided, should be a subset of names(objects) or of the model response names (left-hand sides). The specification structure matches that used by svem_append_design_space_cols.

Details

Typical workflow

A common pattern is:

1. Fit one or more SVEMnet() models for the responses of interest.
2. Call svem_score_random() to:
   - draw candidate settings in factor space,
   - compute Derringer–Suich (DS) desirabilities and a combined multi-response score, and
   - attach a scalar uncertainty measure derived from percentile CI widths.
3. Optionally provide specs to append mean-level "in spec" probabilities and joint indicators based on the SVEM bootstrap ensemble (process-mean assurance).
4. Use svem_select_from_score_table to:
   - select one "best" row (e.g., maximizing score or wmt_score), and
   - pick a small, diverse set of medoid candidates for optimality or exploration (e.g. high uncertainty_measure).
5. Run selected candidates, append the new data, refit the SVEM models, and repeat as needed.
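
To make the selection step concrete without assuming anything about the interface of svem_select_from_score_table, the sketch below works on a mock score table in base R. The table, its predictor columns x1 and x2, and the choice of five exploration rows are all hypothetical, and the simple ranking shown here is not the medoid-based selection that the package function performs.

```r
# Mock stand-in for scored$score_table (hypothetical columns and values)
set.seed(10)
score_table_mock <- data.frame(
  x1 = runif(100),
  x2 = runif(100),
  score = runif(100),
  uncertainty_measure = runif(100)
)

# Exploitation: the single row with the highest combined score
best_row <- score_table_mock[which.max(score_table_mock$score), ]

# Exploration: the five rows with the largest uncertainty_measure
ord <- order(score_table_mock$uncertainty_measure, decreasing = TRUE)
explore_rows <- score_table_mock[head(ord, 5), ]
```

A plain top-k ranking like this can return several near-identical settings, which is the motivation for a diversity-aware (medoid) selection step.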

Multi-response desirability scoring

Each response is mapped to a Derringer–Suich desirability $d_r \in [0,1]$ according to its goal:

goal = "max": larger values are better;
goal = "min": smaller values are better;
goal = "target": values near a target are best.

Per-response anchors (acceptable lower/upper limits or target-band tolerances) can be supplied in goals; when not provided, robust defaults are inferred from the sampled responses using the span between the 2nd and 98th percentiles (q0.02–q0.98).

Per-response desirabilities are combined into a single scalar score using either:

a weighted arithmetic mean (combine = "mean"), or
a weighted geometric mean (combine = "geom"), with a small floor applied inside the log to avoid log(0).

User-provided weights in goals[[resp]]$weight are normalized to sum to one and always define weights_original and the user-weighted score.
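
The mapping and combination steps described in this section can be sketched in a few lines of R. This is a minimal illustration using the textbook Derringer–Suich ramp functions, not the package's internal code: the helper names (ds_max, ds_min, ds_target, combine_scores), the anchor values, and the floor of 1e-6 are all assumptions made for the example.

```r
# Textbook Derringer-Suich ramps (assumed forms, not SVEMnet internals)
ds_max <- function(y, lo, hi, shape = 1) {
  pmin(pmax((y - lo) / (hi - lo), 0), 1)^shape   # 0 below lo, 1 above hi
}
ds_min <- function(y, lo, hi, shape = 1) {
  pmin(pmax((hi - y) / (hi - lo), 0), 1)^shape   # 1 below lo, 0 above hi
}
ds_target <- function(y, target, tol_left, tol_right,
                      shape_left = 1, shape_right = 1) {
  left  <- pmin(pmax((y - (target - tol_left)) / tol_left, 0), 1)^shape_left
  right <- pmin(pmax((target + tol_right - y) / tol_right, 0), 1)^shape_right
  ifelse(y <= target, left, right)               # peak of 1 at the target
}

# Combine an n x R matrix of desirabilities D with weights w
combine_scores <- function(D, w, combine = c("geom", "mean"), floor = 1e-6) {
  combine <- match.arg(combine)
  w <- w / sum(w)                                # normalize to sum to one
  if (combine == "mean") {
    drop(D %*% w)
  } else {
    drop(exp(log(pmax(D, floor)) %*% w))         # floor avoids log(0)
  }
}

# Three hypothetical candidate settings with made-up predictions
d_pot <- ds_max(c(75, 85, 95), lo = 70, hi = 90)
d_siz <- ds_min(c(120, 90, 70), lo = 60, hi = 120)
D <- cbind(Potency = d_pot, Size = d_siz)
w <- c(Potency = 0.6, Size = 0.4)

score_mean <- combine_scores(D, w, "mean")
score_geom <- combine_scores(D, w, "geom")
```

In this toy case the first candidate has a Size desirability of zero, so its geometric-mean score collapses toward zero while its arithmetic-mean score does not. That is the practical difference between the two combine options: "geom" penalizes any single unacceptable response much more heavily.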

Whole-model reweighting (WMT)

When a WMT object from svem_wmt_multi is supplied via the wmt argument, each response receives a multiplier derived from its whole-model p-value. Final WMT weights are proportional to the product of the user weight and the multiplier, then renormalized to sum to one:

$$w_r^{(\mathrm{final})} \propto w_r^{(\mathrm{user})} \times m_r,$$

where $m_r$ comes from wmt$multipliers. The user weights always define score; the WMT-adjusted weights define wmt_score. The uncertainty measure is always weighted using the user weights, even when WMT is supplied.
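
As a small worked example of the renormalization, the following sketch applies made-up multipliers to the user weights from the Examples section. The multiplier values are hypothetical; in practice they would come from wmt$multipliers.

```r
# User weights (already normalized) and hypothetical WMT multipliers
w_user <- c(Potency = 0.6, Size = 0.3, PDI = 0.1)
m      <- c(PDI = 0.25, Potency = 1.0, Size = 0.5)  # deliberately in another order

# Align multipliers to the weight names, multiply, then renormalize
w_final <- w_user * m[names(w_user)]
w_final <- w_final / sum(w_final)

round(w_final, 3)  # Potency about 0.774, Size about 0.194, PDI about 0.032
```

Responses with small multipliers lose influence on wmt_score, and the lost weight is redistributed to the remaining responses in proportion to their own products.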

Binomial responses. If any responses are fitted with family = "binomial", supplying a non-NULL wmt object is not allowed and the function stops with a clear error. Predictions and CI bounds for binomial responses are interpreted on the probability (response) scale and clamped to [0, 1] before desirability and uncertainty calculations.

Uncertainty measure

The uncertainty_measure is a weighted sum of robustly normalized percentile CI widths across responses. For each response, we compute the bootstrap percentile CI width $\mathrm{CIwidth}_r(x) = u_r(x) - \ell_r(x)$ and then map it to the unit interval using an affine rescaling based on the empirical q0.02 and q0.98 quantiles of the CI widths for that response (computed from the table being scored):

$$ \tilde W_r(x) = \frac{ \min\{\max(\mathrm{CIwidth}_r(x), q_{0.02}(r)), q_{0.98}(r)\} - q_{0.02}(r) }{ q_{0.98}(r) - q_{0.02}(r) }. $$

The scalar uncertainty_measure is then

$$ \text{uncertainty}(x) = \sum_r w_r \, \tilde W_r(x), $$

where $w_r$ are the user-normalized response weights derived from goals[[resp]]$weight. Larger values of uncertainty_measure indicate settings where the ensemble CI is relatively wide compared to the response's typical scale and are natural targets for exploration.
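
The clamp-and-rescale formula above translates directly into R. The sketch below uses simulated CI bounds; the helper name normalize_ciw, the simulated data, and the use of the default quantile type are assumptions, and the degenerate case where the two quantiles coincide is not handled.

```r
# Clamp CI widths to [q0.02, q0.98] and rescale to [0, 1]
normalize_ciw <- function(ciw, probs = c(0.02, 0.98)) {
  q <- stats::quantile(ciw, probs = probs, names = FALSE, na.rm = TRUE)
  (pmin(pmax(ciw, q[1]), q[2]) - q[1]) / (q[2] - q[1])
}

# Simulated percentile CI bounds for two responses at n settings
set.seed(42)
n   <- 200
lwr <- cbind(Potency = rnorm(n, 80, 2), Size = rnorm(n, 90, 5))
upr <- lwr + cbind(Potency = rexp(n, 1), Size = rexp(n, 0.2))

# User weights, normalized to sum to one
w <- c(Potency = 2, Size = 1)
w <- w / sum(w)

# Per-response normalized widths, then the weighted sum across responses
W_tilde     <- apply(upr - lwr, 2, normalize_ciw)
uncertainty <- drop(W_tilde %*% w)
```

Because each response is normalized against its own width distribution, a response measured on a large scale (Size here) does not dominate the sum merely because its raw intervals are wider.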

Spec-limit mean-level probabilities

If specs is provided, svem_score_random() attempts to pass the scored table and models to svem_append_design_space_cols to compute, for each response with an active spec:

<resp>_p_in_spec_mean: estimated probability (under the SVEM bootstrap ensemble) that the process mean at a setting lies within the specified interval;
<resp>_in_spec_point: 0/1 indicator that the point prediction lies within the same interval.

and joint quantities:

p_joint_mean: product of per-response mean-level probabilities over responses with active specs;
joint_in_spec_point: 0/1 indicator that all point predictions are in spec across responses with active specs.

Names in specs may refer either to names(objects) or to the model response names; they are automatically aligned to the fitted models.

These probabilities are defined on the conditional means at each sampled setting, not on individual units or lots, and are best interpreted as ensemble-based assurance measures under the SVEM + FRW pipeline. If the augmentation step fails for any reason (for example, missing predictor columns or incompatible models), the error is caught; a message may be issued when verbose = TRUE, and score_table and/or original_data_scored are returned without the spec-related columns.
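
One way to picture these columns is as summaries over a matrix of bootstrap-member predictions. The sketch below assumes, purely for illustration, that the mean-level probability is the share of ensemble members whose predicted mean falls inside the spec interval and that the point prediction is the ensemble average; the actual computation inside svem_append_design_space_cols may differ. The helper in_spec_summary and the simulated predictions are hypothetical.

```r
# boot_pred: B x n matrix (bootstrap members in rows, settings in columns)
in_spec_summary <- function(boot_pred, lower = -Inf, upper = Inf) {
  if (is.null(lower) || is.na(lower)) lower <- -Inf  # one-sided upper spec
  if (is.null(upper) || is.na(upper)) upper <- Inf   # one-sided lower spec
  point <- colMeans(boot_pred)                       # assumed point prediction
  list(
    p_in_spec_mean = colMeans(boot_pred >= lower & boot_pred <= upper),
    in_spec_point  = as.integer(point >= lower & point <= upper)
  )
}

# Simulated ensemble predictions for 5 settings and 200 members
set.seed(7)
B <- 200
pot <- matrix(rnorm(B * 5, mean = rep(c(75, 78, 80, 82, 85), each = B), sd = 2),
              nrow = B)
siz <- matrix(rnorm(B * 5, mean = rep(c(110, 102, 98, 95, 90), each = B), sd = 4),
              nrow = B)

pot_s <- in_spec_summary(pot, lower = 78)            # Potency >= 78
siz_s <- in_spec_summary(siz, upper = 100)           # Size <= 100

# Joint quantities over the responses with active specs
p_joint_mean        <- pot_s$p_in_spec_mean * siz_s$p_in_spec_mean
joint_in_spec_point <- pot_s$in_spec_point * siz_s$in_spec_point
```

Since p_joint_mean is a product of per-response probabilities, it can never exceed the smallest of them, so a setting needs comfortable margin on every active spec to score well jointly.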

Examples


# \donttest{
## ------------------------------------------------------------------------
## Multi-response SVEM scoring with Derringer–Suich desirabilities
## ------------------------------------------------------------------------

data(lipid_screen)

# Build a deterministic expansion once and reuse for all responses
spec <- bigexp_terms(
  Potency ~ PEG + Helper + Ionizable + Cholesterol +
    Ionizable_Lipid_Type + N_P_ratio + flow_rate,
  data             = lipid_screen,
  factorial_order  = 3,
  polynomial_order = 3,
  include_pc_2way  = TRUE,
  include_pc_3way  = FALSE
)

form_pot <- bigexp_formula(spec, "Potency")
form_siz <- bigexp_formula(spec, "Size")
form_pdi <- bigexp_formula(spec, "PDI")

set.seed(1)
fit_pot <- SVEMnet(form_pot, lipid_screen)
fit_siz <- SVEMnet(form_siz, lipid_screen)
fit_pdi <- SVEMnet(form_pdi, lipid_screen)

# Collect SVEM models in a named list by response
objs <- list(Potency = fit_pot, Size = fit_siz, PDI = fit_pdi)

# Goals and user weights for Derringer–Suich desirabilities
goals <- list(
  Potency = list(goal = "max", weight = 0.6),
  Size    = list(goal = "min", weight = 0.3),
  PDI     = list(goal = "min", weight = 0.1)
)

# Optional mixture constraints (composition columns sum to 1)
mix <- list(list(
  vars  = c("PEG", "Helper", "Ionizable", "Cholesterol"),
  lower = c(0.01, 0.10, 0.10, 0.10),
  upper = c(0.05, 0.60, 0.60, 0.60),
  total = 1.0
))

# Basic random-search scoring without WMT or design-space specs
set.seed(3)
scored_basic <- svem_score_random(
  objects         = objs,
  goals           = goals,
  n               = 10000,          # number of random candidates
  mixture_groups  = mix,
  combine         = "geom",
  numeric_sampler = "random",
  verbose         = FALSE
)

# Scored candidate table: predictors, _pred, _des, score, uncertainty
names(scored_basic$score_table)
head(scored_basic$score_table)

# 'data' was not supplied above, so scored_basic$original_data_scored is NULL.
# When 'data' is supplied, it holds the original runs with predictions,
# desirabilities, score, and uncertainty_measure appended.

## ------------------------------------------------------------------------
## With whole-model tests (WMT) and process-mean specifications
## ------------------------------------------------------------------------

set.seed(123)
wmt_out <- svem_wmt_multi(
  formulas       = list(Potency = form_pot,
                        Size    = form_siz,
                        PDI     = form_pdi),
  data           = lipid_screen,
  mixture_groups = mix,
  wmt_control    = list(seed = 123),
  plot           = FALSE,
  verbose        = FALSE
)

# Simple process-mean specs for a joint design space:
#   Potency >= 78, Size <= 100, PDI <= 0.25
specs_ds <- list(
  Potency = list(lower = 78),
  Size    = list(upper = 100),
  PDI     = list(upper = 0.25)
)

set.seed(4)
scored_full <- svem_score_random(
  objects         = objs,
  goals           = goals,
  data            = lipid_screen,  # score the original runs as well
  n               = 25000,
  mixture_groups  = mix,
  level           = 0.95,
  combine         = "geom",
  numeric_sampler = "random",
  wmt             = wmt_out,       # optional: WMT reweighting
  specs           = specs_ds,      # optional: design-space columns
  verbose         = TRUE
)

# The scored table now includes:
#  * score, wmt_score, uncertainty_measure
#  * per-response CIs: _lwr, _upr
#  * design-space columns, e.g. Potency_p_in_spec_mean, p_joint_mean
names(scored_full$score_table)

## ------------------------------------------------------------------------
## Positional (unnamed) goals matched to objects by position
## ------------------------------------------------------------------------

# Reuses lipid_screen and the fits fit_pot, fit_siz, fit_pdi created in the
# first example above.

# Collect SVEM models in a list.
# Here goals will be matched by position: Potency, Size, PDI.
objs <- list(fit_pot, fit_siz, fit_pdi)

# Positional goals (unnamed list): must have same length as 'objects'
goals_positional <- list(
  list(goal = "max", weight = 0.6),  # for Potency (objs[[1]])
  list(goal = "min", weight = 0.3),  # for Size    (objs[[2]])
  list(goal = "min", weight = 0.1)   # for PDI     (objs[[3]])
)

set.seed(5)
scored_pos <- svem_score_random(
  objects         = objs,
  goals           = goals_positional,
  n               = 5000,
  numeric_sampler = "random",
  verbose         = FALSE
)

names(scored_pos$score_table)

# }
