suggest_size: Suggest submodel size

Description

This function suggests a submodel size based on the decision rule described in section "Details" below. This rule is heuristic, so the suggestion should be interpreted with caution. It is recommended to examine the results via plot.vsel() and/or summary.vsel() and to make the final decision based on what is most appropriate for the problem at hand.

Usage

suggest_size(object, ...)
# S3 method for vsel
suggest_size(
  object,
  stat = "elpd",
  pct = 0,
  type = "upper",
  warnings = TRUE,
  ...
)

Arguments

object: An object of class vsel (returned by varsel() or cv_varsel()).
...: Arguments passed to summary.vsel(), except for object, stats (which is set to stat), type, and deltas (which is set to TRUE). See section "Details" below for some important arguments which may be passed here.
stat: Statistic used for the decision. See summary.vsel() for possible choices.
pct: A number giving the proportion (not a percentage) of the utility difference between the baseline model and the null model that one is willing to sacrifice. See section "Details" below for more information.
type: Either "upper" or "lower", determining whether the decision is based on the upper or the lower bound of the confidence interval, respectively. See section "Details" below for more information.
warnings: Mainly for internal use. A single logical value indicating whether to throw warnings if automatic suggestion fails. Usually there is no reason to set this to FALSE.

Details

The suggested model size is the smallest model size for which either the lower or upper bound (depending on argument type) of the normal-approximation confidence interval (with nominal coverage 1 - alpha, see argument alpha of summary.vsel()) for $u_k - u_{\mbox{base}}$ (with $u_k$ denoting the $k$-th submodel's utility and $u_{\mbox{base}}$ denoting the baseline model's utility) falls above (or is equal to) $$\mbox{pct} * (u_0 - u_{\mbox{base}})$$ where $u_0$ denotes the null model utility. The baseline is either the reference model or the best submodel found (see argument baseline of summary.vsel()).

For example, alpha = 0.32, pct = 0, and type = "upper" means that we select the smallest model size for which the upper bound of the 68% confidence interval for $u_k - u_{\mbox{base}}$ is greater than or equal to zero, that is, for which the submodel's utility is at most approximately one standard error smaller than the baseline model's utility.
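
The decision rule described in this section can be illustrated with a minimal base-R sketch. This is not projpred's actual implementation: the function name sketch_suggest_size() and all numbers are hypothetical, the utility differences and standard errors are made up, and the first element is assumed to correspond to the null model (size 0).

```r
# Hypothetical helper illustrating the decision rule (NOT projpred code).
# `diff` holds u_k - u_base and `se` the corresponding standard errors for
# the submodel sizes in `sizes`; size 0 is assumed to be the null model.
sketch_suggest_size <- function(sizes, diff, se, alpha = 0.32, pct = 0,
                                type = c("upper", "lower")) {
  type <- match.arg(type)
  # Quantile for a normal-approximation CI with nominal coverage 1 - alpha:
  z <- qnorm(1 - alpha / 2)
  bound <- if (type == "upper") diff + z * se else diff - z * se
  # Threshold pct * (u_0 - u_base), taken from the null model's difference:
  threshold <- pct * diff[sizes == 0]
  ok <- bound >= threshold
  # Smallest qualifying size, or NA if no size satisfies the rule:
  if (any(ok)) min(sizes[ok]) else NA
}

# Made-up ELPD differences to the baseline and their standard errors:
sizes <- 0:4
diff <- c(-40, -12, -3, -0.8, -0.1)
se <- c(6, 4, 2.5, 1.0, 0.9)

print(sketch_suggest_size(sizes, diff, se))             # pct = 0
print(sketch_suggest_size(sizes, diff, se, pct = 0.1))  # tolerate a 10% loss
```

With these made-up numbers, the default settings pick size 3, and pct = 0.1 relaxes the threshold from 0 to -4 so that size 2 already qualifies. Switching to type = "lower" makes the rule stricter, because the lower bound has to clear the same threshold.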

Examples

if (requireNamespace("rstanarm", quietly = TRUE)) {
  # Data:
  dat_gauss <- data.frame(y = df_gaussian$y, df_gaussian$x)

  # The "stanreg" fit which will be used as the reference model (with small
  # values for `chains` and `iter`, but only for technical reasons in this
  # example; this is not recommended in general):
  fit <- rstanarm::stan_glm(
    y ~ X1 + X2 + X3 + X4 + X5, family = gaussian(), data = dat_gauss,
    QR = TRUE, chains = 2, iter = 500, refresh = 0, seed = 9876
  )

  # Variable selection (here without cross-validation and with small values
  # for `nterms_max`, `nclusters`, and `nclusters_pred`, but only for the
  # sake of speed in this example; this is not recommended in general):
  vs <- varsel(fit, nterms_max = 3, nclusters = 5, nclusters_pred = 10,
               seed = 5555)
  print(suggest_size(vs))
}
