projpred (version 2.1.2)

plot.vsel: Plot summary statistics of a variable selection

Description

This is the plot() method for vsel objects (returned by varsel() or cv_varsel()).

Usage

# S3 method for vsel
plot(
  x,
  nterms_max = NULL,
  stats = "elpd",
  deltas = FALSE,
  alpha = 0.32,
  baseline = if (!inherits(x$refmodel, "datafit")) "ref" else "best",
  ...
)

Arguments

x

An object of class vsel (returned by varsel() or cv_varsel()).

nterms_max

Maximum submodel size for which the statistics are calculated. Because nterms_max does not count the intercept, nterms_max = 0 denotes the intercept-only model. For plot.vsel(), however, nterms_max must be at least 1.

stats

One or more character strings determining which statistics to calculate. Available statistics are:

  • "elpd": (expected) sum of log predictive densities.

  • "mlpd": mean log predictive density, that is, "elpd" divided by the number of observations.

  • "mse": mean squared error.

  • "rmse": root mean squared error. For the corresponding standard error, bootstrapping is used.

  • "acc" (or its alias, "pctcorr"): classification accuracy (binomial() family only).

  • "auc": area under the ROC curve (binomial() family only). For the corresponding standard error, bootstrapping is used.
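To make the definitions above concrete, the following base-R sketch computes the point estimates of "elpd", "mlpd", "mse", and "rmse" from toy data. The vectors y and mu_hat and the value sigma_hat are hypothetical stand-ins for observed responses and a single Gaussian predictive distribution per observation; this is a simplified illustration of the formulas, not projpred's internal implementation.

```r
# Toy data (hypothetical): observed responses and Gaussian point predictions.
y <- c(1.2, 0.4, -0.7, 2.1, 0.0)
mu_hat <- c(1.0, 0.5, -0.5, 1.8, 0.3)
sigma_hat <- 0.5

# Pointwise log predictive densities under the assumed Gaussian predictive
# distribution (a simplification made for this sketch).
lpd <- dnorm(y, mean = mu_hat, sd = sigma_hat, log = TRUE)

elpd <- sum(lpd)             # "elpd": sum of log predictive densities
mlpd <- elpd / length(y)     # "mlpd": "elpd" divided by the number of observations
mse <- mean((y - mu_hat)^2)  # "mse": mean squared error
rmse <- sqrt(mse)            # "rmse": root mean squared error

print(round(c(elpd = elpd, mlpd = mlpd, mse = mse, rmse = rmse), 3))
```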

deltas

If TRUE, the submodel statistics are estimated as differences from the baseline model (see argument baseline) rather than as the actual values of the statistics.

alpha

A number determining the (nominal) coverage 1 - alpha of the normal-approximation confidence intervals. For example, alpha = 0.32 corresponds to a coverage of 68%, i.e., one-standard-error intervals (because of the normal approximation).
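The link between alpha and the interval width can be checked in base R. In this sketch, est and se are hypothetical values chosen only for illustration; the point is that alpha = 0.32 yields a normal quantile very close to 1, hence roughly one-standard-error intervals.

```r
alpha <- 0.32
coverage <- 1 - alpha       # nominal coverage of the interval
z <- qnorm(1 - alpha / 2)   # normal quantile that multiplies the standard error

# Hypothetical estimate and standard error, for illustration only.
est <- -120.5
se <- 4.2
ci <- c(lower = est - z * se, upper = est + z * se)

print(c(coverage = coverage, z = z))
print(ci)
```

With alpha = 0.05 instead, the same computation gives the familiar quantile of about 1.96.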

baseline

Either "ref" or "best", indicating whether the baseline is the reference model or the best submodel found (in terms of stats[1]), respectively. For summary.vsel(), this argument is only relevant if deltas is TRUE; for plot.vsel(), it is always relevant.
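How deltas and baseline interact can be sketched with plain vectors. All numbers below are hypothetical, and the helper baseline_value() is introduced only for this illustration; it is not a projpred function. The sketch assumes the statistic is "elpd", where larger values are better, so the best submodel is the one with the maximum value.

```r
# Hypothetical elpd values for submodels of size 0 to 3 and for the
# reference model.
elpd_sub <- c(-150.2, -131.7, -124.9, -123.8)
names(elpd_sub) <- 0:3
elpd_ref <- -123.1

# Illustrative helper: pick the baseline value, either the reference model
# ("ref") or the best submodel ("best"; for elpd, the maximum).
baseline_value <- function(baseline = c("ref", "best")) {
  baseline <- match.arg(baseline)
  if (baseline == "ref") elpd_ref else max(elpd_sub)
}

# Differences from the baseline, as requested by deltas = TRUE.
delta_ref <- elpd_sub - baseline_value("ref")
delta_best <- elpd_sub - baseline_value("best")

print(rbind(delta_ref, delta_best))
```

With the "best" baseline, the best submodel has a difference of exactly zero and all other submodels have non-positive differences.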

...

Arguments passed to the internal function which is used for bootstrapping (if applicable; see argument stats). Currently, relevant arguments are B (the number of bootstrap samples, defaulting to 2000) and seed (see set.seed(), defaulting to sample.int(.Machine$integer.max, 1)).
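The roles of B and seed can be illustrated with a generic nonparametric bootstrap of the RMSE. The function boot_se_rmse() below is a hypothetical helper written for this sketch, and the data are simulated; projpred's internal bootstrap function may differ in its details.

```r
# Illustrative helper: bootstrap standard error of the RMSE.
# Note that set.seed() changes the global random number generator state.
boot_se_rmse <- function(y, mu_hat, B = 2000, seed = 1) {
  set.seed(seed)
  n <- length(y)
  rmse_boot <- replicate(B, {
    idx <- sample.int(n, n, replace = TRUE)  # resample observations
    sqrt(mean((y[idx] - mu_hat[idx])^2))     # RMSE of the bootstrap sample
  })
  sd(rmse_boot)  # standard deviation across bootstrap replicates
}

# Simulated data (hypothetical): responses and noisy predictions.
set.seed(42)
y <- rnorm(50)
mu_hat <- y + rnorm(50, sd = 0.3)

# The same seed reproduces the same standard error.
se1 <- boot_se_rmse(y, mu_hat, B = 500, seed = 123)
se2 <- boot_se_rmse(y, mu_hat, B = 500, seed = 123)
print(c(se1 = se1, se2 = se2))
```

A larger B reduces the Monte Carlo error of the bootstrap estimate at the cost of computation time, and fixing seed makes the resulting standard errors reproducible between calls.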

Examples

if (requireNamespace("rstanarm", quietly = TRUE)) {
  # Data:
  dat_gauss <- data.frame(y = df_gaussian$y, df_gaussian$x)

  # The "stanreg" fit which will be used as the reference model (with small
  # values for `chains` and `iter`, but only for technical reasons in this
  # example; this is not recommended in general):
  fit <- rstanarm::stan_glm(
    y ~ X1 + X2 + X3 + X4 + X5, family = gaussian(), data = dat_gauss,
    QR = TRUE, chains = 2, iter = 500, refresh = 0, seed = 9876
  )

  # Variable selection (here without cross-validation and with small values
  # for `nterms_max`, `nclusters`, and `nclusters_pred`, but only for the
  # sake of speed in this example; this is not recommended in general):
  vs <- varsel(fit, nterms_max = 3, nclusters = 5, nclusters_pred = 10,
               seed = 5555)
  print(plot(vs))
}
