SelectBoost.beta (version 0.4.5)

compare_selectors_bootstrap: Bootstrap selection frequencies across selectors

Description

Bootstraps the dataset B times and records how often each selector picks each variable. Observations with NA in either X or Y are removed before resampling. Column names are abbreviated internally and mapped back to the originals in the output, as in compare_selectors_single().
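The resampling scheme described above can be sketched in base R. This is an illustration only and does not reproduce the package's internals: `toy_selector` is a hypothetical stand-in (ordinary least squares with a p-value cut-off), and the names `hits` and `toy_out` are invented for this sketch. It shows the three ingredients named in the description: dropping incomplete rows first, resampling rows with replacement, and averaging selection indicators over the replicates that fitted successfully.

```r
# Sketch only: base R, no SelectBoost.beta code is used or reproduced here.
set.seed(42)
n <- 120
X <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, c("x1", "x2", "x3")))
y <- 1.5 * X[, "x1"] + rnorm(n)
X[5, 2] <- NA  # one incomplete row, dropped below

# Hypothetical stand-in selector: keep predictors with p < 0.05 in an OLS fit.
toy_selector <- function(X, y) {
  fit <- lm(y ~ ., data = data.frame(y = y, X))
  p <- summary(fit)$coefficients[-1, "Pr(>|t|)"]
  setNames(p < 0.05, colnames(X))
}

# Remove rows with NA in X or y before any resampling.
keep <- complete.cases(X, y)
X <- X[keep, , drop = FALSE]
y <- y[keep]

B <- 50
hits <- matrix(NA, B, ncol(X), dimnames = list(NULL, colnames(X)))
for (b in seq_len(B)) {
  idx <- sample(nrow(X), replace = TRUE)  # bootstrap resample of rows
  res <- tryCatch(toy_selector(X[idx, , drop = FALSE], y[idx]),
                  error = function(e) NULL)
  if (!is.null(res)) hits[b, ] <- res     # failed replicates stay NA
}

n_success <- sum(complete.cases(hits))
n_fail <- B - n_success
freq <- colMeans(hits, na.rm = TRUE)      # share of successful replicates
toy_out <- data.frame(selector = "toy_ols", variable = names(freq),
                      freq = unname(freq),
                      n_success = n_success, n_fail = n_fail)
toy_out
```

The resulting `toy_out` mimics the long layout documented under Value, with a single made-up selector.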

Usage

compare_selectors_bootstrap(X, Y, B = 50, include_enet = TRUE, seed = NULL)

Value

Long data frame with columns selector, variable, freq (in [0,1]), n_success, and n_fail. The freq column reports the share of successful bootstrap replicates in which the corresponding selector picked the variable. Values near 1 indicate that a variable is selected stably across resamples, whereas values near 0 indicate weak evidence for it. n_success counts the successful fits contributing to the frequency estimate (failed replicates are excluded), while n_fail records the number of unsuccessful fits. A "failures" attribute attached to the returned data frame lists the replicate indices and error messages for any fits that failed.
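A long table of this shape is often easier to read after a little reshaping. The sketch below works on a mock data frame that only copies the documented column names; the selector labels `selA` and `selB` and all numbers are invented for illustration and are not output of the package.

```r
# Mock object with the documented columns; the values are invented.
mock <- data.frame(
  selector  = rep(c("selA", "selB"), each = 3),
  variable  = rep(c("x1", "x2", "x3"), times = 2),
  freq      = c(1.0, 0.2, 0.1, 0.875, 0.375, 0.0),
  n_success = c(10, 10, 10, 8, 8, 8),
  n_fail    = c(0, 0, 0, 2, 2, 2)
)

# Wide view: one row per variable, one column per selector.
wide <- xtabs(freq ~ variable + selector, data = mock)

# Variables that every selector picks in at least 80% of its successful fits.
stable <- rownames(wide)[apply(wide >= 0.8, 1, all)]

# Number of successful replicates in which each variable was selected.
mock$n_selected <- round(mock$freq * mock$n_success)

# Share of replicates that failed, per selector.
fail_rate <- with(unique(mock[c("selector", "n_success", "n_fail")]),
                  setNames(n_fail / (n_success + n_fail), selector))

wide
stable
fail_rate
```

On a real result, the "failures" attribute mentioned above can be retrieved with base R's `attr(result, "failures")`; its exact layout is not specified on this page, so inspect it with `str()` before relying on particular fields.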

Arguments

X

Numeric matrix (n × p) of mean-submodel predictors.

Y

Numeric response in (0,1). Values are squeezed to (0,1) internally.

B

Number of bootstrap replications.

include_enet

Logical; include ENet if gamlss.lasso is installed.

seed

Optional RNG seed.
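The note on Y says values are squeezed into (0,1), but this page does not state which rule is used. As an assumption for illustration only, the sketch below applies one widely used compression, (y * (n - 1) + 0.5) / n, which moves exact 0s and 1s strictly inside the interval. The helper name `squeeze01` is hypothetical and is not part of SelectBoost.beta, and the package may apply a different rule.

```r
# Assumption: a common compression toward 0.5; the package's own
# squeezing rule may differ. `squeeze01` is a hypothetical helper.
squeeze01 <- function(y) {
  n <- length(y)
  (y * (n - 1) + 0.5) / n
}

y_raw <- c(0, 0.25, 0.5, 1)  # boundary values 0 and 1 included on purpose
y_sq <- squeeze01(y_raw)
y_sq
range(y_sq)                  # strictly inside (0, 1)
```

Pre-squeezing a response yourself is only needed if you want explicit control over how boundary values are handled before the function sees them.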

Examples

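library(SelectBoost.beta)

set.seed(1)
# Three predictors; only the first drives the mean of the response.
X <- matrix(rnorm(300), 100, 3)
mu <- plogis(X[, 1])
# Beta-distributed response with mean mu and precision 30.
Y <- rbeta(100, mu * 30, (1 - mu) * 30)
freq <- compare_selectors_bootstrap(X, Y, B = 10, include_enet = FALSE)
head(freq)
# Variables selected in more than 80% of replicates.
subset(freq, freq > 0.8)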

# Slower example (wrapped in \donttest{} in the package sources):
# increase B until the reported frequencies stabilise.
freq_big <- compare_selectors_bootstrap(X, Y, B = 200, include_enet = FALSE)
# Summarise the distribution of selection frequencies per selector.
stats::aggregate(freq ~ selector, freq_big, summary)