fb_select: Free order model selection procedure

Description

fb_select() applies the free order model selection procedure, which uses forward–backward selection (voncken2019modelnormref). For a given GAMLSS distribution and model selection criterion, it selects the optimal polynomial degree for each distribution parameter.
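The search can be pictured as a greedy walk over the vector of polynomial degrees. The sketch below is a minimal, hypothetical illustration and not the package's implementation. fb_search_sketch() and toy_crit() are invented names, and the criterion function stands in for fitting a GAMLSS model and computing its criterion value (lower is better). The move rule is one common variant of forward–backward selection and may differ in detail from the published procedure: try every single-degree increase or decrease within the bounds, and keep the best move if it improves the criterion.

```r
# Hypothetical sketch: greedy forward-backward search over polynomial degrees.
# `crit_fun` maps a degree vector to a criterion value (lower is better); it
# stands in for fitting a GAMLSS model and computing, e.g., its BIC.
# Illustration only, not the package's implementation.
fb_search_sketch <- function(crit_fun, start, min_deg, max_deg) {
  current <- start
  best <- crit_fun(current)
  repeat {
    # Collect all neighbours: one parameter's degree raised (forward step)
    # or lowered (backward step) by one, staying within its bounds
    candidates <- list()
    for (i in seq_along(current)) {
      for (step in c(1, -1)) {
        cand <- current
        cand[i] <- cand[i] + step
        if (cand[i] >= min_deg[i] && cand[i] <= max_deg[i]) {
          candidates[[length(candidates) + 1]] <- cand
        }
      }
    }
    if (length(candidates) == 0) break
    scores <- vapply(candidates, crit_fun, numeric(1))
    # Stop when no single move improves the criterion
    if (min(scores) >= best) break
    best <- min(scores)
    current <- candidates[[which.min(scores)]]
  }
  list(degrees = current, criterion = best)
}

# Toy criterion with its minimum at degrees c(3, 2, 0, 1) (illustration only)
toy_crit <- function(d) sum((d - c(3, 2, 0, 1))^2)
res <- fb_search_sketch(toy_crit, start = c(2, 1, 0, 0),
                        min_deg = c(0, 0, 0, 0), max_deg = c(5, 5, 2, 2))
res$degrees  # 3 2 0 1
```

Because each accepted move strictly lowers the criterion and the degrees are bounded, the loop always terminates. Like any greedy search, however, it can stop at a local optimum.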

Usage

fb_select(
  data,
  age_name,
  score_name,
  family,
  selcrit = "BIC",
  spline = FALSE,
  method = "RS(10000)",
  max_poly = c(5, 5, 2, 2),
  min_poly = c(0, 0, 0, 0),
  start_poly = c(2, 1, 0, 0),
  trace = TRUE,
  seed = 123,
  parallel = FALSE
)

Value

A selected GAMLSS model with the chosen polynomial degrees and the final criterion value.

Arguments

data: data.frame. Sample on which to fit the distribution; contains the scores and ages.
age_name: string. Name of the age variable.
score_name: string. Name of the score variable.
family: string. Name of the GAMLSS distribution family, for example "BB", "BCPE" or "NO". See gamlss.dist::gamlss.family for more information.
selcrit: string. Model selection criterion: "AIC", "BIC" (default), "GAIC(3)", or "CV" (cross-validation with 10 folds).
spline: logical. If FALSE (default), estimate polynomial(s) for \(\mu\); if TRUE, estimate a p-spline for \(\mu\).
method: string. Estimation algorithm for gamlss::gamlss(): "RS()", "CG()" or "mixed()", with the number of iterations given in parentheses. Default is "RS(10000)".
max_poly: numeric vector. Maximum polynomial degree for each distribution parameter.
min_poly: numeric vector. Minimum polynomial degree for each distribution parameter.
start_poly: numeric vector. Polynomial degree for each distribution parameter in the starting model.
trace: logical. If TRUE, prints progress during selection.
seed: integer. Random seed for cross-validation folds.
parallel: logical. If TRUE, candidate models are evaluated in parallel using future.apply. This can reduce elapsed time in computationally heavy settings (e.g., large datasets, distributions with many parameters, or cross-validation as the selection criterion). For light models or small datasets, the parallelization overhead may make it slower than sequential evaluation. Parallelization is not supported for user-defined distribution families; use built-in gamlss.dist families instead. Default is FALSE.
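The information criteria accepted by selcrit differ only in the penalty per effective degree of freedom. The generalized AIC is \(-2\log L + k\,\mathrm{df}\), with \(k = 2\) for AIC, \(k = \log n\) for BIC, and a user-chosen \(k\) for GAIC(k). The helper crit_value_sketch() below is hypothetical and is not part of the package's API. It only illustrates this formula and does not handle "CV".

```r
# Hypothetical helper: generalized AIC, -2 * logLik + k * df, where the
# penalty k depends on the selection criterion string.
crit_value_sketch <- function(loglik, df, n, selcrit = "BIC") {
  k <- if (selcrit == "AIC") {
    2
  } else if (selcrit == "BIC") {
    log(n)
  } else if (grepl("^GAIC\\([0-9.]+\\)$", selcrit)) {
    # Extract k from a string such as "GAIC(3)"
    as.numeric(sub("^GAIC\\(([0-9.]+)\\)$", "\\1", selcrit))
  } else {
    stop("unsupported criterion in this sketch: ", selcrit)
  }
  -2 * loglik + k * df
}

crit_value_sketch(loglik = -100, df = 5, n = 200, selcrit = "GAIC(3)")  # 215
```

Because log(n) exceeds 3 once n is 21 or more, BIC penalizes extra polynomial terms more heavily than GAIC(3) or AIC at typical norming sample sizes.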
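For selcrit = "CV", the seed argument makes the fold assignment reproducible. The sketch below is a minimal illustration that assumes observations are randomly assigned to 10 roughly equal folds. make_cv_folds_sketch() is a hypothetical name, and the package may split the data differently.

```r
# Hypothetical sketch: reproducible random assignment of n observations
# to k folds of (almost) equal size.
make_cv_folds_sketch <- function(n, k = 10, seed = 123) {
  set.seed(seed)  # note: this also resets the global RNG state
  sample(rep(seq_len(k), length.out = n))
}

folds <- make_cv_folds_sketch(n = 95)
table(folds)  # folds 1-5 hold 10 observations, folds 6-10 hold 9
# For fold f, the model would be fitted on folds != f and evaluated on folds == f
```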

Details

If parallel = TRUE, candidate models are evaluated in parallel using the future and future.apply packages. If these packages are not installed, a message is printed and the function continues with sequential evaluation. Parallelization can reduce elapsed time for large datasets, complex models and cross-validation, but may be slower than sequential evaluation for smaller problems.
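The fallback behaviour described above can be sketched as follows. eval_candidates_sketch() is a hypothetical helper, not the package's code. It checks for future and future.apply with requireNamespace(). If either package is missing, it prints a message and falls back to lapply(); otherwise it evaluates the candidates with future.apply::future_lapply(). With future, the caller chooses the backend via future::plan(). Under the default sequential plan, no worker processes are started.

```r
# Hypothetical sketch: evaluate candidate degree vectors in parallel when
# future/future.apply are available, otherwise fall back to sequential lapply().
eval_candidates_sketch <- function(candidates, crit_fun, parallel = FALSE) {
  if (parallel && !(requireNamespace("future", quietly = TRUE) &&
                    requireNamespace("future.apply", quietly = TRUE))) {
    message("Packages 'future' and 'future.apply' are not installed; ",
            "continuing with sequential evaluation.")
    parallel <- FALSE
  }
  scores <- if (parallel) {
    # future.seed = TRUE gives reproducible parallel random numbers
    future.apply::future_lapply(candidates, crit_fun, future.seed = TRUE)
  } else {
    lapply(candidates, crit_fun)
  }
  unlist(scores)
}

# Optionally choose a backend first, e.g.
# future::plan(future::multisession, workers = 2)
cands <- list(c(2, 1, 0, 0), c(3, 1, 0, 0), c(2, 2, 0, 0))
eval_candidates_sketch(cands, function(d) sum(d^2), parallel = TRUE)  # 5 10 8
```

Starting workers and shipping data to them has a fixed cost. This is why parallel evaluation only pays off when each candidate fit is expensive.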

References

voncken2019modelnormref

Examples


# \donttest{
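# Load the example data
invisible(data("ids_data"))
# Prepare the scores for the BB family (creates the column "shaped_score")
mydata <- shape_data(ids_data, age_name = "age", score_name = "y14", family = "BB")
# Select polynomial degrees for all BB parameters using BIC
mod <- fb_select(mydata, age_name = "age", score_name = "shaped_score",
                 family = "BB", selcrit = "BIC")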
# }
