normref (version 0.0.0.1)

fb_select: Free order model selection procedure

Description

fb_select() applies the free order model selection procedure, which uses forward–backward selection (Voncken et al., 2019). For a given GAMLSS distribution and model selection criterion, it selects the polynomial degree of each distribution parameter that optimizes the criterion.
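
To illustrate the general idea of forward–backward selection over polynomial degrees, here is a minimal sketch of a greedy search. It is not the normref implementation: fb_search_sketch() and toy_crit() are hypothetical names, and the toy criterion merely stands in for "fit a GAMLSS model with these degrees and return its criterion value". The sketch assumes each step raises or lowers the degree of exactly one parameter by 1 and stops when no such move improves the criterion.

```r
# Hypothetical helper, not part of normref: greedy forward-backward search
# over a vector of polynomial degrees (one entry per distribution parameter).
fb_search_sketch <- function(start, min_deg, max_deg, crit) {
  current <- start
  best <- crit(current)
  repeat {
    # Candidate moves: raise or lower exactly one degree by 1, within bounds.
    candidates <- list()
    for (i in seq_along(current)) {
      for (step in c(-1, 1)) {
        cand <- current
        cand[i] <- cand[i] + step
        if (cand[i] >= min_deg[i] && cand[i] <= max_deg[i]) {
          candidates[[length(candidates) + 1]] <- cand
        }
      }
    }
    if (length(candidates) == 0) break
    values <- vapply(candidates, crit, numeric(1))
    # Stop as soon as no single-step move lowers the criterion.
    if (min(values) >= best) break
    current <- candidates[[which.min(values)]]
    best <- min(values)
  }
  list(degrees = current, criterion = best)
}

# Toy criterion (assumption): lowest at degrees c(3, 1, 0, 0).
toy_crit <- function(deg) sum((deg - c(3, 1, 0, 0))^2)

res <- fb_search_sketch(
  start   = c(2, 1, 0, 0),
  min_deg = c(0, 0, 0, 0),
  max_deg = c(5, 5, 2, 2),
  crit    = toy_crit
)
print(res$degrees)
```

The start, minimum, and maximum vectors mirror the defaults of start_poly, min_poly, and max_poly; in a real run the criterion would come from refitting the model at every candidate, which is why such searches can be slow.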

Usage

fb_select(
  data,
  age_name,
  score_name,
  family,
  selcrit = "BIC",
  spline = FALSE,
  method = "RS(10000)",
  max_poly = c(5, 5, 2, 2),
  min_poly = c(0, 0, 0, 0),
  start_poly = c(2, 1, 0, 0),
  trace = TRUE,
  seed = 123,
  parallel = FALSE
)

Value

A selected GAMLSS model with the chosen polynomial degrees and the final criterion value.

Arguments

data

data.frame. Sample on which to fit the distribution; contains the scores and ages.

age_name

string. Name of the age variable.

score_name

string. Name of the score variable.

family

string. For example, "BB", "BCPE", "NO", etc. See gamlss.dist::gamlss.family for more information.

selcrit

string. Model selection criterion: "AIC", "BIC" (default), "GAIC(3)", or "CV" (cross-validation with 10 folds).

spline

logical. If FALSE (default), estimate polynomial(s) for \(\mu\); if TRUE, estimate a P-spline for \(\mu\).

method

string. Estimation method for gamlss::gamlss(). Either "RS()", "CG()", or "mixed()", with iteration count. Default is "RS(10000)".

max_poly

numeric vector. Maximum polynomial degree for each distribution parameter. Default is c(5, 5, 2, 2).

min_poly

numeric vector. Minimum polynomial degree for each distribution parameter. Default is c(0, 0, 0, 0).

start_poly

numeric vector. Polynomial degree for each distribution parameter at which the selection starts. Default is c(2, 1, 0, 0).

trace

logical. If TRUE, prints progress during selection.

seed

integer. Random seed for cross-validation folds.

parallel

logical. If TRUE, candidate models are evaluated in parallel using future.apply. This can reduce elapsed time for computationally heavy settings (e.g., large datasets, distributions with many parameters, or when using cross-validation as the selection criterion). For light models or small datasets, the overhead of parallelization may make it slower than sequential evaluation. Parallelization is not supported for user-defined distribution families; use built-in gamlss.dist families instead. Default is FALSE.
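
As background for the selcrit argument, the information criteria differ only in how heavily they penalize the number of estimated parameters: GAIC(k) is -2 times the log-likelihood plus k times the degrees of freedom, with AIC corresponding to k = 2 and BIC to k = log(n). The sketch below shows this on simulated data, using lm() purely as a stand-in for a GAMLSS fit; gaic() is a hypothetical helper and not a normref function.

```r
# Simulated data (assumption); lm() stands in for a GAMLSS fit
# purely to show how the penalties differ.
set.seed(1)
n <- 200
age <- runif(n, 5, 20)
score <- 10 + 2 * age - 0.05 * age^2 + rnorm(n)
fit <- lm(score ~ poly(age, 2))

# Generalized AIC: -2 * logLik + k * df,
# where df is the number of estimated parameters.
gaic <- function(model, k) {
  ll <- logLik(model)
  as.numeric(-2 * ll + k * attr(ll, "df"))
}

aic_val   <- gaic(fit, k = 2)       # AIC
gaic3_val <- gaic(fit, k = 3)       # GAIC(3)
bic_val   <- gaic(fit, k = log(n))  # BIC

print(c(AIC = aic_val, GAIC3 = gaic3_val, BIC = bic_val))
```

A larger penalty k favors lower polynomial degrees, so for samples where log(n) exceeds 3, BIC tends to select more parsimonious models than GAIC(3) or AIC.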

Details

If parallel = TRUE, candidate models are evaluated in parallel using the future and future.apply packages. If these packages are not installed, a message is printed and the function continues with sequential evaluation. Parallelization can reduce elapsed time for large datasets, complex models and cross-validation, but may be slower than sequential evaluation for smaller problems.
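
The fallback behavior described above can be pictured with the following sketch. It is not the code used inside fb_select(): eval_candidates() is a hypothetical helper, and the toy function stands in for fitting one candidate model. It only illustrates the pattern of checking for future.apply and continuing sequentially when it is missing.

```r
# Hypothetical helper, not normref code: apply `fun` to each item with
# future.apply when it is available, otherwise fall back to lapply().
eval_candidates <- function(items, fun, parallel = FALSE) {
  if (parallel && requireNamespace("future.apply", quietly = TRUE)) {
    return(future.apply::future_lapply(items, fun, future.seed = TRUE))
  }
  if (parallel) {
    message("future.apply is not installed; evaluating sequentially.")
  }
  lapply(items, fun)
}

# Toy workload (assumption) standing in for fitting candidate models.
candidates <- list(c(2, 1, 0, 0), c(3, 1, 0, 0), c(2, 2, 0, 0))
out <- eval_candidates(candidates, function(deg) sum(deg), parallel = TRUE)
print(unlist(out))
```

In the future framework, work is only distributed across workers once a parallel plan such as future::plan(future::multisession) is active; under the default sequential plan, future_lapply() behaves like lapply(). Whether fb_select() sets a plan itself or relies on the one chosen by the user is not stated here.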

References

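Voncken, L., Albers, C. J., & Timmerman, M. E. (2019). Model selection in continuous test norming with GAMLSS. Assessment, 26(7), 1329-1346.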

See Also

shape_data(), normtable_create()

Examples

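library(normref)

# Load the example data
invisible(data("ids_data"))
# Prepare the scores for the chosen distribution family
mydata <- shape_data(ids_data, age_name = "age", score_name = "y14", family = "BB")
# Select the polynomial degrees by forward-backward selection, using BIC
mod <- fb_select(mydata, age_name = "age", score_name = "shaped_score",
                 family = "BB", selcrit = "BIC")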