RejStep: CMB validation step

Description

Validation step to combine different SingBoost models.

Usage

RejStep(
  D,
  nsing,
  Bsing = 1,
  ind,
  sing = FALSE,
  singfam = Gaussian(),
  evalfam = Gaussian(),
  M = 10,
  m_iter = 100,
  kap = 0.1,
  LS = FALSE,
  best = 1
)

Arguments

D

Data matrix. Has to be an \(n \times (p+1)\)-dimensional data frame in the format \((X,Y)\). The \(X\)-part must not contain an intercept column of ones, since this column is added automatically.

nsing

Number of observations (rows) used for the SingBoost submodels.

Bsing

Number of subsamples on which the SingBoost models are validated. Default is 1. Not to be confused with the parameter B of the Stability Selection.

ind

Vector with indices for dividing the data set into training and validation data.

sing

If sing=FALSE and the singfam family is a standard Boosting family contained in the package mboost, the CMB aggregation procedure is executed for the corresponding standard Boosting models. Default is FALSE.

singfam

A SingBoost family. The SingBoost models are trained based on the corresponding loss function. Default is Gaussian() (squared loss).

evalfam

A SingBoost family. The SingBoost models are validated according to the corresponding loss function. Default is Gaussian() (squared loss).

M

An integer between 2 and m_iter. Indicates that a singular iteration is performed in every \(M\)-th iteration. Default is 10.
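As a rough illustration of how M and m_iter interact, the following sketch lists the iteration indices that would be singular, assuming (this is an assumption, not taken from the package source) that the singular iterations are exactly those whose index is a multiple of M. The variable name singular_iters is hypothetical.

```r
# Assumption: singular iterations are the multiples of M up to m_iter.
M <- 10
m_iter <- 100

# Indices of the iterations that would be singular under this assumption
singular_iters <- which(seq_len(m_iter) %% M == 0)
singular_iters
```

With the defaults, this yields 10 of the 100 iterations; all remaining iterations would be ordinary Boosting iterations.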

m_iter

Number of SingBoost iterations. Default is 100.

kap

Learning rate (step size). Must be a real number in \(]0,1]\). Default is 0.1. It is recommended to use a value smaller than 0.5.

LS

If the singfam family is one that is already provided by mboost, the respective Boosting algorithm is performed in the singular iterations if LS is set to TRUE. Default is FALSE.

best

Needed in the case of localized ranking. The parameter K of the localized ranking loss will be computed by \(best \cdot n\) (rounded to the next larger integer). Warning: If a parameter K is inserted into the LocRank family, it will be ignored when executing SingBoost.
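The rounding rule for K can be sketched in a few lines of base R. The values of n and best below are purely illustrative, and which n the function uses internally (full data or subsample) is not stated here, so treat this only as a sketch of the arithmetic.

```r
# Illustrative values only; n stands for the number of observations.
n <- 10
best <- 0.25

# best * n = 2.5, rounded up to the next larger integer
K <- ceiling(best * n)
K
```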

Value

loss

Vector of validation losses.

occ

Selection frequencies for each Boosting model.

Details

Divides the data set into a training and a validation set. The SingBoost models are computed on the training set and evaluated on the validation set based on the loss function corresponding to the selected Boosting family.
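The split-and-validate logic described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: lm() stands in for a SingBoost fit, ind is assumed to hold the row indices of the training part, and the simulated data and the names Dtrain, Dvalid and sub are hypothetical.

```r
# Hypothetical sketch of the train/validation scheme; lm() is only a
# stand-in for a SingBoost model and is NOT what RejStep fits internally.
set.seed(1)
n <- 100
p <- 3
X <- matrix(rnorm(n * p), n, p)
D <- data.frame(X, Y = X[, 1] - 2 * X[, 2] + rnorm(n))  # format (X, Y)

ind <- sample(n, 70)   # assumed: row indices of the training data
Dtrain <- D[ind, ]
Dvalid <- D[-ind, ]

nsing <- 50            # rows used for each submodel
Bsing <- 3             # number of subsamples

loss <- numeric(Bsing)
for (b in seq_len(Bsing)) {
  # Draw nsing training rows without replacement and fit the stand-in model
  sub <- Dtrain[sample(nrow(Dtrain), nsing), ]
  fit <- lm(Y ~ ., data = sub)
  # Evaluate with the squared loss on the validation set
  pred <- predict(fit, newdata = Dvalid)
  loss[b] <- mean((Dvalid$Y - pred)^2)
}
loss
```

Each entry of loss is the validation loss of one submodel, which mirrors the role of the loss component in the returned value.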

References

Werner, T., Gradient-Free Gradient Boosting, PhD Thesis, Carl von Ossietzky University Oldenburg, 2020