mixgb_cv: Use cross-validation to find the optimal `nrounds`

Description

Use cross-validation to find the optimal nrounds for a Mixgb imputer. Note that this method uses only the complete cases of the dataset to obtain the optimal nrounds.
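
To make the complete-case restriction concrete, here is a minimal base-R sketch. The toy data frame is invented for illustration, and mixgb_cv's internal handling may differ in detail; the point is only that rows with any missing value drop out before cross-validation.

```r
# Toy data, invented for illustration only.
toy <- data.frame(
  x = c(1.2, NA, 3.1, 4.8, 5.0),
  y = c("a", "b", NA, "a", "b"),
  z = c(10, 20, 30, NA, 50)
)

# Keep only the rows with no missing value in any column.
complete_rows <- toy[complete.cases(toy), ]

n_total <- nrow(toy)
n_complete <- nrow(complete_rows)
cat(n_complete, "of", n_total, "rows are complete\n")
```

If only a small fraction of rows is complete, the cross-validated nrounds is based on correspondingly little data, so it is worth checking this count before relying on the result.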

Usage

mixgb_cv(
  data,
  nfold = 5,
  nrounds = 100,
  early_stopping_rounds = 10,
  response = NULL,
  select_features = NULL,
  xgb.params = list(),
  stringsAsFactors = FALSE,
  verbose = TRUE,
  ...
)

Value

A list containing the optimal nrounds (best.nrounds), evaluation.log, and the chosen response.

Arguments

data: A data.frame or a data.table with missing values.
nfold: The number of equal-sized subsamples into which the data are randomly partitioned. Default: 5
nrounds: The maximum number of iterations in XGBoost training. Default: 100
early_stopping_rounds: An integer value k. Training will stop if the validation performance has not improved for k rounds. Default: 10
response: The name or the column index of a response variable. Default: NULL (Randomly select an incomplete variable).
select_features: The names or the indices of selected features. Default: NULL (Select all the other variables in the dataset).
xgb.params: A list of XGBoost parameters. For more details, see the XGBoost documentation on parameters. Default: list()
stringsAsFactors: A logical value indicating whether all character vectors in the dataset should be converted to factors. Default: FALSE
verbose: A logical value indicating whether to print cross-validation results during the process. Default: TRUE
...: Extra arguments to be passed to XGBoost.
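
The early_stopping_rounds rule described above can be sketched in base R as follows. This is a simplified illustration, not XGBoost's implementation: the function name early_stop_round and the error values are hypothetical, and it assumes a metric where lower is better.

```r
# Hypothetical helper: given per-round validation errors and a patience k,
# return the best round and the round at which training would stop.
early_stop_round <- function(val_error, k) {
  best <- 1
  for (i in seq_along(val_error)) {
    if (val_error[i] < val_error[best]) {
      best <- i                      # new best validation error
    } else if (i - best >= k) {
      # No improvement for k consecutive rounds: stop here.
      return(list(best = best, stopped_at = i))
    }
  }
  # Ran out of rounds without triggering early stopping.
  list(best = best, stopped_at = length(val_error))
}

# Made-up validation errors for illustration.
errors <- c(5, 4, 3, 3.5, 3.2, 3.1, 2)
res <- early_stop_round(errors, k = 3)
res$best
res$stopped_at
```

With these made-up values the later improvement in round 7 is never seen, which is the usual trade-off of a small k: less training time, but a risk of stopping before a late improvement.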

Examples

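library(mixgb)

# XGBoost parameters shared by cross-validation and imputation
params <- list(max_depth = 3, subsample = 0.7, nthread = 2)

# Cross-validate on the complete cases to find the optimal nrounds
cv.results <- mixgb_cv(data = nhanes3, xgb.params = params)
cv.results$best.nrounds

# Impute with the cross-validated number of rounds
imputed.data <- mixgb(data = nhanes3, m = 3, xgb.params = params,
                      nrounds = cv.results$best.nrounds)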
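
The response and select_features arguments can also be set explicitly instead of relying on the random default. The sketch below is one possible way to do this, not the package's recommended workflow: pick_response is a hypothetical helper, and choosing the first incomplete column is an arbitrary assumption made for illustration. The mixgb call is guarded so the snippet still runs when the package is not installed.

```r
# Hypothetical helper: name of the first column containing missing values.
pick_response <- function(df) {
  incomplete <- names(df)[colSums(is.na(df)) > 0]
  incomplete[1]
}

if (requireNamespace("mixgb", quietly = TRUE)) {
  library(mixgb)

  resp <- pick_response(nhanes3)            # explicit response variable
  feats <- setdiff(names(nhanes3), resp)    # all other columns as features

  cv.results <- mixgb_cv(
    data = nhanes3,
    nfold = 10,
    response = resp,
    select_features = feats,
    xgb.params = list(max_depth = 3, nthread = 2),
    verbose = FALSE
  )

  # Inspect the top-level structure of the returned list.
  str(cv.results, max.level = 1)
}
```

Fixing the response makes repeated runs easier to compare, since the default picks an incomplete variable at random.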