- data
A data frame containing the cytokine data, with one column as
the grouping variable and the rest as numerical features.
- group_col
A string representing the name of the column with the
grouping variable (i.e., the target variable for classification).
- train_fraction
A numeric value between 0 and 1 representing the
proportion of data to use for training (default is 0.7).
- nrounds
An integer specifying the number of boosting rounds
(default is 500).
- max_depth
An integer specifying the maximum depth of the trees
(default is 6).
- eta
Deprecated; use learning_rate instead.
- learning_rate
A numeric value representing the learning rate
(default is 0.1). This replaces the deprecated eta argument.
- nfold
An integer specifying the number of folds for cross-validation
(default is 5).
- cv
A logical value indicating whether to perform cross-validation
(default is FALSE).
- objective
A string specifying the XGBoost objective function
(default is "multi:softprob" for multi-class classification).
- early_stopping_rounds
An integer specifying the number of consecutive rounds
without improvement after which training stops early (default is NULL).
- eval_metric
A string specifying the evaluation metric
(default is "mlogloss").
- gamma
Deprecated; use min_split_loss instead.
- min_split_loss
A numeric value for the minimum loss reduction
required to make a further partition (default is 0). This replaces
the deprecated gamma argument.
- colsample_bytree
A numeric value specifying the subsample ratio
of columns when constructing each tree (default is 1).
- subsample
A numeric value specifying the subsample ratio of the
training instances (default is 1).
- min_child_weight
A numeric value specifying the minimum sum of
instance weight needed in a child (default is 1).
- top_n_features
An integer specifying the number of top features to
display in the importance plot (default is 10).
- verbose
An integer specifying the verbosity of the training
process (default is 1).
- plot_roc
A logical value indicating whether to plot the ROC curve
and calculate the AUC (binary classification only; default is FALSE).
- print_results
A logical value indicating whether to print the results
of model training and evaluation (default is FALSE). If TRUE,
the confusion matrix and feature importance are printed.
- seed
An integer specifying the seed for reproducibility (default is 123).
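
To illustrate how the arguments above fit together, here is a minimal base R sketch that splits a data frame by train_fraction and collects the documented defaults into an XGBoost-style parameter list. The toy data, the column names (Group, IL6, TNFa), and the simple random split are assumptions made for illustration; the function itself may prepare the data differently.

```r
# Hypothetical toy data: one grouping column plus numeric features.
set.seed(123)  # same value as the documented default for seed
toy <- data.frame(
  Group = rep(c("A", "B", "C"), each = 10),
  IL6   = rnorm(30),
  TNFa  = rnorm(30)
)

group_col      <- "Group"
train_fraction <- 0.7

# Simple random split (assumption: the real function may split differently).
n_train   <- round(train_fraction * nrow(toy))
train_idx <- sample(seq_len(nrow(toy)), size = n_train)
train     <- toy[train_idx, ]
test      <- toy[-train_idx, ]

# XGBoost expects class labels as integers starting at 0.
classes      <- levels(factor(toy[[group_col]]))
train_labels <- as.integer(factor(train[[group_col]], levels = classes)) - 1L

# Numeric feature matrix without the grouping column.
train_x <- as.matrix(train[, setdiff(names(train), group_col), drop = FALSE])

# Parameter list built from the documented defaults. learning_rate and
# min_split_loss are the XGBoost aliases of eta and gamma.
params <- list(
  objective        = "multi:softprob",
  eval_metric      = "mlogloss",
  num_class        = length(classes),  # required by XGBoost for multi-class objectives
  max_depth        = 6,
  learning_rate    = 0.1,
  min_split_loss   = 0,
  colsample_bytree = 1,
  subsample        = 1,
  min_child_weight = 1
)

str(params)
```

A list like params, together with train_x and train_labels, is the kind of input the xgboost package's training functions take; nrounds, early_stopping_rounds, and verbose are passed to the training call itself and are not part of the parameter list.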