This function performs cross-validation for estimating risk over a sequence of tuning parameters (tau_seq) by fitting a Generalized Linear Model (GLM) to the data.
It evaluates model performance by splitting the dataset into multiple folds, training
the model on a subset of the data, and testing it on the remaining portion.
cross_validation(
  formula,
  cat_init,
  tau_seq,
  discrepancy_method,
  cross_validation_fold_num,
  ...
)
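For illustration, a call might look like the sketch below. The formula, the fold count, and the mean-squared-error discrepancy function are hypothetical stand-ins chosen for this example, and cat_init is assumed to be an object already produced by cat_glm_initialization; the discrepancy is passed as a plain R function, following the argument description below.

risk_estimates <- cross_validation(
  formula = y ~ x1 + x2,                                          # hypothetical GLM formula
  cat_init = cat_init,                                            # assumed output of cat_glm_initialization()
  tau_seq = seq(0.5, 5, by = 0.5),                                # candidate tuning-parameter values
  discrepancy_method = function(pred, obs) mean((obs - pred)^2),  # example discrepancy: MSE
  cross_validation_fold_num = 5
)

# the tau with the smallest averaged risk can then be selected
best_tau <- seq(0.5, 5, by = 0.5)[which.min(risk_estimates)]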
A numeric vector of averaged risk estimates, one for each value of tau in tau_seq.
A formula specifying the GLM. It should at least include the response variable.
A list generated from cat_glm_initialization.
A sequence of tuning parameter values (tau) over which cross-validation will be performed. Each value of tau is used to weight the synthetic data during model fitting.
A function used to calculate the discrepancy (error) between model predictions and actual values.
The number of folds to use in cross-validation. The dataset will be randomly split into this number of subsets, and the model will be trained and tested on different combinations of these subsets.
Additional arguments passed to internal functions.
Randomization of the Data: The data is randomly shuffled into cross_validation_fold_num subsets to ensure that the model is evaluated across different splits of the dataset.
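A minimal sketch of this shuffling step (illustrative only; n_obs, fold_num, and fold_id are not the package's internal names):

n_obs    <- 100
fold_num <- 5
# assign each observation a random fold label, keeping fold sizes as balanced as possible
fold_id <- sample(rep(seq_len(fold_num), length.out = n_obs))
table(fold_id)   # roughly n_obs / fold_num observations per fold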
Model Training and Prediction: For each fold, a training set is used to fit a GLM with varying values of tau (from tau_seq), and the model is evaluated on a test set. The training data consists of both the observed and synthetic data, with the synthetic data weighted by tau.
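The weighting of the synthetic data can be pictured with the following self-contained sketch; the toy data frames and the helper name fit_weighted_glm are assumptions made for illustration, not the package's internal code.

set.seed(1)
obs_train   <- data.frame(x = rnorm(40))
obs_train$y <- 1 + 2 * obs_train$x + rnorm(40)
syn_data    <- data.frame(x = rnorm(20), y = rnorm(20))   # stand-in for synthetic data

fit_weighted_glm <- function(formula, obs, syn, tau, fam = gaussian()) {
  combined    <- rbind(obs, syn)                            # stack observed and synthetic rows
  combined$.w <- c(rep(1, nrow(obs)), rep(tau, nrow(syn)))  # observed weight 1, synthetic weight tau
  glm(formula, family = fam, data = combined, weights = .w)
}

fit <- fit_weighted_glm(y ~ x, obs_train, syn_data, tau = 0.5)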
Risk Estimation: After fitting the model, the discrepancy_method is used to calculate the prediction error for each combination of fold and tau. These errors are accumulated for each tau.
Average Risk Estimate: After completing all folds, the accumulated prediction errors are averaged over the number of folds to provide a final risk estimate for each value of tau.
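Putting the pieces together, an end-to-end sketch of the procedure described above might look as follows. Everything here (the simulated observed and synthetic data, the MSE discrepancy, and the plain glm() call with weights) is an illustrative assumption rather than the package's internal implementation.

set.seed(42)
obs_data   <- data.frame(x = rnorm(60))
obs_data$y <- 1 + 2 * obs_data$x + rnorm(60)             # simulated observed data
syn_data   <- data.frame(x = rnorm(30), y = rnorm(30))   # stand-in for synthetic data

tau_seq  <- c(0.25, 0.5, 1, 2, 4)
fold_num <- 5
fold_id  <- sample(rep(seq_len(fold_num), length.out = nrow(obs_data)))
mse      <- function(pred, obs) mean((obs - pred)^2)     # example discrepancy function

risk_sums <- numeric(length(tau_seq))
for (k in seq_len(fold_num)) {
  test  <- obs_data[fold_id == k, ]
  train <- obs_data[fold_id != k, ]
  for (j in seq_along(tau_seq)) {
    combined    <- rbind(train, syn_data)                # observed training rows plus synthetic rows
    combined$.w <- c(rep(1, nrow(train)), rep(tau_seq[j], nrow(syn_data)))
    fit  <- glm(y ~ x, family = gaussian(), data = combined, weights = .w)
    pred <- predict(fit, newdata = test, type = "response")
    risk_sums[j] <- risk_sums[j] + mse(pred, test$y)     # accumulate prediction error per tau
  }
}

risk_estimates <- risk_sums / fold_num                   # averaged risk, one value per tau
best_tau <- tau_seq[which.min(risk_estimates)]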