perform_analysis: Perform analysis

Description

This function uses calculate_actual_predicted to develop and run the models, and calculate_performance to calculate the mean and confidence intervals of the performance measures (please see details below).

Usage

perform_analysis(generic_input_parameters,
specific_input_parameters_each_analysis, prepared_datasets, verbose)

Value

apparent_performance: Model is developed in the entire dataset and performance evaluated in the same sample.
bootstrap_performance: Model is developed in a subset of data (training set) and evaluated in the training dataset
test_performance: Model developed in the training set is evaluated in the entire dataset.
out_of_sample_performance: Performance in the sample that was not included in the training dataset
optimism: Test performance - bootstrap performance
average_optimism: Average of the optimism
optimism_corrected_performance: Apparent performance - average optimism
optimism_corrected_performance_with_CI: Please see details below.
out_of_sample_performance_summary: Please see details below.
apparent_performance_calibration_adjusted: For details of calibration adjustment see calculate_actual_predicted
bootstrap_performance_calibration_adjusted: As above
test_performance_calibration_adjusted: As above
out_of_sample_performance_calibration_adjusted: As above
optimism_calibration_adjusted: As above
average_optimism_calibration_adjusted: As above
optimism_corrected_performance_calibration_adjusted: As above
optimism_corrected_performance_with_CI_calibration_adjusted: As above
out_of_sample_performance_summary_calibration_adjusted: Summary of out-of-sample performance
apparent_performance_adjusted_mandatory_predictors_only: For details of this model, which is used only for research purposes, see the section 'Model with only the mandatory predictors but based on the coefficients of the entire model' in calculate_actual_predicted.
bootstrap_performance_adjusted_mandatory_predictors_only: As above
test_performance_adjusted_mandatory_predictors_only: As above
out_of_sample_performance_adjusted_mandatory_predictors_only: As above
optimism_adjusted_mandatory_predictors_only: As above
average_optimism_adjusted_mandatory_predictors_only: As above
optimism_corrected_performance_adjusted_mandatory_predictors_only: As above
optimism_corrected_performance_with_CI_adjusted_mandatory_predictors_only: As above
out_of_sample_performance_summary_adjusted_mandatory_predictors_only: Summary of out-of-sample performance
actual_predicted_results_apparent: Output from calculate_actual_predicted retained for some later calculations.
average_lp_all_subjects: Output from calculate_actual_predicted retained for some later calculations.

Arguments

generic_input_parameters: A list that contains information common to all models. If one or more items are missing or incorrect, this may result in an error. Therefore, we recommend that you use the create_generic_input_parameters function to create this input.
specific_input_parameters_each_analysis: The input parameters for a single analysis, i.e., a model or scoring system. If one or more items are missing or incorrect, this may result in an error. Therefore, we recommend that you use the create_specific_input_parameters function to create this input.
prepared_datasets: Datasets prepared using the prepare_datasets function.
verbose: TRUE if the progress must be displayed and FALSE otherwise.

Author

Kurinchi Gurusamy

Details

Preparing datasets for each simulation: Please see prepare_datasets.

Calculation of actual and predicted values: Please see calculate_actual_predicted, particularly for details of apparent performance, bootstrap performance, test performance, and optimism as described by Collins et al., 2024.
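The four performance values named in the Value section can be illustrated with a minimal base R sketch for a single simulation. This is not the package's implementation: the dataset (mtcars), the linear model, the rmse helper, and the assumption that the training set is a bootstrap sample drawn with replacement are all chosen here purely for illustration.

```r
# Illustrative sketch only: one simulation, base R, hypothetical model and measure.
set.seed(1)
df <- mtcars
n <- nrow(df)

# Hypothetical performance measure: root mean squared error of the predictions
rmse <- function(model, data) {
  sqrt(mean((data$mpg - predict(model, newdata = data))^2))
}

# Apparent performance: model developed and evaluated in the entire dataset
full_model <- lm(mpg ~ wt, data = df)
apparent_performance <- rmse(full_model, df)

# Training set: assumed here to be a bootstrap sample drawn with replacement
in_sample <- sample(n, size = n, replace = TRUE)
out_of_sample <- setdiff(seq_len(n), in_sample)
df_training <- df[in_sample, ]
df_out_of_sample <- df[out_of_sample, ]

# Model developed in the training set
training_model <- lm(mpg ~ wt, data = df_training)

# Bootstrap performance: evaluated in the training set itself
bootstrap_performance <- rmse(training_model, df_training)
# Test performance: evaluated in the entire dataset
test_performance <- rmse(training_model, df)
# Out-of-sample performance: evaluated in subjects not drawn into the training set
out_of_sample_performance <- rmse(training_model, df_out_of_sample)

c(apparent = apparent_performance, bootstrap = bootstrap_performance,
  test = test_performance, out_of_sample = out_of_sample_performance)
```

Repeating the steps from the bootstrap sample onwards once per simulation gives one bootstrap, test, and out-of-sample value for each simulation.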

Calculation of performance measures: Please see calculate_performance.

Calculation of means and confidence intervals: To calculate the average performance measures and their confidence intervals across multiple simulations, appropriate transformations were performed first. The bias-corrected accelerated confidence intervals were then calculated using the "bca" function from the coxed package, which is no longer maintained (R, 2025). Finally, the bias-corrected accelerated confidence intervals of the transformed data were back-transformed.
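The transform, interval, back-transform sequence can be sketched as follows. The text does not state which transformation is applied to which measure, so the logit transformation used here (for a measure bounded between 0 and 1) is an assumption, and the performance values are simulated. The sketch calls coxed::bca when the coxed package is installed; the percentile interval used otherwise is only a stand-in so that the sketch runs, and it is not a bias-corrected accelerated interval.

```r
# Illustrative sketch only: hypothetical performance values from 200 simulations,
# for a measure bounded between 0 and 1.
set.seed(1)
performance <- plogis(rnorm(200, mean = 1, sd = 0.3))

# Interval on the transformed scale: coxed::bca if available, otherwise a
# plain percentile interval as a stand-in (NOT bias-corrected accelerated).
interval_fun <- function(x, conf.level = 0.95) {
  if (requireNamespace("coxed", quietly = TRUE)) {
    coxed::bca(x, conf.level = conf.level)
  } else {
    alpha <- (1 - conf.level) / 2
    unname(quantile(x, probs = c(alpha, 1 - alpha)))
  }
}

# Step 1: transform (assumed here: logit)
transformed <- qlogis(performance)
# Step 2: mean and interval on the transformed scale
mean_transformed <- mean(transformed)
ci_transformed <- interval_fun(transformed)
# Step 3: back-transform to the original scale
estimate <- plogis(mean_transformed)
ci <- plogis(ci_transformed)

c(estimate = estimate, lower = ci[1], upper = ci[2])
```

Working on the transformed scale keeps the back-transformed limits inside the valid range of the measure, which is the usual motivation for this design.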

The "enhanced bootstrapping internal validation approach" method described by Collins et al., 2024 provides only the mean optimism-corrected performance. However, we have optimism from multiple simulations. Therefore, rather than calculating the average and then subtracting it from the apparent performance, the optimism from each simulation was subtracted from the apparent performance. This allowed calculation of the confidence intervals of the optimism-corrected performance using the bca function (after appropriate transformation).
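A small numerical sketch of this step, using made-up values: subtracting the optimism of each simulation from the apparent performance gives one optimism-corrected value per simulation. Their mean equals the apparent performance minus the average optimism, and their spread is what allows a confidence interval to be calculated. The numbers and variable names below are hypothetical and are not output from the package.

```r
# Illustrative sketch only: hypothetical values, not package output.
apparent_performance <- 0.75
# One optimism value per simulation (five simulations here)
optimism <- c(0.020, 0.030, 0.010, 0.040, 0.025)

# Approach giving only a single value: subtract the average optimism
average_optimism <- mean(optimism)
corrected_from_average <- apparent_performance - average_optimism

# Approach described above: subtract the optimism of each simulation
corrected_each_simulation <- apparent_performance - optimism

# The mean of the per-simulation values matches the single-value approach
all.equal(mean(corrected_each_simulation), corrected_from_average)  # TRUE

# The per-simulation values form a distribution from which an interval
# can be calculated (after appropriate transformation)
range(corrected_each_simulation)
```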

The performance measures of the calibration intercept-slope adjusted models were also assessed by the same method. We have also presented the performance of the models in the 'out-of-sample subjects', i.e., the subjects who were not included in the bootstrap sample.

References

Collins GS, Dhiman P, Ma J, Schlussel MM, Archer L, Van Calster B, et al. Evaluation of clinical prediction models (part 1): from development to external validation. BMJ. 2024;384:e074819.

Examples

  library(survival)
  colon$status <- factor(as.character(colon$status))
  # For testing, only 5 simulations are used here. At least 300 to 500 simulations
  # are usually needed. Increasing the number of simulations leads to more reliable
  # results. The default value of 2000 simulations should provide reasonably
  # reliable results.
  generic_input_parameters <- create_generic_input_parameters(
    general_title = "Prediction of colon cancer death", simulations = 5,
    simulations_per_file = 20, seed = 1, df = colon, outcome_name = "status",
    outcome_type = "time-to-event", outcome_time = "time", outcome_count = FALSE,
    verbose = FALSE)$generic_input_parameters
  analysis_details <- cbind.data.frame(
    name = c('age', 'single_mandatory_predictor', 'complex_models',
             'complex_models_only_optional_predictors', 'predetermined_model_text'),
    analysis_title = c('Simple cut-off based on age', 'Single mandatory predictor (rx)',
                       'Multiple mandatory and optional predictors',
                       'Multiple optional predictors only', 'Predetermined model text'),
    develop_model = c(FALSE, TRUE, TRUE, TRUE, TRUE),
    predetermined_model_text = c(NA, NA, NA, NA,
    "cph(Surv(time, status) ~ rx * age, data = df_training_complete, x = TRUE, y = TRUE)"),
    mandatory_predictors = c(NA, 'rx', 'rx; differ; perfor; adhere; extent', NA, "rx; age"),
    optional_predictors = c(NA, NA, 'sex; age; nodes', 'rx; differ; perfor', NA),
    mandatory_interactions = c(NA, NA, 'rx; differ; extent', NA, NA),
    optional_interactions = c(NA, NA, 'perfor; adhere; sex; age; nodes', 'rx; differ', NA),
    model_threshold_method = c(NA, 'youden', 'youden', 'youden', 'youden'),
    scoring_system = c('age', NA, NA, NA, NA),
    predetermined_threshold = c('60', NA, NA, NA, NA),
    higher_values_event = c(TRUE, NA, NA, NA, NA)
  )
  # Save the analysis details to a temporary file
  analysis_details_path <- file.path(tempdir(), "analysis_details.csv")
  write.csv(analysis_details, analysis_details_path, row.names = FALSE, na = "")
  # verbose is TRUE by default. If you do not want the output displayed, you can
  # change this to FALSE, as shown here
  results <- create_specific_input_parameters(
    generic_input_parameters = generic_input_parameters,
    analysis_details_path = analysis_details_path, verbose = FALSE)
  specific_input_parameters <- results$specific_input_parameters
  # Set a seed for reproducibility - Please see details above
  set.seed(generic_input_parameters$seed)
  prepared_datasets <- prepare_datasets(
    df = generic_input_parameters$df,
    simulations = generic_input_parameters$simulations,
    outcome_name = generic_input_parameters$outcome_name,
    outcome_type = generic_input_parameters$outcome_type,
    outcome_time = generic_input_parameters$outcome_time,
    verbose = FALSE)
  # Select the first analysis and run it. perform_analysis calls
  # calculate_actual_predicted internally to create the actual and predicted values,
  # so there is usually no requirement to call that function directly.
  specific_input_parameters_each_analysis <- specific_input_parameters[[1]]
  results <- perform_analysis(generic_input_parameters,
  specific_input_parameters_each_analysis, prepared_datasets, verbose = FALSE)
  results$apparent_performance
