clinpubr (version 1.1.1)

classif_model_compare: Performance comparison of classification models

Description

Compare the performance of classification models using commonly used metrics, and generate the standard diagnostic plots: the receiver operating characteristic (ROC) curve, the precision-recall curve, the decision curve analysis (DCA) plot, and the calibration plot.

Usage

classif_model_compare(
  data,
  target_var,
  model_names,
  colors = NULL,
  save_output = FALSE,
  figure_type = "png",
  output_prefix = "model_compare",
  as_probability = FALSE,
  auto_order = TRUE
)

Value

A list of results. If the saved output files are not in the desired format, the components of this list can be modified and re-exported.

  • metric_table: A data frame containing the performance metrics for each model.

  • roc_plot: A ggplot object of the receiver operating characteristic (ROC) curves.

  • pr_plot: A ggplot object of the precision-recall (PR) curves.

  • dca_plot: A ggplot object of the decision curve analysis (DCA) plot.

  • calibration_plot: A ggplot object of the calibration plot.
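
Because the return value is an ordinary list of a data frame and ggplot objects, its components can be post-processed before saving. A minimal sketch (assuming `res` holds the result of a call like the one in the Examples below, and that ggplot2 is installed):

res <- classif_model_compare(df, "dead", c("base_pred", "full_pred"))

# Export the metric table and a restyled ROC plot
write.csv(res$metric_table, "metrics.csv", row.names = FALSE)
roc_restyled <- res$roc_plot + ggplot2::theme_classic()
ggplot2::ggsave("roc_curves.pdf", roc_restyled, width = 5, height = 5)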

Arguments

data

A data frame containing the target variable and the predicted values (one column per model).

target_var

A string specifying the name of the target variable in the data frame.

model_names

A vector of strings specifying the names of the models to compare.

colors

A vector of colors to use for the plots. The last two colors are used for the "Treat all" and "Treat none" lines in the DCA plot.

save_output

A logical value indicating whether to output the results to files.

figure_type

A character string specifying the figure type: "png", "pdf", or any other format supported by ggplot2::ggsave().

output_prefix

A string specifying the prefix for the output files.

as_probability

Either a logical value or a character vector of variable names. If TRUE, any prediction variable whose values fall outside the range 0 to 1 is rescaled into that range; if a vector of variable names is given, only those variables are rescaled.

auto_order

A logical value indicating whether to automatically order the models by their AUCs. If TRUE, the models are ordered by AUC in descending order; if FALSE, the order given in model_names is retained.
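
As an illustration of the optional arguments, consider this hypothetical call, where the column name "score" is assumed to hold a predictor on a non-probability scale (e.g. a linear predictor):

classif_model_compare(
  df, "dead", c("base_pred", "score"),
  colors = c("#1B9E77", "#D95F02", "grey40", "grey70"),  # last 2: "Treat all"/"Treat none"
  as_probability = "score",  # rescale only this column into the range 0 to 1
  auto_order = FALSE,        # keep the order given in model_names
  save_output = TRUE, figure_type = "pdf", output_prefix = "cmp"
)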

Metrics

  • AUC: Area Under the Receiver Operating Characteristic Curve

  • PRAUC: Area Under the Precision-Recall Curve

  • Accuracy: Overall accuracy

  • Sensitivity: True positive rate

  • Specificity: True negative rate

  • Pos Pred Value: Positive predictive value

  • Neg Pred Value: Negative predictive value

  • F1: F1 score

  • Kappa: Cohen's kappa

  • Brier: Brier score

  • cutoff: Optimal cutoff for classification; metrics that require a cutoff are computed at this value.

  • Youden: Youden's J statistic

  • HosLem: Hosmer-Lemeshow test p-value
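
The cutoff-based metrics follow their standard definitions. For instance, Youden's J at the optimal threshold can be reproduced by hand with the pROC package (a sketch independent of this function, assuming the `df` and `full_pred` columns constructed in the Examples below):

library(pROC)
roc_obj <- roc(df$dead, df$full_pred, quiet = TRUE)
best <- coords(roc_obj, "best",
               ret = c("threshold", "sensitivity", "specificity"))
youden <- best$sensitivity + best$specificity - 1  # J = Sens + Spec - 1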

Examples

data(cancer, package = "survival")
df <- kidney
# Define a binary outcome: death within 100 days.
# Subjects censored before day 100 have an unknown outcome and become NA.
df$dead <- ifelse(df$time <= 100 & df$status == 0, NA, df$time <= 100)
df <- na.omit(df[, -c(1:3)])  # drop id, time, and status; remove incomplete rows

# Fit a baseline and a full logistic regression model
model0 <- glm(dead ~ age + frail, family = binomial(), data = df)
model <- glm(dead ~ ., family = binomial(), data = df)
df$base_pred <- predict(model0, type = "response")
df$full_pred <- predict(model, type = "response")

# Compare the two models on the same data
classif_model_compare(df, "dead", c("base_pred", "full_pred"), save_output = FALSE)