dtGAP

Supervised Generalized Association Plots Based on Decision Trees

Decision trees are prized for their simplicity and interpretability but often fail to reveal underlying data structures. Generalized Association Plots (GAP) excel at illustrating complex associations yet are typically unsupervised. dtGAP bridges this gap by embedding supervised correlation and distance measures into GAP for enriched decision-tree visualization, offering confusion matrix maps, decision-tree matrix maps, predicted class membership maps, and evaluation panels.

Installation

# Install from CRAN
install.packages("dtGAP")

# Or install the development version from GitHub
# install.packages("devtools")
devtools::install_github("hanmingwu1103/dtGAP")

Quick Start

library(dtGAP)

# Drop rows with missing values before fitting the tree
penguins <- na.omit(penguins)
dtGAP(
  data_all = penguins, model = "party", show = "all",
  trans_type = "percentize", target_lab = "species",
  simple_metrics = TRUE,
  label_map_colors = c(
    "Adelie" = "#50046d", "Gentoo" = "#fcc47f",
    "Chinstrap" = "#e15b76"
  ),
  show_col_prox = FALSE, show_row_prox = FALSE,
  raw_value_col = colorRampPalette(
    c("#33286b", "#26828e", "#75d054", "#fae51f")
  )(9)
)

Features

Tree Models

Choose between two tree models via the model argument:

  • "rpart" (classic CART): Each node shows class-membership probabilities and the percentage of samples in each branch.
  • "party" (conditional inference trees): Each internal node is annotated with its split-variable p-value and the percentage of samples in each branch.

Data Subsets

Control which data to visualize with the show argument: "all", "train", or "test".

Row and Column Proximity

  • Column Proximity: Combined conditional correlation matrix weighted by group memberships.
  • Row Proximity: Supervised distance combining within-leaf dispersion and between-leaf separation using linkage "CT" (centroid), "SG" (single), or "CP" (complete).
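
The three linkage options differ only in how the separation between two leaves is measured. The base-R sketch below illustrates that difference on two toy leaves. The helper name between_leaf_distance and the use of Euclidean distance are assumptions made for illustration. How dtGAP combines this separation with within-leaf dispersion is not shown here.

```r
# Hypothetical helper, not part of dtGAP: distance between two groups of
# observations (rows) under centroid, single, or complete linkage.
between_leaf_distance <- function(a, b, linkage = c("CT", "SG", "CP")) {
  linkage <- match.arg(linkage)
  a <- as.matrix(a)
  b <- as.matrix(b)
  if (linkage == "CT") {
    # Centroid linkage: distance between the two leaf means
    return(sqrt(sum((colMeans(a) - colMeans(b))^2)))
  }
  # All pairwise Euclidean distances between rows of a and rows of b
  full <- as.matrix(dist(rbind(a, b)))
  cross <- full[seq_len(nrow(a)), nrow(a) + seq_len(nrow(b)), drop = FALSE]
  # Single linkage takes the closest pair, complete linkage the farthest pair
  if (linkage == "SG") min(cross) else max(cross)
}

leaf1 <- rbind(c(0, 0), c(1, 0))
leaf2 <- rbind(c(4, 0), c(6, 0))

between_leaf_distance(leaf1, leaf2, "CT")  # 4.5
between_leaf_distance(leaf1, leaf2, "SG")  # 3
between_leaf_distance(leaf1, leaf2, "CP")  # 6
```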

Use any method from the seriation package to reorder rows and columns. The cRGAR score quantifies order quality (near 0 = good sorting, near 1 = many violations).
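
The exact definition of cRGAR is not given in this README. As a rough illustration of the idea behind it, the base-R sketch below computes a plain relative generalized anti-Robinson score over all triples: the share of comparisons in which a distance shrinks when moving away from the diagonal of the reordered matrix. The function name rgar and the full-window formulation are assumptions, and the conditional variant used by dtGAP may differ.

```r
# Hypothetical helper, not part of dtGAP: fraction of anti-Robinson
# violations in a distance matrix under a given ordering.
rgar <- function(d, order = seq_len(nrow(as.matrix(d)))) {
  d <- as.matrix(d)[order, order]
  n <- nrow(d)
  stopifnot(n >= 3)
  violations <- 0
  comparisons <- 0
  for (i in seq_len(n - 2)) {
    for (j in (i + 1):(n - 1)) {
      for (k in (j + 1):n) {
        # For i < j < k, a well-sorted matrix has d[i, j] <= d[i, k]
        # and d[j, k] <= d[i, k]; each failure counts as one violation.
        violations <- violations + (d[i, j] > d[i, k]) + (d[j, k] > d[i, k])
        comparisons <- comparisons + 2
      }
    }
  }
  violations / comparisons
}

x <- c(1, 2, 4, 7, 11)
d <- dist(x)

rgar(d)                      # 0: points on a line, already in sorted order
rgar(d, c(3, 1, 5, 2, 4))    # larger than 0: the shuffled order breaks the pattern
```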

Data Transformation

Choose a suitable transformation via trans_type: "none", "percentize", "normalize", or "scale".
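
The formulas behind each option are not spelled out here. The sketch below shows the definitions these names commonly refer to (empirical percentile, min-max rescaling, and z-score standardization) in base R. The helper names and the exact formulas are assumptions and may not match dtGAP's internals.

```r
# Hypothetical helpers, not part of dtGAP.
percentize  <- function(x) rank(x, ties.method = "max") / length(x)  # empirical CDF value
normalize   <- function(x) (x - min(x)) / (max(x) - min(x))          # min-max to [0, 1]
standardize <- function(x) (x - mean(x)) / sd(x)                     # mean 0, sd 1

x <- c(10, 20, 20, 50)

percentize(x)   # 0.25 0.75 0.75 1.00
normalize(x)    # 0.00 0.25 0.25 1.00
standardize(x)  # centered and scaled to unit standard deviation
```

Percentizing is rank-based, so it is insensitive to outliers, whereas min-max rescaling and standardization preserve the relative spacing of the raw values.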

Evaluation Metrics

When print_eval = TRUE, an evaluation panel shows:

  • Data Information: Dataset name, model, train/test sizes, proximity method, linkage, seriation algorithm, and cRGAR score.
  • Train/Test Metrics:
    • Full confusion-matrix report (default, via caret::confusionMatrix())
    • Simple metrics (simple_metrics = TRUE): Accuracy, Balanced Accuracy, Kappa, Precision, Recall, Specificity
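
For a two-class problem, the simple metrics listed above can be reproduced from the four cells of a confusion matrix. The base-R sketch below uses the standard textbook definitions on made-up labels. The function name simple_binary_metrics is hypothetical, and dtGAP's own computation (for example its handling of more than two classes) may differ.

```r
# Hypothetical helper, not part of dtGAP.
simple_binary_metrics <- function(truth, pred, positive) {
  tp <- sum(pred == positive & truth == positive)
  fp <- sum(pred == positive & truth != positive)
  fn <- sum(pred != positive & truth == positive)
  tn <- sum(pred != positive & truth != positive)
  n  <- tp + fp + fn + tn

  recall      <- tp / (tp + fn)
  specificity <- tn / (tn + fp)
  observed    <- (tp + tn) / n
  # Agreement expected by chance, from the row and column totals
  expected    <- ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n^2

  c(
    Accuracy            = observed,
    `Balanced Accuracy` = (recall + specificity) / 2,
    Kappa               = (observed - expected) / (1 - expected),
    Precision           = tp / (tp + fp),
    Recall              = recall,
    Specificity         = specificity
  )
}

# Made-up labels for illustration only
truth <- c("Death", "Death", "Death", "Survival", "Survival",
           "Survival", "Survival", "Survival")
pred  <- c("Death", "Death", "Survival", "Survival", "Survival",
           "Survival", "Death", "Survival")

m <- simple_binary_metrics(truth, pred, positive = "Death")
round(m, 3)
```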

Train/Test Workflow

dtGAP(
  data_train = train_covid, data_test = test_covid,
  target_lab = "Outcome", show = "test",
  label_map = c("0" = "Survival", "1" = "Death"),
  label_map_colors = c("Survival" = "#50046d", "Death" = "#fcc47f"),
  simple_metrics = TRUE
)
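
If your data is not already split, a simple random hold-out split can be made in base R before calling dtGAP(). The sketch below uses the built-in iris data as a stand-in. The 70/30 ratio, the seed, and the object names are arbitrary choices for illustration.

```r
# Random 70/30 hold-out split of a data frame (illustrative values only)
set.seed(42)
n        <- nrow(iris)
test_idx <- sample(n, size = round(0.3 * n))

train_df <- iris[-test_idx, ]
test_df  <- iris[test_idx, ]

nrow(train_df)  # 105
nrow(test_df)   # 45
```

The two data frames can then be supplied as data_train and data_test, with target_lab set to the name of the response column.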

Regression

dtGAP also supports regression tasks, reporting R-squared, MAE, RMSE, and the concordance correlation coefficient (CCC):

dtGAP(
  data_all = galaxy, task = "regression",
  target_lab = "target", show = "all",
  trans_type = "percentize", model = "party",
  simple_metrics = TRUE
)
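
To make the regression metrics concrete, the base-R sketch below computes them from observed and predicted values using common definitions. It assumes R-squared is one minus the ratio of residual to total sum of squares and that CCC refers to Lin's concordance correlation coefficient. The function name regression_metrics is hypothetical, and dtGAP may use other variants (for example squared correlation for R-squared).

```r
# Hypothetical helper, not part of dtGAP.
regression_metrics <- function(obs, pred) {
  mu_o <- mean(obs)
  mu_p <- mean(pred)
  # Population (1/n) variances and covariance, as in Lin's CCC
  var_o  <- mean((obs - mu_o)^2)
  var_p  <- mean((pred - mu_p)^2)
  cov_op <- mean((obs - mu_o) * (pred - mu_p))

  c(
    R2   = 1 - sum((obs - pred)^2) / sum((obs - mu_o)^2),
    MAE  = mean(abs(obs - pred)),
    RMSE = sqrt(mean((obs - pred)^2)),
    CCC  = 2 * cov_op / (var_o + var_p + (mu_o - mu_p)^2)
  )
}

# Made-up values for illustration only
obs  <- c(1, 2, 3, 4, 5)
pred <- c(1.5, 2, 2.5, 4, 5.5)

r <- regression_metrics(obs, pred)
round(r, 3)
```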

Variable Selection

Focus the heatmap on a subset of features while the tree is still trained on all variables:

dtGAP(
  data_train = train_covid, data_test = test_covid,
  target_lab = "Outcome", show = "test",
  select_vars = c("LDH", "Lymphocyte")
)

Custom Tree Input

Pass a pre-trained tree directly via the fit parameter. rpart, party, and caret train objects are supported, and the model type is detected automatically:

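library(rpart)
# If Outcome is stored as a number rather than a factor, add method = "class"
# so that rpart grows a classification tree instead of a regression tree.
custom_tree <- rpart(Outcome ~ ., data = train_covid)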

dtGAP(
  fit = custom_tree,
  data_train = train_covid, data_test = test_covid,
  target_lab = "Outcome", show = "test"
)
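
Automatic detection of this kind is typically done by inspecting the S3 class of the fitted object. The sketch below shows one way it could work. The function detect_tree_model is hypothetical and is not dtGAP's actual implementation. The stand-in objects only carry a class attribute so that the example runs without fitting real models.

```r
# Hypothetical helper, not part of dtGAP: map a fitted object to a model label.
detect_tree_model <- function(fit) {
  if (inherits(fit, "train")) {
    "caret"   # caret::train() results have class "train"
  } else if (inherits(fit, "party")) {
    "party"   # partykit trees inherit from class "party"
  } else if (inherits(fit, "rpart")) {
    "rpart"
  } else {
    stop("Unsupported model class: ", paste(class(fit), collapse = ", "))
  }
}

# Stand-in objects with only the class attribute set
fake_rpart <- structure(list(), class = "rpart")
fake_party <- structure(list(), class = c("constparty", "party"))

detect_tree_model(fake_rpart)  # "rpart"
detect_tree_model(fake_party)  # "party"
```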

Interactive Visualization

Set interactive = TRUE to launch a Shiny-based heatmap viewer powered by InteractiveComplexHeatmap:

dtGAP(
  data_train = train_covid, data_test = test_covid,
  target_lab = "Outcome", show = "test",
  interactive = TRUE
)

Multi-Model Comparison

Compare two or more tree models side-by-side with compare_dtGAP():

compare_dtGAP(
  models = c("rpart", "party"),
  data_train = train_covid, data_test = test_covid,
  target_lab = "Outcome", show = "test"
)

Random Forest Extension

Visualize conditional random forests via partykit::cforest:

# Ensemble summary: variable importance + representative tree
result <- rf_summary(
  data_train = train_covid, data_test = test_covid,
  target_lab = "Outcome", ntree = 50
)

# Visualize a single tree from the forest
rf_dtGAP(
  data_train = train_covid, data_test = test_covid,
  target_lab = "Outcome", show = "test",
  tree_index = result$rep_tree_index, ntree = 50
)

Export Plots

Save visualizations to PNG, PDF, or SVG:

save_dtGAP(
  file = "my_plot.png",
  data_train = train_covid, data_test = test_covid,
  target_lab = "Outcome", show = "test"
)

Customization

  • Variable importance: col_var_imp, var_imp_bar_width, var_imp_fontsize
  • Split variable labels: split_var_bg, split_var_fontsize
  • Color palettes (any RColorBrewer palette):
    • Col_Prox_palette / Col_Prox_n_colors
    • Row_Prox_palette / Row_Prox_n_colors
    • sorted_dat_palette / sorted_dat_n_colors
  • Label mapping: label_map, label_map_colors
  • Proximity display: show_row_prox, show_col_prox
  • Layout: tree_p controls the proportion of canvas allocated to the tree

Included Datasets

Dataset                   Description                      Observations  Task
Psychosis_Disorder        SAPS/SANS symptom ratings        95            Classification
penguins                  Palmer penguins morphometrics    344           Classification
wine                      Italian wine chemical analysis   178           Classification
diabetes                  Pima Indians diabetes            768           Classification
train_covid / test_covid  Wuhan COVID-19 patient records   375 / 110     Classification
wine_quality_red          Portuguese red wine quality      1599          Regression
galaxy                    Galaxy velocity data             323           Regression

Citation

Wu, H.-M., Chang, C.-Y., & Chen, C.-H. (2025). dtGAP: Supervised matrix visualization for decision trees based on the GAP framework. R package version 0.0.2. https://CRAN.R-project.org/package=dtGAP

References

  • Chen, C. H. (2002). Generalized association plots: Information visualization via iteratively generated correlation matrices. Statistica Sinica, 12, 7-29.
  • Le, T. T., & Moore, J. H. (2021). Treeheatr: An R package for interpretable decision tree visualizations. Bioinformatics, 37(2), 282-284.
  • Wu, H. M., Tien, Y. J., & Chen, C. H. (2010). GAP: A graphical environment for matrix visualization and cluster analysis. Computational Statistics & Data Analysis, 54(3), 767-778.

License

MIT

Version

0.0.2

License

MIT + file LICENSE

Maintainer

Han-Ming Wu

Last Published

February 18th, 2026

Functions in dtGAP (0.0.2)

  • test_covid: External test dataset. Medical information of Wuhan patients collected between 2020-01-10 and 2020-02-18.
  • sorted_mat: Sort Feature Matrix by Tree and Correlation Structure
  • rf_summary: Random Forest Ensemble Summary
  • wine_quality_red: Red variant of the Portuguese "Vinho Verde" wine.
  • prepare_features: Prepare Features for Modeling
  • train_rf: Fit a Conditional Random Forest
  • row_prop_anno: Annotate Row Proximity on Heatmap
  • train_covid: Training dataset. Medical information of Wuhan patients collected between 2020-01-10 and 2020-02-18. Contains NAs.
  • prediction_annotation: Annotate Prediction Information
  • train_tree: Fit a Decision Tree Model
  • rf_dtGAP: Visualize a Single Tree from a Conditional Random Forest
  • prepare_tree: Prepare Tree Plot Data for Visualization
  • wine: Results of a chemical analysis of wines grown in a specific area of Italy.
  • save_dtGAP: Save dtGAP Visualization to File
  • scale_norm: Performs transformation on continuous variables.
  • dtGAP-package: dtGAP: Supervised Generalized Association Plots Based on Decision Trees
  • compute_layout: Compute Layout Dimensions for Tree + Heatmap Plot
  • draw_all: Draw Full Visualization: Decision Tree with Heatmap and Evaluation
  • compute_tree: Compute Decision Tree Data for Plotting and Analysis
  • compare_dtGAP: Compare Multiple Decision Tree Models Side-by-Side
  • add_data_type: Assigns a train/test indicator to a combined dataset
  • dtGAP: Decision Tree Generalized Association Plots (dtGAP)
  • Psychosis_Disorder: Psychosis Disorder Data
  • generate_legend_bundle: Generate a Bundle of Legends for Heatmap Components
  • col_ht: Create Column Heatmap with Variable Importance
  • make_main_heatmap: Draw Main Heatmap with Annotations
  • penguins: Data of three different species of penguins.
  • diabetes: Diabetes patient records.
  • get_split_vec: Build Split Factor for Heatmap Rows
  • eval_tree: Evaluate Tree Model Predictions and Metrics
  • galaxy: Galaxy dataset for regression.