plot_hcp_intervals: Plot HCP prediction intervals (band vs covariate or intervals by patient)

Description

Unified plotting function for two common visualizations of HCP prediction intervals:

mode="band": plot an interval band (lo/hi) versus a 1D covariate (e.g., time, stored in X1).
mode="pid": plot one interval per patient on the x-axis (patients optionally sorted by a covariate).
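
As a rough illustration of what the two modes show, the base R sketch below draws a shaded band and a set of per-patient intervals from a hand-made data.frame. It is not the package's implementation: the toy data and the helpers sketch_band and sketch_pid are assumptions made for this example only.

```r
# Toy data (hypothetical): six rows with interval endpoints and a covariate X1.
toy <- data.frame(
  pid = 1:6,
  X1  = c(0.5, 2.0, 3.5, 5.0, 7.5, 9.0),
  lo  = c(-1.0, -0.5, 0.0, 0.8, 1.5, 2.0),
  hi  = c(1.0, 2.0, 3.0, 4.5, 6.0, 8.0)
)

# Band-style plot: shaded region between lo and hi versus a covariate.
sketch_band <- function(df, x_col = "X1", lo_col = "lo", hi_col = "hi") {
  df <- df[order(df[[x_col]]), , drop = FALSE]  # sort rows along the x-axis
  x  <- df[[x_col]]
  lo <- df[[lo_col]]
  hi <- df[[hi_col]]
  mid <- (lo + hi) / 2
  plot(x, mid, type = "n", ylim = range(lo, hi), xlab = x_col, ylab = "y")
  polygon(c(x, rev(x)), c(lo, rev(hi)), col = "grey85", border = NA)
  lines(x, mid, lty = 2)  # dashed midpoint line
  invisible(df)
}

# Per-patient plot: one vertical segment per row, in the row order given.
sketch_pid <- function(df, pid_col = "pid", lo_col = "lo", hi_col = "hi") {
  k  <- seq_len(nrow(df))
  lo <- df[[lo_col]]
  hi <- df[[hi_col]]
  mid <- (lo + hi) / 2
  plot(k, mid, type = "n", ylim = range(lo, hi),
       xlab = "patient", ylab = "y", xaxt = "n")
  axis(1, at = k, labels = df[[pid_col]])
  segments(k, lo, k, hi)   # the interval itself
  points(k, mid, pch = 16) # interval midpoint
  invisible(df)
}
```

Calling sketch_band(toy) and then sketch_pid(toy) produces one figure of each kind on the active graphics device.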

Usage

plot_hcp_intervals(
  df,
  mode = c("band", "pid"),
  lo_col = "lo",
  hi_col = "hi",
  y_true_col = NULL,
  y_true = NULL,
  show_center = TRUE,
  show_true = TRUE,
  x_col = NULL,
  pid_col = "pid",
  x_sort_col = NULL,
  max_patients = NULL,
  ...
)

Value

Invisibly returns the data.frame used for plotting:

For mode = "band", the input df sorted by x_col.
For mode = "pid", the input df sorted by x_sort_col if provided, otherwise by pid_col.
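
The row ordering described above can be mimicked in base R. This is only a sketch of the sorting rule on a made-up table; pred and the helper order_for_pid are hypothetical names, and the function's internal code may differ.

```r
# Hypothetical prediction table with unsorted rows.
pred <- data.frame(
  pid = c(3, 1, 2),
  X1  = c(4.2, 9.1, 0.7),
  lo  = c(0.1, 1.5, -0.8),
  hi  = c(2.3, 4.0, 1.1)
)

# mode = "band": rows ordered by the x-axis covariate (x_col = "X1").
by_x <- pred[order(pred[["X1"]]), , drop = FALSE]

# mode = "pid": rows ordered by x_sort_col when given, otherwise by pid_col.
order_for_pid <- function(df, pid_col = "pid", x_sort_col = NULL) {
  key <- if (is.null(x_sort_col)) df[[pid_col]] else df[[x_sort_col]]
  df[order(key), , drop = FALSE]
}

by_pid <- order_for_pid(pred)                     # ordered by patient ID
by_cov <- order_for_pid(pred, x_sort_col = "X1")  # ordered by covariate

by_pid$pid  # 1 2 3
by_cov$pid  # 2 3 1
```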

Arguments

df: A data.frame containing prediction results. It must include the interval endpoints specified by lo_col and hi_col, and the covariate columns required by the chosen plotting mode.
mode: Plotting mode. Use "band" to visualize an interval band as a function of a continuous covariate, or "pid" to visualize one prediction interval per patient on the x-axis.
lo_col: Name of the column containing the lower endpoint of the prediction interval. Default is "lo".
hi_col: Name of the column containing the upper endpoint of the prediction interval. Default is "hi".
y_true_col: Optional name of a column in df containing the true outcome values. Used for overlaying truth points when show_true = TRUE.
y_true: Optional numeric vector of true outcome values with length equal to nrow(df). If provided, this overrides y_true_col.
show_center: Logical; if TRUE, draw the midpoint of each interval (as a dashed line in mode = "band" or as points in mode = "pid").
show_true: Logical; if TRUE, overlay the true outcome values when they are supplied through y_true_col or y_true.
x_col: (mode = "band") Name of the covariate column used as the x-axis in the interval band plot (e.g., time or another continuous predictor).
pid_col: (mode = "pid") Name of the column identifying patients (or clusters). Each patient must appear exactly once in df. Default is "pid".
x_sort_col: (mode = "pid") Optional covariate column used to order patients along the x-axis (e.g., "X1"). If NULL, patients are ordered by their IDs.
max_patients: (mode = "pid") Optional maximum number of patients to display. If specified, only the first max_patients patients after sorting are plotted.
...: Additional graphical parameters passed to plot, such as main, xlab, ylab, xlim, or ylim.
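
Several of the requirements listed above (required columns, one row per patient, y_true of length nrow(df)) can be checked before plotting. The helper check_pid_input below is a hypothetical pre-flight check written for this illustration, not a function of the package, and the last lines only emulate what x_sort_col together with max_patients is described as doing.

```r
# Hypothetical pre-flight check for mode = "pid"; not part of the package.
check_pid_input <- function(df, lo_col = "lo", hi_col = "hi",
                            pid_col = "pid", y_true = NULL) {
  stopifnot(is.data.frame(df))
  missing_cols <- setdiff(c(lo_col, hi_col, pid_col), names(df))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  if (anyDuplicated(df[[pid_col]]) > 0) {
    stop("each patient must appear exactly once in df")
  }
  if (!is.null(y_true) && length(y_true) != nrow(df)) {
    stop("y_true must have length nrow(df)")
  }
  invisible(TRUE)
}

# Made-up prediction table for the check.
pred <- data.frame(
  pid = c("a", "b", "c", "d"),
  X1  = c(3.0, 1.0, 4.0, 2.0),
  lo  = c(0.0, -1.0, 0.5, -0.2),
  hi  = c(2.0, 1.0, 3.5, 1.8),
  stringsAsFactors = FALSE
)
check_pid_input(pred, y_true = c(1.2, 0.1, 2.0, 0.9))

# Emulate x_sort_col = "X1" with max_patients = 2: sort, then keep the first two.
shown <- head(pred[order(pred$X1), , drop = FALSE], 2)
shown$pid  # "b" "d"
```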

Examples

## ------------------------------------------------------------
## Two common plots:
## (A) one patient, multiple measurements  -> interval band vs X1
## (B) multiple patients, one measurement -> intervals by patient (sorted by X1)
## ------------------------------------------------------------
dat_train <- generate_clustered_mar(
  n = 200, m = 20, d = 1,
  x_dist = "uniform", x_params = list(min = 0, max = 10),
  hetero_gamma = 2.5,
  target_missing = 0.30,
  seed = 1
)
y_grid <- seq(-6, 10, length.out = 201)

## test data with latent truth
dat_test <- generate_clustered_mar(
  n = 100, m = 20, d = 1,
  x_dist = "uniform", x_params = list(min = 0, max = 10),
  hetero_gamma = 2.5,
  seed = 999
)

## ---------- Case A: P=1, M>1 (one patient, multiple measurements) ----------
pid <- dat_test$id[1]
idx <- which(dat_test$id == pid)
## keep the 10 smallest X1 values; head() avoids NA indices if fewer rows exist
idx <- head(idx[order(dat_test$X1[idx])], 10)
test_1M <- data.frame(pid = pid, X1 = dat_test$X1[idx], y_true = dat_test$Y_full[idx])

out_1M <- hcp_predict_targets(
  dat = dat_train, test = test_1M,
  x_cols = "X1", y_grid = y_grid,
  alpha = 0.1,
  S = 2, B = 2,
  seed = 1
)
plot_hcp_intervals(
  out_1M$pred, mode = "band", x_col = "X1",
  y_true_col = "y_true", show_true = TRUE,
  main = "Case A: one patient, multiple time points (band vs time)"
)

## ---------- Case B: P>1, M=1 (multiple patients, one measurement each) ----------
## take one measurement per patient: j==1 for the first 20 patients
pids <- unique(dat_test$id)[1:20]
test_P1 <- subset(dat_test, id %in% pids & j == 1,
                  select = c(id, X1, Y_full))
names(test_P1) <- c("pid", "X1", "y_true")

out_P1 <- hcp_predict_targets(
  dat = dat_train, test = test_P1,
  x_cols = "X1", y_grid = y_grid,
  alpha = 0.1,
  S = 2, B = 2,
  seed = 1
)
plot_hcp_intervals(
  out_P1$pred, mode = "pid", pid_col = "pid", x_sort_col = "X1",
  y_true_col = "y_true", show_true = TRUE,
  main = "Case B: multiple patients, one time point (by patient)"
)
