HCPclust

HCPclust is an R package for Hierarchical Conformal Prediction (HCP), designed for clustered data where each subject has repeated measurements with within-subject dependence, and outcomes can be missing under a covariate-dependent Missing At Random (MAR) mechanism.

Installation

You can install the development version of HCPclust directly from GitHub using:

install.packages("remotes")
remotes::install_github("judywangstat/HCP")
library(HCPclust)

Method overview

The proposed method consists of four steps.

Step (1) Model fitting

- Fit a pooled conditional density model $\hat{\pi}(y \mid x)$ using fit_cond_density_quantile().
- Fit a marginal missingness propensity model $\hat{p}(x) = \mathbb{P}(\delta = 1 \mid x)$, where $\delta = 1$ indicates an observed outcome, using fit_missingness_propensity().

Both models are estimated on a subject-level training split to capture the distribution of $Y \mid X$ and correct for covariate-dependent missingness.

Step (2) Subsampled calibration

- Construct a calibration set at the subject level.
- Repeatedly draw one observation per subject to form calibration samples.

This preserves the clustered structure while reducing the impact of within-subject dependence on conformal calibration.
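
The subsampling step can be sketched in base R as follows. This is a minimal illustration rather than the package's internal code: the helper `draw_one_per_subject()` and the toy data frame `calib` are hypothetical names, and the sketch assumes a subject identifier column called `id`.

```r
# Hypothetical helper: draw one row index per subject, uniformly at random.
draw_one_per_subject <- function(id) {
  rows_by_subject <- split(seq_along(id), id)
  vapply(
    rows_by_subject,
    function(rows) rows[sample.int(length(rows), 1L)],
    integer(1)
  )
}

# Toy calibration set: 5 subjects with 3 repeated measurements each.
set.seed(1)
calib <- data.frame(id = rep(1:5, each = 3), X1 = runif(15, 0, 10))

# B = 4 subsamples, each holding exactly one observation per subject.
B <- 4
subsamples <- lapply(seq_len(B), function(b) {
  calib[draw_one_per_subject(calib$id), ]
})
```

Each element of `subsamples` contains one row per subject, so observations within a subsample come from different subjects.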

Step (3) Weighted conformal scoring

- Compute the nonconformity score $R(x, y) = - \hat{\pi}(y \mid x)$.
- Apply inverse-propensity weighting $w(x) = 1 / \hat{p}(x)$.
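
One common way to turn weighted scores into a p-value is the weighted conformal p-value from the covariate-shift literature. The sketch below assumes that general form and is not a transcription of the package's code: `weighted_conformal_pvalue()` and its arguments are hypothetical names, and this README does not confirm how the package treats the test-point weight.

```r
# Hypothetical helper: weighted conformal p-value for one candidate value y.
#   score_cal  : scores R(x_i, y_i) on observed calibration points
#   w_cal      : weights w(x_i) = 1 / p_hat(x_i) for those points
#   score_test : score R(x, y) of the candidate pair (x, y)
#   w_test     : weight w(x) = 1 / p_hat(x) at the target covariate
weighted_conformal_pvalue <- function(score_cal, w_cal, score_test, w_test) {
  # Weighted share of calibration scores at least as nonconforming as the
  # candidate; the candidate itself is counted with its own weight.
  (sum(w_cal * (score_cal >= score_test)) + w_test) / (sum(w_cal) + w_test)
}

# Toy example with scores R = -density, so larger means less conforming.
score_cal <- -c(0.40, 0.35, 0.20, 0.05)
w_cal     <- 1 / c(0.8, 0.7, 0.6, 0.5)
p_typical <- weighted_conformal_pvalue(score_cal, w_cal, -0.38, 1 / 0.7)
p_extreme <- weighted_conformal_pvalue(score_cal, w_cal, -0.01, 1 / 0.7)
```

A candidate with high estimated density (`p_typical`) receives a larger p-value than one with low estimated density (`p_extreme`), so it is more likely to be kept in the prediction region.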

Step (4) Aggregation

Aggregate p-values across the B subsamples and the S subject-level splits, using either the Cauchy combination test (CCT / ACAT) or a simple mean.

The final conformal prediction region contains all values $y$ such that $p_{\text{final}}(y) > \alpha$.
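
The Cauchy combination step has a closed form: each p-value is mapped to a standard Cauchy variate, the variates are averaged, and the average is mapped back to a p-value. The sketch below uses equal weights and hypothetical names (`cauchy_combine()`, `p_mat`); it illustrates the general technique and may differ from the package's implementation.

```r
# Hypothetical helper: equal-weight Cauchy combination of p-values.
cauchy_combine <- function(p) {
  stat <- mean(tan((0.5 - p) * pi))  # average of Cauchy-transformed p-values
  0.5 - atan(stat) / pi              # upper tail of the standard Cauchy
}

# Toy input: rows index candidate y values, columns index the p-values
# obtained for each candidate across subsamples and splits.
y_grid <- c(-1, 0, 1, 2)
p_mat <- rbind(
  c(0.02, 0.04, 0.03),
  c(0.60, 0.45, 0.70),
  c(0.30, 0.25, 0.40),
  c(0.01, 0.05, 0.02)
)

alpha   <- 0.1
p_final <- apply(p_mat, 1, cauchy_combine)
region  <- y_grid[p_final > alpha]  # candidates retained in the region
```

Replacing `cauchy_combine` with `mean` in the `apply()` call gives the simple-mean alternative.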

What’s included

1) Data generator

generate_clustered_mar()
- Simulates clustered (longitudinal) data under a random-effects model with covariate-dependent Missing At Random (MAR) outcomes.
- Supports:
  - subject-specific random intercepts (clustered structure),
  - optional heteroskedastic errors controlled by hetero_gamma,
  - optional within-cluster AR(1) dependence via rho,
  - optional calibration to a target marginal missing rate via target_missing.
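
To make the simulation design concrete, here is a minimal base-R sketch of a random-intercept model with MAR missingness whose intercept is tuned to hit a target marginal missing rate. The model form, the coefficients, and all names (`a0`, `a1`, `miss_gap`) are illustrative assumptions, not the internals of generate_clustered_mar(); heteroskedasticity and AR(1) errors are omitted.

```r
set.seed(1)
n <- 200; m <- 4                        # subjects and measurements per subject
id <- rep(seq_len(n), each = m)
x  <- runif(n * m, min = 0, max = 10)

# Outcome: a shared random intercept b_i induces within-subject dependence.
b <- rnorm(n)
y_full <- 1 + 0.5 * x + b[id] + rnorm(n * m)

# MAR: the probability of observing Y depends on x only.
a1 <- -0.2                              # assumed slope of the logistic model
target_missing <- 0.30
miss_gap <- function(a0) mean(1 - plogis(a0 + a1 * x)) - target_missing
a0 <- uniroot(miss_gap, interval = c(-20, 20), tol = 1e-10)$root

p_obs <- plogis(a0 + a1 * x)            # P(delta = 1 | x)
delta <- rbinom(n * m, size = 1, prob = p_obs)
y <- ifelse(delta == 1, y_full, NA)     # outcomes masked where delta = 0
```

The root-finding step matches the average missingness probability to the target; the realized missing fraction `mean(is.na(y))` then varies around it from sample to sample.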

2) Conditional density estimator

fit_cond_density_quantile()
- Fits a conditional quantile process $Q(\tau \mid x)$ using pooled observed data.
- Converts estimated quantiles into a conditional density via a quotient estimator along the quantile curve.
- Supported engines:
  - method = "rq": linear quantile regression (quantreg)
  - method = "qrf": quantile random forest (quantregForest)
- Optional monotone (isotonic) adjustment and tail decay extrapolation for numerical stability.
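
The quotient idea can be checked on a distribution whose quantiles and density are both known. The sketch below uses the standard normal in place of a fitted quantile process, so it illustrates the principle only; the grid `taus` and the variable names are assumptions, and the package's estimator may differ in detail (for example in its tail handling).

```r
# Quantile grid and the corresponding (here: exact) quantile curve.
taus <- seq(0.01, 0.99, by = 0.01)
q    <- qnorm(taus)

# Quotient estimator: f(Q(tau)) is approximately d tau / d Q(tau).
dens_hat <- diff(taus) / diff(q)
y_mid    <- (head(q, -1) + tail(q, -1)) / 2  # points where dens_hat applies

# Compare with the true density at those points.
max_abs_err <- max(abs(dens_hat - dnorm(y_mid)))
```

With estimated quantiles, `diff(q)` can be zero or negative when quantile curves cross, which is one reason a monotone adjustment is useful before taking the quotient.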

3) Missingness propensity model

fit_missingness_propensity()
- Estimates the missingness propensity $p(x) = \mathbb{P}(\delta = 1 \mid X = x)$ from pooled data.
- Supported methods:
  - logistic: logistic regression (glm)
  - grf: generalized random forests (grf::probability_forest)
  - boosting: gradient boosting (xgboost)
- Predicted propensities are clipped to [eps, 1 - eps] to stabilize inverse-propensity weighting.
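
The logistic option with clipping can be sketched in base R. This is an illustration under assumed settings (simulated data, `eps = 0.05`, hypothetical variable names), not the body of fit_missingness_propensity().

```r
set.seed(1)
x     <- runif(500, 0, 10)
delta <- rbinom(500, size = 1, prob = plogis(4 - 0.3 * x))  # 1 = observed

# Logistic propensity model for P(delta = 1 | x).
fit   <- glm(delta ~ x, family = binomial())
p_hat <- predict(fit, newdata = data.frame(x = x), type = "response")

# Clip to [eps, 1 - eps] so that 1 / p_hat stays bounded.
eps    <- 0.05
p_clip <- pmin(pmax(p_hat, eps), 1 - eps)
w      <- 1 / p_clip
```

After clipping, no weight can exceed `1 / eps`, which keeps a few very small estimated propensities from dominating the weighted calibration.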

4) HCP conformal prediction

hcp_conformal_region()
- Core HCP procedure for constructing a conformal prediction region for a single new target point.
- Implements the four steps described above: (1) model fitting, (2) subsampled calibration, (3) weighted conformal scoring, and (4) aggregation.
- Returns the prediction region as a subset of y_grid, together with the interval endpoints [lo, hi].

hcp_predict_targets()
- A flexible wrapper built on hcp_conformal_region() that automatically adapts to different test-data structures, including:
  - a single patient at a single target point,
  - a single patient at multiple target points,
  - multiple patients with one target point each,
  - multiple patients with multiple target points each.
- Supports optional per-patient Bonferroni correction, using alpha_eff = alpha / M_p when a patient has M_p > 1 target points.
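
The per-patient Bonferroni level is simple arithmetic on the test data. The sketch below computes it by hand for a toy test set; the column name `alpha_eff` is taken from the formula above, while the rest of the code is an assumed illustration and not the wrapper's internals.

```r
alpha <- 0.1
test <- data.frame(
  pid = c(1, 1, 2, 2, 2, 3, 3),
  X1  = c(1, 6, 2, 5, 9, 3, 8)
)

# M_p: number of target points for each row's patient.
M_p <- ave(seq_along(test$pid), test$pid, FUN = length)

# Per-row level after Bonferroni correction within each patient.
test$alpha_eff <- alpha / M_p
```

Here patients 1 and 3 get `alpha_eff = 0.05` per target and patient 2 gets `0.1 / 3`. By the union bound, the miscoverage levels within a patient then sum to `alpha`.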

5) Plotting

plot_hcp_intervals()
- mode = "band": for one subject with repeated measurements (e.g., over time), plots an interval band as a function of a covariate.
- mode = "pid": for multiple subjects with one target point each, displays one prediction interval per subject, optionally sorted by a covariate.

Quick examples

This section shows (i) how to call hcp_predict_targets() under the four common test-data scenarios, and (ii) how to generate and display two plots from plot_hcp_intervals().

A) Four scenarios for testing data (via `hcp_predict_targets()`)

Generate training data (fixed across all cases)

set.seed(1)
dat_train <- generate_clustered_mar(
  n = 200, m = 4, d = 1,
  x_dist = "uniform", x_params = list(min = 0, max = 10),
  target_missing = 0.30,
  seed = 1
)
y_grid <- seq(-6, 6, length.out = 201)

Case 1: one patient, one target

test_11 <- data.frame(pid = 1, X1 = 2.5)
out_11 <- hcp_predict_targets(
  dat = dat_train, test = test_11,
  x_cols = "X1", y_grid = y_grid,
  alpha = 0.1,
  seed = 1
)
out_11$pred

Case 2: one patient, multiple targets

test_1M <- data.frame(pid = 1, X1 = c(1, 3, 7, 9))
out_1M <- hcp_predict_targets(
  dat = dat_train, test = test_1M,
  x_cols = "X1", y_grid = y_grid,
  alpha = 0.1,
  seed = 1
)
out_1M$pred

Case 3: multiple patients, one target each

test_P1 <- data.frame(pid = 1:4, X1 = c(2, 4, 6, 8))
out_P1 <- hcp_predict_targets(
  dat = dat_train, test = test_P1,
  x_cols = "X1", y_grid = y_grid,
  alpha = 0.1,
  seed = 1
)
out_P1$pred

Case 4: multiple patients, multiple targets per patient

test_PM <- data.frame(
  pid = c(1,1, 2,2,2, 3,3),
  X1  = c(1,6,  2,5,9,  3,8)
)
out_PM <- hcp_predict_targets(
  dat = dat_train, test = test_PM,
  x_cols = "X1", y_grid = y_grid,
  alpha = 0.1,
  seed = 1
)
out_PM$pred

B) Two plots

Generate training data and test data

dat_train <- generate_clustered_mar(
  n = 200, m = 20, d = 1,
  x_dist = "uniform", x_params = list(min = 0, max = 10),
  hetero_gamma = 2.5,
  target_missing = 0.30,
  seed = 1
)
y_grid <- seq(-6, 10, length.out = 201)

# test data with latent truth
dat_test <- generate_clustered_mar(
  n = 100, m = 20, d = 1,
  x_dist = "uniform", x_params = list(min = 0, max = 10),
  hetero_gamma = 2.5,
  seed = 999
)

Plot A: one patient, multiple targets (band vs time)

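# Take the first test subject and use its 10 smallest X1 values as targets.
pid <- dat_test$id[1]
idx <- which(dat_test$id == pid)
idx <- idx[order(dat_test$X1[idx])][1:10]
# Y_full (the latent truth) is carried along as y_true for plotting only.
test_1M <- data.frame(pid = pid, X1 = dat_test$X1[idx], y_true = dat_test$Y_full[idx])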

out_1M <- hcp_predict_targets(
  dat = dat_train, test = test_1M,
  x_cols = "X1", y_grid = y_grid,
  alpha = 0.1,
  S = 3, B = 3,
  seed = 1
)
plot_hcp_intervals(
  out_1M$pred, mode = "band", x_col = "X1",
  y_true_col = "y_true", show_true = TRUE,
  main = "Case A: one patient, multiple time points (band vs time)"
)

Plot B: multiple patients, one target each (intervals by patient)

pids <- unique(dat_test$id)[1:20]
test_P1 <- subset(dat_test, id %in% pids & j == 1, select = c(id, X1, Y_full))
names(test_P1) <- c("pid", "X1", "y_true")

out_P1 <- hcp_predict_targets(
  dat = dat_train, test = test_P1,
  x_cols = "X1", y_grid = y_grid,
  alpha = 0.1,
  S = 3, B = 3,
  seed = 1
)
plot_hcp_intervals(
  out_P1$pred, mode = "pid", pid_col = "pid", x_sort_col = "X1",
  y_true_col = "y_true", show_true = TRUE,
  main = "Case B: multiple patients, one time point (by patient)"
)


