hcp_predict_targets: HCP prediction wrapper for multiple measurements with optional per-patient Bonferroni

Description

Wraps hcp_conformal_region to produce conformal prediction regions for a collection of measurements, possibly including multiple measurements per individual.

Based on the structure of the test dataset, the prediction mode is determined automatically as follows, where \(P\) denotes the number of patients (clusters) and \(M\) denotes the number of measurements per patient:

\(P=1,\, M=1\): Predict a single patient with a single measurement.
\(P=1,\, M>1\): Predict a single patient with multiple measurements (e.g., repeated or longitudinal measurements for the same patient). For simultaneous coverage across that patient's measurements, set bonferroni = TRUE to apply the optional per-patient Bonferroni correction.
\(P>1,\, M=1\): Predict multiple patients, each with a single measurement. Predictions are performed independently at the nominal level \(\alpha\), without Bonferroni correction.
\(P>1,\, M>1\): Predict multiple patients, each with multiple measurements. When per-patient simultaneous coverage is desired, a Bonferroni correction can be applied by using an effective level \(\alpha / M_p\) for each measurement of patient \(p\), where \(M_p\) is the number of test measurements for that patient. This yields Bonferroni-adjusted marginal prediction regions for patient \(p\).
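
The reason the effective level \(\alpha / M_p\) targets per-patient simultaneous coverage can be seen from the union bound. The notation below is introduced only for this illustration and is not part of the package: \(Y_{p,j}\) is the \(j\)-th test outcome of patient \(p\) and \(C_{p,j}\) is its prediction region built at level \(\alpha / M_p\). Assuming each region attains its marginal level, i.e. \(\Pr(Y_{p,j} \notin C_{p,j}) \le \alpha / M_p\) for every \(j\), the probability that at least one of patient \(p\)'s measurements is missed satisfies

\[
\Pr\bigl(\exists\, j \le M_p : Y_{p,j} \notin C_{p,j}\bigr)
\;\le\; \sum_{j=1}^{M_p} \Pr\bigl(Y_{p,j} \notin C_{p,j}\bigr)
\;\le\; M_p \cdot \frac{\alpha}{M_p}
\;=\; \alpha .
\]

The bound needs no independence between measurements of the same patient, which is why it is typically conservative.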

Usage

hcp_predict_targets(
  dat,
  test,
  pid_col = "pid",
  x_cols,
  y_grid,
  alpha = 0.1,
  bonferroni = FALSE,
  return_region = FALSE,
  id_col = "id",
  y_col = "Y",
  delta_col = "delta",
  ...
)

Value

A list with:

pred: A data.frame in the same row order as test. It contains all columns of test plus the effective level alpha_eff and the prediction-band endpoints lo and hi for each measurement.
region: If return_region=TRUE, a list of length nrow(test) where each element is the subset of y_grid retained in the prediction region for the corresponding test row; otherwise NULL.
meta: A list with summary information, including the number of patients P, the per-patient measurement counts M_by_pid, and the settings alpha and bonferroni.

Arguments

dat

Training/calibration data passed to hcp_conformal_region.

test

A data.frame of test measurements, where each row corresponds to a single measurement. The test data must follow one of the four clustered settings \(P=1, M=1\), \(P=1, M>1\), \(P>1, M=1\), or \(P>1, M>1\), where \(P\) is the number of patients (clusters) and \(M\) is the number of measurements per patient.

The data.frame must include a patient identifier specified by pid_col and all covariate columns listed in x_cols. Repeated values of pid_col indicate multiple measurements (e.g., repeated or longitudinal measurements) for the same patient.
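
As a rough illustration of how the structure of test maps to the four settings, the base-R sketch below counts rows per patient identifier. The helper classify_test_mode is hypothetical and is not a function of the package. It also assumes that a test set is treated as \(M>1\) as soon as any patient has more than one row, which the documentation does not state explicitly.

```r
# Hypothetical helper (not part of the package): summarise a test set by
# the number of patients P and the per-patient measurement counts.
classify_test_mode <- function(test, pid_col = "pid") {
  M_by_pid <- table(test[[pid_col]])   # rows (measurements) per patient
  P <- length(M_by_pid)                # number of distinct patients
  multi <- any(M_by_pid > 1)           # assumption: any repeat means M > 1
  mode <- paste0(
    if (P == 1) "P=1" else "P>1", ", ",
    if (multi) "M>1" else "M=1"
  )
  list(P = P, M_by_pid = M_by_pid, mode = mode)
}

test_PM <- data.frame(
  pid = c(1, 1, 2, 2, 2, 3, 3),
  X1  = c(1, 6, 2, 5, 9, 3, 8)
)
res <- classify_test_mode(test_PM)
res$mode       # "P>1, M>1"
res$M_by_pid   # counts 2, 3, 2 for patients 1, 2, 3
```

Running such a check before calling hcp_predict_targets can help confirm that pid values were coded as intended, since a mistyped identifier silently creates an extra patient.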

pid_col

Column in test giving the patient (cluster/subject) identifier. Default "pid".

x_cols

Covariate column names (e.g., c("X1")).

y_grid

Candidate y-grid passed to hcp_conformal_region.

alpha

Nominal miscoverage level in (0,1) passed to hcp_conformal_region.

bonferroni

Logical; if TRUE, apply a per-patient Bonferroni correction (effective level \(\alpha / M_p\)) only when a patient has multiple test measurements (i.e., \(M_p>1\)). If FALSE, always use level \(\alpha\). Default FALSE.
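
The rule above can be sketched in base R as follows. The helper effective_alpha is hypothetical and only mirrors the documented behaviour; it is not the package's internal code.

```r
# Hypothetical helper: per-row effective level under the documented rule.
effective_alpha <- function(pid, alpha = 0.1, bonferroni = FALSE) {
  if (!bonferroni) {
    return(rep(alpha, length(pid)))    # always the nominal level
  }
  # ave() gives, for each row, the number of rows sharing its pid (M_p)
  M_p <- ave(seq_along(pid), pid, FUN = length)
  alpha / M_p                          # equals alpha whenever M_p == 1
}

pid <- c(1, 1, 2, 2, 2, 3)
a_bonf  <- effective_alpha(pid, alpha = 0.1, bonferroni = TRUE)
a_plain <- effective_alpha(pid, alpha = 0.1, bonferroni = FALSE)
```

Here patient 1 has two rows, patient 2 has three, and patient 3 has one, so a_bonf holds 0.1/2, 0.1/2, 0.1/3, 0.1/3, 0.1/3, and 0.1, while a_plain is 0.1 throughout.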

return_region

Logical; if TRUE, return the full region (subset of y_grid) for each row.
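
To make the relation between a returned region and the band endpoints concrete, the sketch below builds a made-up region on the example grid and summarises it by its extremes. Two assumptions are made here and are not confirmed by the documentation: that lo and hi are the smallest and largest retained grid values, and that an empty region is reported as NA.

```r
y_grid <- seq(-6, 6, length.out = 201)

# Made-up region for illustration: grid points within 2 units of y = 1
region <- y_grid[abs(y_grid - 1) <= 2]

# Assumed summary: band endpoints are the extremes of the retained points
band <- if (length(region) > 0) range(region) else c(NA_real_, NA_real_)
lo <- band[1]
hi <- band[2]
```

If a region is not contiguous on the grid, the interval from lo to hi covers more than the region itself, which is one reason to request return_region = TRUE when the exact set matters.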

id_col, y_col, delta_col

Column names in dat for the patient ID, the outcome, and the missingness indicator. Defaults are "id", "Y", and "delta", respectively.

...

Additional arguments forwarded to hcp_conformal_region (e.g., S, B, combine_B, combine_S, dens_method, prop_method, seed).

Examples

## ------------------------------------------------------------
## Examples illustrating the four test-data settings:
## (P=1, M=1), (P=1, M>1), (P>1, M=1), and (P>1, M>1)
## ------------------------------------------------------------
set.seed(1)

## training data (fixed across all cases)
dat_train <- generate_clustered_mar(
  n = 200, m = 4, d = 1,
  x_dist = "uniform", x_params = list(min = 0, max = 10),
  target_missing = 0.30,
  seed = 1
)

y_grid <- seq(-6, 6, length.out = 201)

## Case 1: P=1, M=1  (one patient, one measurement)
test_11 <- data.frame(
  pid = 1,
  X1  = 2.5
)
out_11 <- hcp_predict_targets(
  dat = dat_train,
  test = test_11,
  x_cols = "X1",
  y_grid = y_grid,
  alpha = 0.1,
  S = 2, B = 2,
  seed = 1
)
out_11$pred

## Case 2: P=1, M>1  (one patient, multiple measurements)
test_1M <- data.frame(
  pid = 1,
  X1  = c(1, 3, 7, 9)
)
out_1M <- hcp_predict_targets(
  dat = dat_train,
  test = test_1M,
  x_cols = "X1",
  y_grid = y_grid,
  alpha = 0.1,
  S = 2, B = 2,
  seed = 1
)
out_1M$pred

## Case 3: P>1, M=1  (multiple patients, one measurement each)
test_P1 <- data.frame(
  pid = 1:4,
  X1  = c(2, 4, 6, 8)
)
out_P1 <- hcp_predict_targets(
  dat = dat_train,
  test = test_P1,
  x_cols = "X1",
  y_grid = y_grid,
  alpha = 0.1,
  S = 2, B = 2,
  seed = 1
)
out_P1$pred

## Case 4: P>1, M>1  (multiple patients, multiple measurements per patient)
test_PM <- data.frame(
  pid = c(1,1, 2,2,2, 3,3),
  X1  = c(1,6,  2,5,9,  3,8)
)
out_PM <- hcp_predict_targets(
  dat = dat_train,
  test = test_PM,
  x_cols = "X1",
  y_grid = y_grid,
  alpha = 0.1,
  S = 2, B = 2,
  seed = 1
)
out_PM$pred