Constructs marginal conformal prediction regions for one or more new covariate values \(x_{n+1}\) under clustered data with missing outcomes, following the HCP framework:
(1) Model fitting. Fit a pooled conditional density model \(\widehat\pi(y\mid x)\) using fit_cond_density_quantile, together with a marginal missingness propensity model \(\widehat p(x)=\mathbb{P}(\delta=1\mid x)\) using fit_missingness_propensity, both estimated on a subject-level training split.
(2) Subsampled calibration. Repeatedly construct calibration sets by randomly drawing one observation per subject from the calibration split.
(3) Weighted conformal scoring. Compute weighted conformal \(p\)-values over a candidate grid using the nonconformity score \(R(x,y)=-\widehat\pi(y\mid x)\) and inverse-propensity weights \(w(x)=1/\widehat p(x)\) under a MAR assumption.
(4) Aggregation. Aggregate the dependent \(p\)-values first across the B subsamples within each split and then across the S data splits, using either the Cauchy combination test (CCT/ACAT) or the arithmetic mean (chosen separately via combine_B and combine_S).
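The scoring ingredients of steps (1) to (3) can be sketched in base R as below. This is a minimal illustration under stated assumptions, not the package's implementation: quantiles_to_density, subsample_one_per_subject and weighted_conformal_p are hypothetical helpers; the density is assumed to be a piecewise-constant difference quotient of predicted quantiles (how the package actually uses dens_h, enforce_monotone and tail_decay is not documented here); prop_eps is assumed to clip propensities into \([\epsilon, 1-\epsilon]\); and the \(p\)-value uses the standard weighted split-conformal form in which the test point carries weight \(w(x_{n+1})\). Only calibration rows with an observed outcome (\(\delta=1\)) can be scored.

```r
# Hypothetical helper (not the package API): piecewise-constant density
# pi_hat(y | x) from predicted quantiles q_j = Q_hat(tau_j | x).
quantiles_to_density <- function(y, taus, q) {
  q <- cummax(q)                    # assumed guard against quantile crossing
  j <- findInterval(y, q)           # q[j] <= y < q[j + 1]
  inside <- j >= 1 & j < length(q)
  dens <- numeric(length(y))        # this sketch returns 0 outside [q_1, q_J]
  jj <- j[inside]
  dens[inside] <- (taus[jj + 1] - taus[jj]) / pmax(q[jj + 1] - q[jj], 1e-12)
  dens
}

# Hypothetical helper for step (2): one row index per subject.
# sample.int() avoids sample()'s surprise when a subject has a single row.
subsample_one_per_subject <- function(ids) {
  rows_by_id <- split(seq_along(ids), ids)
  vapply(rows_by_id, function(ix) ix[sample.int(length(ix), 1L)], integer(1))
}

# Hypothetical helper for step (3): weighted conformal p-values for the test
# scores R(x_{n+1}, y) over the grid, given scores of observed calibration rows.
weighted_conformal_p <- function(test_scores, cal_scores, cal_prop, test_prop,
                                 prop_eps = 1e-6) {
  clip <- function(p) pmin(pmax(p, prop_eps), 1 - prop_eps)  # assumed clipping
  w_cal <- 1 / clip(cal_prop)       # w(X_i) = 1 / p_hat(X_i)
  w_test <- 1 / clip(test_prop)     # w(x_{n+1})
  vapply(test_scores, function(r) {
    (sum(w_cal[cal_scores >= r]) + w_test) / (sum(w_cal) + w_test)
  }, numeric(1))
}

# Toy usage with stand-in inputs (normal quantiles, illustrative scores).
taus <- seq(0.05, 0.95, by = 0.02)
dens0 <- quantiles_to_density(0, taus, qnorm(taus))   # close to dnorm(0)
score0 <- -dens0                                      # R(x, y) = -pi_hat(y | x)

set.seed(1)
ids <- rep(c("a", "b", "c"), times = c(4, 1, 3))
cal_rows <- subsample_one_per_subject(ids)            # one row each for a, b, c

p_grid <- weighted_conformal_p(
  test_scores = c(-0.4, -0.01),
  cal_scores = c(-0.3, -0.2, -0.1, -0.05),
  cal_prop = rep(0.5, 4), test_prop = 0.5
)                                                     # c(1, 0.2)
```

Because the score is the negated density, a candidate \(y\) in a low-density region gets a large score, few calibration scores exceed it, and its \(p\)-value drops toward \(w(x_{n+1})/(\sum_i w_i + w(x_{n+1}))\).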
The prediction region is returned as a subset of the supplied grid: $$\widehat C(x_{n+1};\alpha)=\{y\in\mathcal Y:\ p_{\text{final}}(y)>\alpha\}.$$
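The aggregation in step (4) and the region definition above can be sketched as follows. The Cauchy combination statistic is taken with equal weights, \(T=\frac1k\sum_{j=1}^k\tan\{(0.5-p_j)\pi\}\), and mapped back through \(p=\mathbb P(\text{Cauchy}>T)\). combine_p and region_from_p are hypothetical helpers; the equal weights and the clipping constant eps are assumptions. The example combines over subsamples first and then over splits, matching the split-level p_split output described under the return value.

```r
# Hypothetical helper: combine dependent p-values by CCT (equal weights) or mean.
combine_p <- function(p, method = c("cct", "mean"), eps = 1e-15) {
  method <- match.arg(method)
  if (method == "mean") return(mean(p))
  p <- pmin(pmax(p, eps), 1 - eps)          # keep tan() finite at p = 0 or 1
  t_stat <- mean(tan((0.5 - p) * pi))       # Cauchy combination statistic
  pcauchy(t_stat, lower.tail = FALSE)       # P(standard Cauchy > T)
}

# Hypothetical helper: region and its outer envelope for one test point.
region_from_p <- function(p_final, y_grid, alpha) {
  keep <- y_grid[p_final > alpha]
  lo_hi <- c(lo = NA_real_, hi = NA_real_)  # NA when the region is empty
  if (length(keep) > 0) lo_hi <- c(lo = min(keep), hi = max(keep))
  list(region = keep, lo_hi = lo_hi)
}

# Toy p-values at one grid point: rows = S splits, columns = B subsamples.
p_sb <- matrix(c(0.1, 0.1, 0.3, 0.3), nrow = 2, byrow = TRUE)
p_split <- apply(p_sb, 1, combine_p)        # CCT of equal p-values returns them
p_final_y <- combine_p(p_split)             # lies between 0.1 and 0.3

y_grid <- seq(-2, 2, by = 1)
p_final <- c(0.02, 0.3, 0.8, 0.05, 0.4)     # illustrative final p-values
out <- region_from_p(p_final, y_grid, alpha = 0.1)
# out$region is c(-1, 0, 2), which is not an interval;
# out$lo_hi is c(lo = -1, hi = 2), its outer envelope.
```

The toy grid shows why lo_hi is only an envelope: a region thresholded pointwise on the grid can have gaps.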
hcp_conformal_region(
dat,
id_col,
y_col = "Y",
delta_col = "delta",
x_cols,
x_test,
y_grid,
alpha = 0.1,
train_frac = 0.5,
S = 5,
B = 5,
combine_B = c("cct", "mean"),
combine_S = c("cct", "mean"),
seed = NULL,
return_details = FALSE,
dens_method = c("rq", "qrf"),
dens_taus = seq(0.05, 0.95, by = 0.02),
dens_h = NULL,
enforce_monotone = TRUE,
tail_decay = TRUE,
prop_method = c("logistic", "grf", "boosting"),
prop_eps = 1e-06,
...
)
If return_details = FALSE (default), a list with the following components, where K is the number of test points (rows of x_test):
region: Length-K list; region[[k]] is the subset of y_grid with p_final[k, ] > alpha.
lo_hi: K x 2 matrix with columns c("lo", "hi") giving the min/max of region[[k]] (NA if empty).
p_final: K x length(y_grid) matrix of final p-values on y_grid.
y_grid: The candidate grid used.
If return_details = TRUE, the list also includes:
p_split: An array with dimensions c(S, K, length(y_grid)) of split-level p-values.
split_meta: Training-subject IDs for each split.
dat: A data.frame containing clustered observations. Must include id_col, y_col, delta_col, and all columns in x_cols.
id_col: Subject/cluster identifier column name.
y_col: Outcome column name.
delta_col: Missingness indicator column name (1 = observed, 0 = missing).
x_cols: Covariate column names used for both density estimation and the missingness propensity model.
x_test: New covariate value(s). Either a numeric vector (treated as one row) or a numeric matrix/data.frame with nrow(x_test) = K test points and ncol(x_test) = length(x_cols) covariates.
y_grid: Numeric vector of candidate \(y\) values at which conformal \(p\)-values are evaluated.
alpha: Miscoverage level in (0, 1). The region keeps \(y\) with \(p(y)>\alpha\).
train_frac: Fraction of subjects assigned to training in each split.
S: Number of independent subject-level splits.
B: Number of subsamples per split (one observation per subject per subsample).
combine_B: Method for combining \(p\)-values across the B subsamples: "cct" (default) or "mean".
combine_S: Method for combining \(p\)-values across the S splits: "cct" (default) or "mean".
seed: Optional seed for reproducibility.
return_details: Logical; if TRUE, also return split-level p-values and split metadata.
dens_method: Density/quantile engine for fit_cond_density_quantile: "rq" or "qrf".
dens_taus: Quantile grid passed to fit_cond_density_quantile.
dens_h: Bandwidth(s) passed to fit_cond_density_quantile.
enforce_monotone: Logical; passed to fit_cond_density_quantile.
tail_decay: Logical; passed to fit_cond_density_quantile.
prop_method: Missingness propensity method for fit_missingness_propensity: "logistic", "grf", or "boosting".
prop_eps: Clipping level for propensity predictions used by fit_missingness_propensity.
...: Extra arguments passed to fit_missingness_propensity.
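The x_test contract above (a vector is a single test point; a matrix or data.frame has one row per test point) amounts to a small normalization step, sketched below. as_test_matrix is a hypothetical helper, not part of the package, and matching columns to x_cols by position rather than by name is an assumption about how the function aligns them.

```r
# Hypothetical helper (not the package API) mirroring the documented x_test
# contract; columns are matched to x_cols by position, which is an assumption.
as_test_matrix <- function(x_test, x_cols) {
  if (is.data.frame(x_test)) x_test <- as.matrix(x_test)
  if (is.null(dim(x_test))) x_test <- matrix(x_test, nrow = 1)   # vector -> one row
  if (ncol(x_test) != length(x_cols)) {
    stop("ncol(x_test) must equal length(x_cols)")
  }
  colnames(x_test) <- x_cols
  x_test
}

one <- as_test_matrix(c(0.2, -1.0), c("X1", "X2"))                 # 1 x 2, K = 1
many <- as_test_matrix(data.frame(a = c(0, 1, 2), b = c(1, 1, 1)),
                       c("X1", "X2"))                              # 3 x 2, K = 3
```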
dat <- generate_clustered_mar(n = 200, m = 4, d = 2, target_missing = 0.30, seed = 1)
y_grid <- seq(-4, 4, length.out = 200)
x_test <- matrix(c(0.2, -1.0), nrow = 1, dimnames = list(NULL, c("X1", "X2")))
res <- hcp_conformal_region(
dat, id_col = "id",
y_col = "Y", delta_col = "delta",
x_cols = c("X1", "X2"),
x_test = x_test,
y_grid = y_grid,
alpha = 0.1,
S = 2, B = 2,
seed = 1
)
## interval endpoints on the y-grid (outer envelope; NA if the region is empty)
res$lo_hi[1, ]