HCPclust (version 0.1.1)

hcp_conformal_region: HCP conformal prediction region with repeated subsampling and repeated data splitting

Description

Constructs a marginal conformal prediction region for a new covariate value \(x_{n+1}\) under clustered data with missing outcomes, following the HCP framework:

  • (1) Model fitting. Fit a pooled conditional density model \(\widehat\pi(y\mid x)\) using fit_cond_density_quantile, together with a marginal missingness propensity model \(\widehat p(x)=\mathbb{P}(\delta=1\mid x)\) using fit_missingness_propensity, both estimated on a subject-level training split.

  • (2) Subsampled calibration. Repeatedly construct calibration sets by randomly drawing one observation per subject from the calibration split.

  • (3) Weighted conformal scoring. Compute weighted conformal \(p\)-values over a candidate grid using the nonconformity score \(R(x,y)=-\widehat\pi(y\mid x)\) and inverse-propensity weights \(w(x)=1/\widehat p(x)\) under a MAR assumption.

  • (4) Aggregation. Aggregate dependent \(p\)-values across subsamples (B) and data splits (S) using either the Cauchy combination test (CCT/ACAT) or the arithmetic mean.
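The aggregation in step (4) can be illustrated with a minimal base-R sketch of an equal-weight Cauchy combination. The helper name cct_combine and the clipping level eps are assumptions made for this illustration and are not part of HCPclust; the package's internal implementation may differ (for example in weighting or clipping).

```r
# Equal-weight Cauchy combination of p-values (hypothetical helper, not from HCPclust).
cct_combine <- function(p, eps = 1e-15) {
  # Clip away from 0 and 1 so that tan() stays finite (eps is an assumed value).
  p <- pmin(pmax(p, eps), 1 - eps)
  # Each p-value is mapped to a standard Cauchy variate; the variates are averaged.
  t_stat <- mean(tan((0.5 - p) * pi))
  # Map the averaged statistic back to the p-value scale via the Cauchy CDF.
  0.5 - atan(t_stat) / pi
}

# Toy p-values for one grid point across B = 4 subsamples (made-up numbers).
p_sub <- c(0.02, 0.10, 0.40, 0.75)
p_cct <- cct_combine(p_sub)   # "cct"-style aggregation
p_mean <- mean(p_sub)         # "mean"-style aggregation
```

In this sketch the same helper could be applied twice, first across the B subsamples within a split and then across the S splits, which mirrors the roles of combine_B and combine_S.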

The prediction region is returned as a subset of the supplied grid: $$\widehat C(x_{n+1};\alpha)=\{y\in\mathcal Y:\ p_{\text{final}}(y)>\alpha\}.$$
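As a rough illustration of steps (3) and the region definition, the sketch below computes weighted conformal p-values on a grid and thresholds them at alpha. Everything in it is a stand-in: the standard normal density plays the role of \(\widehat\pi(y\mid x)\), the propensities are simulated, and the p-value formula (which includes the test-point weight in numerator and denominator) is one common weighted-conformal form that is assumed here rather than taken from the package source.

```r
set.seed(1)

# Simulated calibration data: one observed outcome per subject (assumption).
n_cal <- 50
y_cal <- rnorm(n_cal)
score_cal <- -dnorm(y_cal)            # R_i = -pi_hat(y_i | x_i), stand-in density
p_hat_cal <- runif(n_cal, 0.5, 0.9)   # stand-in propensities p_hat(x_i)
w_cal <- 1 / p_hat_cal                # inverse-propensity weights w(x_i)
w_test <- 1 / 0.7                     # weight at the test covariate (assumed value)

# Candidate grid and its nonconformity scores at the test point.
y_grid <- seq(-4, 4, length.out = 81)
score_grid <- -dnorm(y_grid)

# Weighted p-value: weighted share of calibration scores at least as large
# as the candidate score, with the test point counted once (assumed form).
weighted_pvalue <- function(score_y, score_cal, w_cal, w_test) {
  (sum(w_cal[score_cal >= score_y]) + w_test) / (sum(w_cal) + w_test)
}

p_grid <- vapply(score_grid, weighted_pvalue, numeric(1),
                 score_cal = score_cal, w_cal = w_cal, w_test = w_test)

# Region: grid points whose p-value exceeds alpha.
alpha <- 0.1
region <- y_grid[p_grid > alpha]
```

Because the region is a subset of the grid, its resolution is limited by the grid spacing, and it need not be a single interval.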

Usage

hcp_conformal_region(
  dat,
  id_col,
  y_col = "Y",
  delta_col = "delta",
  x_cols,
  x_test,
  y_grid,
  alpha = 0.1,
  train_frac = 0.5,
  S = 5,
  B = 5,
  combine_B = c("cct", "mean"),
  combine_S = c("cct", "mean"),
  seed = NULL,
  return_details = FALSE,
  dens_method = c("rq", "qrf"),
  dens_taus = seq(0.05, 0.95, by = 0.02),
  dens_h = NULL,
  enforce_monotone = TRUE,
  tail_decay = TRUE,
  prop_method = c("logistic", "grf", "boosting"),
  prop_eps = 1e-06,
  ...
)

Value

If return_details=FALSE (default), a list with the following components, where K is the number of test points (rows of x_test):

region

Length-K list; region[[k]] is the subset of y_grid with p_final[k, ] > alpha.

lo_hi

K x 2 matrix with columns c("lo","hi") giving min/max of region[[k]] (NA if empty).

p_final

K x length(y_grid) matrix of final p-values on y_grid.

y_grid

The candidate grid used.

If return_details=TRUE, also includes:

p_split

An array with dimensions c(S, K, length(y_grid)) of split-level p-values.

split_meta

Train subject IDs for each split.

Arguments

dat

A data.frame containing clustered observations. Must include id_col, y_col, delta_col, and all columns in x_cols.

id_col

Subject/cluster identifier column name.

y_col

Outcome column name.

delta_col

Missingness indicator column name (1 = outcome observed, 0 = outcome missing).

x_cols

Covariate column names used for both density estimation and missingness propensity.

x_test

New covariate value(s). A numeric vector (treated as one row), or a numeric matrix/data.frame with nrow(x_test)=K test points and ncol(x_test)=length(x_cols) covariates.

y_grid

Numeric vector of candidate \(y\) values at which to evaluate conformal \(p\)-values.

alpha

Miscoverage level in (0,1); the target coverage is \(1-\alpha\). The region keeps each grid value \(y\) with \(p_{\text{final}}(y)>\alpha\).

train_frac

Fraction of subjects assigned to training in each split.

S

Number of independent subject-level splits.

B

Number of subsamples per split (one observation per subject per subsample).

combine_B

Combine \(p\)-values across B subsamples: "cct" (default) or "mean".

combine_S

Combine \(p\)-values across S splits: "cct" (default) or "mean".

seed

Optional seed for reproducibility.

return_details

Logical; if TRUE, also return split-level p-values and split metadata.

dens_method

Density/quantile engine for fit_cond_density_quantile: "rq" or "qrf".

dens_taus

Quantile grid passed to fit_cond_density_quantile.

dens_h

Bandwidth(s) passed to fit_cond_density_quantile.

enforce_monotone

Passed to fit_cond_density_quantile.

tail_decay

Passed to fit_cond_density_quantile.

prop_method

Missingness propensity method for fit_missingness_propensity: "logistic", "grf", or "boosting".

prop_eps

Clipping level for propensity predictions used by fit_missingness_propensity.

...

Extra arguments passed to fit_missingness_propensity.
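The roles of train_frac, S, and B can be pictured with a small base-R sketch of one subject-level split followed by one calibration subsample. The toy data, the helper name draw_one_per_subject, and the use of floor() for the training size are assumptions made for this illustration; the package may round or sample differently.

```r
set.seed(1)

# Toy clustered data: 10 subjects with 3 observations each (made-up values).
dat <- data.frame(id = rep(1:10, each = 3), X1 = rnorm(30), Y = rnorm(30))

# Subject-level split: whole subjects go to either training or calibration.
train_frac <- 0.5
ids <- unique(dat$id)
train_ids <- sample(ids, size = floor(train_frac * length(ids)))
train <- dat[dat$id %in% train_ids, ]
calib <- dat[!dat$id %in% train_ids, ]

# One subsample: draw a single row per calibration subject (hypothetical helper).
draw_one_per_subject <- function(d, id_col) {
  rows <- split(seq_len(nrow(d)), d[[id_col]])
  # Index into each subject's rows so single-row subjects are handled safely.
  picked <- vapply(rows, function(r) r[sample.int(length(r), 1)], integer(1))
  d[picked, , drop = FALSE]
}

sub1 <- draw_one_per_subject(calib, "id")
```

Repeating the split S times and the subsampling B times per split would yield S x B sets of p-values, which are then combined as controlled by combine_B and combine_S.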

Examples

dat <- generate_clustered_mar(n = 200, m = 4, d = 2, target_missing = 0.30, seed = 1)
y_grid <- seq(-4, 4, length.out = 200)
x_test <- matrix(c(0.2, -1.0), nrow = 1)
colnames(x_test) <- c("X1", "X2")

res <- hcp_conformal_region(
  dat, id_col = "id",
  y_col = "Y", delta_col = "delta",
  x_cols = c("X1", "X2"),
  x_test = x_test,
  y_grid = y_grid,
  alpha = 0.1,
  S = 2, B = 2,
  seed = 1
)

## interval endpoints on the y-grid (outer envelope of the region);
## res$lo_hi[1, ] reports the same endpoints and is NA if the region is empty
c(lo = min(res$region[[1]]), hi = max(res$region[[1]]))
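Calling min() and max() on an empty region returns Inf and -Inf with a warning in base R, so it can help to guard against empty regions when post-processing several test points. The sketch below rebuilds a K x 2 bounds matrix from a region list, mirroring the description of lo_hi in the Value section. The helper name region_bounds and the toy region list are hypothetical and only stand in for res$region.

```r
# Compute lo/hi bounds per test point, returning NA for empty regions
# (hypothetical helper; the package already returns this as lo_hi).
region_bounds <- function(region) {
  t(vapply(region, function(r) {
    if (length(r) == 0) {
      c(lo = NA_real_, hi = NA_real_)
    } else {
      c(lo = min(r), hi = max(r))
    }
  }, c(lo = 0, hi = 0)))
}

# Toy stand-in for res$region with K = 2: one non-empty and one empty region.
toy_region <- list(c(-1.2, -0.8, 0.5), numeric(0))
bounds <- region_bounds(toy_region)
```

The bounds describe only the outer envelope; if the retained grid points are not contiguous, the region itself is smaller than the interval from lo to hi.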
