fit_cond_density_quantile: Estimate conditional density pi(y|x) via quantile process + quotient estimator

Description

Fits a conditional quantile function $\widehat Q_Y(\tau\mid x)$ using pooled observed data (working-independence), and estimates the conditional density through the quotient estimator along the quantile curve: $$ \widehat\pi\{\widehat Q(\tau\mid x)\mid x\} = \frac{2h(\tau)}{\widehat Q(\tau+h(\tau)\mid x)-\widehat Q(\tau-h(\tau)\mid x)}. $$ For numerical stability, the quantile curve can be monotone-adjusted (isotonic regression), and tail decay extrapolation can be used before interpolation to $\pi(y\mid x)$.

Usage

fit_cond_density_quantile(
  dat,
  y_col = "Y",
  delta_col = "delta",
  x_cols,
  taus = seq(0.05, 0.95, by = 0.01),
  h = NULL,
  method = c("rq", "qrf"),
  enforce_monotone = TRUE,
  tail_decay = TRUE,
  num_extra_points = 10L,
  decay_factor = 0.8,
  dens_floor = 1e-10,
  eps = 1e-08,
  gap_min = 0.01,
  seed = NULL,
  ...
)

Value

A list containing fitted objects and prediction functions:

predict_Q(x_new, taus_use): Returns the estimated conditional quantiles $$ \widehat Q_Y(\tau \mid x) $$ for $\tau \in (0,1)$ specified by taus_use, evaluated at new covariate values x_new. The output is a numeric matrix with one row per covariate vector $x$ and one column per quantile level $\tau$.
predict_density(x_new, y_new): Returns the estimated conditional density $$ \widehat \pi(y \mid x), $$ evaluated at specified (x,y) pairs. The inputs x_new and y_new are paired row-wise, so that the r-th row of x_new is evaluated at y_new[r].

Arguments

dat

data.frame in long format, containing outcome, missingness indicator, and covariates.

y_col

name of outcome column (observed Y, may contain NA).

delta_col

name of missingness indicator (1 observed, 0 missing).

x_cols

character vector of covariate column names (include time if desired).

taus

grid of quantile levels in (0,1) at which the quantile process is evaluated.

h

Bandwidth(s) for quotient. Either a scalar or a numeric vector of length length(taus). If NULL, a tau-specific bandwidth vector $h(\tau)$ is computed via quantreg::bandwidth.rq, and automatically shrunk near the boundaries to ensure $\tau\pm h(\tau)\in(0,1)$.

method

quantile engine: "rq" (linear quantile regression) or "qrf" (quantile random forest).

enforce_monotone

logical; if TRUE, apply isotonic regression to the predicted quantile curve in $\tau$ for each $x$ to reduce quantile crossing.

tail_decay

logical; if TRUE, add extra tail points with geometric decay before interpolation.

num_extra_points

number of extra tail points on each side when tail_decay=TRUE.

decay_factor

decay factor in (0,1) for tail densities when tail_decay=TRUE.

dens_floor

lower bound for density to avoid numerical issues.

eps

small stabilizer for denominator pmax(Qplus-Qminus, eps).

gap_min

minimum spacing for tail extrapolation points.

seed

optional seed.

...

extra arguments passed to the underlying quantile engine:

rq: passed to quantreg::rq.fit, e.g. rq_method="br".

qrf

passed to quantregForest::quantregForest, e.g. ntree=500.

Examples

Run this code

## ------------------------------------------------------------
## Case A: Conditional density evaluated at a single point (x, y)
## ------------------------------------------------------------
## This illustrates the most basic usage: estimating pi(y | x)
## at one covariate value x and one response value y.

dat <- generate_clustered_mar(
  n = 200, m = 4, d = 2,
  target_missing = 0.3, seed = 1
)
fit <- fit_cond_density_quantile(
  dat,
  y_col = "Y", delta_col = "delta",
  x_cols = c("X1", "X2"),
  taus = seq(0.05, 0.95, by = 0.02),
  method = "rq",
  seed = 1
)
## a single covariate value x
x1 <- matrix(c(0.2, -1.0), nrow = 1)
colnames(x1) <- c("X1", "X2")
## estimate pi(y | x) at y = 0.5
fit$predict_density(x1, y_new = 0.5)


## ------------------------------------------------------------
## Case B: Conditional density as a function of y (density curve)
## ------------------------------------------------------------
## Here we fix x and evaluate pi(y | x) over a grid of y values,
## which produces an estimated conditional density curve.

y_grid <- seq(-3, 3, length.out = 201)
## reuse the same x by repeating it to match the y-grid
x_rep <- x1[rep(1, length(y_grid)), , drop = FALSE]
f_grid <- fit$predict_density(x_rep, y_grid)

## ------------------------------------------------------------
## True conditional density under the data generator
## ------------------------------------------------------------
## Data are generated as:
##   Y = X^T beta + b + eps,
##   b ~ N(0, sigma_b^2),  eps ~ N(0, sigma_eps^2)
## Hence the marginal conditional density is:
##   Y | X = x ~ N(x^T beta, sigma_b^2 + sigma_eps^2)

beta_true <- c(0.5, 0.6)
sigma_b_true <- 0.7
sigma_eps_true <- 1.0
mu_true <- drop(x1 %*% beta_true)
sd_true <- sqrt(sigma_b_true^2 + sigma_eps_true^2)
f_true <- stats::dnorm(y_grid, mean = mu_true, sd = sd_true)


## ------------------------------------------------------------
## Visualization: estimated vs true conditional density
## (use smooth.spline on log-density for a smoother display)
## ------------------------------------------------------------

## smooth the estimated curve for visualization
ok <- is.finite(f_grid) & (f_grid > 0)
sp <- stats::smooth.spline(y_grid[ok], log(f_grid[ok]), spar = 0.85)
f_smooth <- exp(stats::predict(sp, y_grid)$y)

ymax <- max(c(f_smooth, f_true), na.rm = TRUE)
plot(
  y_grid, f_smooth,
  type = "l", lwd = 2,
  xlab = "y",
  ylab = expression(hat(pi)(y ~ "|" ~ x)),
  ylim = c(0, 1.2 * ymax),
  main = "Conditional density at a fixed x: estimated vs true"
)
grid(col = "gray85", lty = 1)
lines(y_grid, f_true, lwd = 2, lty = 2)
legend(
  "topright",
  legend = c("Estimated (smoothed)", "True (generator)"),
  lty = c(1, 2), lwd = c(2, 2), bty = "n"
)

Run the code above in your browser using DataLab