quanda: Fit QuanDA for imbalanced binary classification

Description

QuanDA fits a quantile-regression-based discriminant with label jittering. For each candidate quantile level \(\tau\), the binary labels are jittered by adding independent \(U(0,1)\) noise, a penalized quantile regression is fit n_rep times (once per jitter), and the resulting coefficient vectors are averaged. The \(\tau\) with the highest AUC is selected.

Usage

quanda(
  x,
  y,
  lambda = 10^(seq(1, -4, length.out = 30)),
  lam2 = 0.01,
  n_rep = 10,
  tau_window = 0.05,
  nfolds = 5,
  maxit = 10000,
  eps = 1e-07,
  maxit_cv = 10000,
  eps_cv = 1e-05
)

Value

An object of class "quanda" with elements:

beta: Numeric vector of length \(p+1\) (intercept first).
tau_grid: Numeric vector of candidate \(\tau\) values.
tau_best: Chosen \(\tau\).
auc: Vector of AUCs across \(\tau\).
call: The matched call.

Arguments

x: A numeric matrix of predictors with \(n\) rows (observations) and \(p\) columns (features).
y: A binary response vector of length \(n\) with values 0 or 1.
lambda: Optional numeric vector of penalty values in decreasing order, so that lambda[1] is the largest. Default 10^(seq(1, -4, length.out = 30)). If NULL, a default sequence will be generated from the data.
lam2: Numeric, secondary penalty (ridge/elastic term) passed to hdqr. Default 0.01.
n_rep: Integer, number of jittering repetitions (averaged). Default 10.
tau_window: Half-width of the window around the class rate in which quantile levels are explored. Candidate \(\tau\) are \(b + \{-w,\ldots,w\}\) in steps of 0.01, clipped to \([0,1]\), where \(b\) is the class rate and \(w\) is tau_window. Default 0.05.
nfolds: Integer, number of CV folds used by cv_z(). Default 5.
maxit, maxit_cv, eps, eps_cv: Controls for the inner optimizers (maxit, eps) and for the CV helper (maxit_cv, eps_cv). Defaults 10000, 10000, 1e-07 and 1e-05, respectively.
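
The candidate grid described under tau_window can be sketched in base R. This is only an illustration: make_tau_grid is a hypothetical helper, not a function of the package. It assumes that the class rate means the share of observations with y equal to 1 and that clipping means truncating values to \([0,1]\) and dropping the resulting duplicates. The page does not confirm either assumption.

```r
# Illustrative sketch; `make_tau_grid` is a hypothetical helper, not package code.
make_tau_grid <- function(y, tau_window = 0.05, step = 0.01) {
  b <- mean(y == 1)                     # assumed meaning of "class rate"
  k <- round(tau_window / step)         # number of steps on each side of b
  grid <- b + seq(-k, k) * step         # b - w, ..., b, ..., b + w
  unique(pmin(pmax(grid, 0), 1))        # assumed meaning of "clipped to [0, 1]"
}

y <- rep(c(0, 1), times = c(70, 30))    # toy labels with a 30% positive rate
tau_grid <- make_tau_grid(y)
print(tau_grid)
```

With a very rare class the window runs into 0, so the grid returned by this sketch is shorter than 2 * tau_window / 0.01 + 1 values.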

Details

We jitter labels via \(z_i = y_i + U_i\), where \(U_i \sim \mathrm{Unif}(0,1)\), fit penalized quantile regression at multiple \(\tau\), average coefficients over n_rep jitters, compute AUCs on the original \((x,y)\), and pick the \(\tau\) that maximizes AUC.
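
The steps above can be sketched for a single \(\tau\) in base R. This is a minimal illustration, not the package's implementation. The helpers jitter_labels, check_loss, fit_qr and auc_rank are hypothetical names. fit_qr is a stand-in that minimizes the unpenalized check loss with optim(), whereas the package fits a penalized model through hdqr. The simulated data and the value of tau are arbitrary.

```r
# Illustrative sketch only; all helper names are hypothetical.
set.seed(1)
n <- 200
x <- matrix(rnorm(n * 2), n, 2)
y <- rbinom(n, 1, plogis(-2 + 2 * x[, 1]))    # imbalanced 0/1 labels

# Jittered response: z_i = y_i + U_i with U_i ~ Unif(0, 1)
jitter_labels <- function(y) y + runif(length(y))

# Check (pinball) loss of quantile regression at level tau
check_loss <- function(u, tau) u * (tau - (u < 0))

# Stand-in fitter: UNPENALIZED quantile regression by direct minimization
fit_qr <- function(x, z, tau) {
  obj <- function(b) mean(check_loss(z - cbind(1, x) %*% b, tau))
  optim(rep(0, ncol(x) + 1), obj)$par         # intercept first
}

# AUC of a score against 0/1 labels via the rank (Mann-Whitney) formula
auc_rank <- function(score, y) {
  r <- rank(score)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

tau <- 0.8                                    # arbitrary level for illustration
n_rep <- 5
betas <- replicate(n_rep, fit_qr(x, jitter_labels(y), tau))
beta_bar <- rowMeans(betas)                   # average over the jitters
score <- drop(cbind(1, x) %*% beta_bar)
auc <- auc_rank(score, y)                     # AUC on the original (x, y)
print(auc)
```

Repeating the last block for every candidate \(\tau\) and keeping the level with the largest AUC gives the selection step described above.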

Examples

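# Load the example data; it is assumed to provide X and y
data(breast)
X <- as.matrix(X)
y <- as.numeric(as.character(y))
y[y == -1] <- 0  # recode the -1 class as 0
fit <- quanda(X, y)
pred <- predict(fit, tail(X))  # predictions for the last six rows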