QuanDA (version 1.0.0)

quanda: Fit QuanDA for imbalanced binary classification

Description

QuanDA fits a quantile-regression-based discriminant with label jittering. For each candidate quantile level \(\tau\), the binary labels are jittered (adding \(U(0,1)\)), a penalized quantile regression is fit multiple times, and the coefficient vectors are averaged. The best \(\tau\) is selected by AUC.

Usage

quanda(
  x,
  y,
  lambda = 10^(seq(1, -4, length.out = 30)),
  lam2 = 0.01,
  n_rep = 10,
  tau_window = 0.05,
  nfolds = 5,
  maxit = 10000,
  eps = 1e-07,
  maxit_cv = 10000,
  eps_cv = 1e-05
)

Value

An object of class "quanda" with elements:

beta

Numeric vector of length \(p+1\) (intercept first).

tau_grid

Numeric vector of candidate \(\tau\) values.

tau_best

Chosen \(\tau\).

auc

Vector of AUCs across \(\tau\).

call

The matched call.
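
As a rough illustration of how the beta element could be used by hand, the sketch below computes a linear score from an intercept-first coefficient vector. The numeric values are made up, and treating the score as a plain linear predictor is an assumption; the predict method of the package may post-process it differently.

```r
# Toy values standing in for fit$beta and new predictor rows (both made up).
beta <- c(-0.5, 1.0, 2.0)                 # intercept first, then p = 2 slopes
newx <- matrix(c(0,   1,
                 1,   0,
                 0.5, 0.5), ncol = 2, byrow = TRUE)

# Prepend a column of ones so the intercept lines up with beta[1].
score <- drop(cbind(1, newx) %*% beta)
score
```

Larger scores would point toward class 1 under this assumption; any cutoff for turning scores into labels is left open here.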

Arguments

x

A numeric matrix of predictors with \(n\) rows (observations) and \(p\) columns (features).

y

A binary response vector of length \(n\) with values 0 or 1.

lambda

Optional numeric vector of penalty values in decreasing order, so that lambda[1] is the largest. If NULL, a default sequence will be generated from the data.

lam2

Numeric, secondary penalty (ridge/elastic term) passed to hdqr. Default 0.01.

n_rep

Integer, number of jittering repetitions (averaged). Default 10.

tau_window

Half-width of the window around the class rate in which quantile levels are explored. Candidate \(\tau\) values are \(b + \{-w,\ldots,w\}\) in steps of 0.01, clipped to \([0,1]\), where \(b\) is the class rate and \(w\) is tau_window. Default 0.05.

nfolds

Integer, number of CV folds used by cv_z(). Default 5.

maxit, maxit_cv, eps, eps_cv

Iteration limits (maxit, maxit_cv) and convergence tolerances (eps, eps_cv) for the inner optimizers and the CV helper.
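
The candidate grid described under tau_window can be sketched as follows. This is a minimal illustration, not the package's code: make_tau_grid is a hypothetical helper, and it assumes that the class rate \(b\) is the share of 1s in y.

```r
# Hypothetical helper (not part of the package): builds b + {-w, ..., w}
# in steps of 0.01 and clips the result to [0, 1].
make_tau_grid <- function(y, tau_window = 0.05, step = 0.01) {
  b <- mean(y == 1)                  # assumption: class rate = share of 1s
  k <- round(tau_window / step)      # number of steps on each side of b
  grid <- b + step * seq(-k, k)
  unique(pmin(pmax(grid, 0), 1))     # clip; drop duplicates created by clipping
}

y_toy <- c(rep(0, 90), rep(1, 10))   # toy labels with a 10% class rate
tau_grid <- make_tau_grid(y_toy)
tau_grid
```

With a 10% class rate and the default window, the grid runs from 0.05 to 0.15. Whether the package keeps or drops boundary values of exactly 0 or 1 is not stated in this page.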

Details

We jitter labels via \(z_i = y_i + U_i\), where \(U_i \sim \mathrm{Unif}(0,1)\), fit penalized quantile regression at multiple \(\tau\), average coefficients over n_rep jitters, compute AUCs on the original \((x,y)\), and pick the \(\tau\) that maximizes AUC.
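
The jitter, average, and select loop above can be sketched in base R. This is a simplified illustration under stated assumptions: fit_qr_ridge is a crude stand-in that minimizes the check loss plus a ridge term with optim (it is not hdqr), the lambda path and the cross-validation step are omitted, and the names auc_rank, fit_qr_ridge, and sketch_quanda are hypothetical.

```r
# Rank-based (Wilcoxon) AUC of scores s against 0/1 labels y.
auc_rank <- function(s, y) {
  r <- rank(s)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Stand-in fitter (NOT hdqr): check loss at level tau plus a ridge penalty
# on the slopes, minimized approximately with a general-purpose optimizer.
fit_qr_ridge <- function(x, z, tau, lam2 = 0.01) {
  x1 <- cbind(1, x)
  obj <- function(b) {
    r <- z - drop(x1 %*% b)
    mean(r * (tau - (r < 0))) + lam2 * sum(b[-1]^2)
  }
  optim(rep(0, ncol(x1)), obj, method = "BFGS")$par
}

sketch_quanda <- function(x, y, tau_grid, n_rep = 10) {
  betas <- sapply(tau_grid, function(tau) {
    reps <- replicate(n_rep, {
      z <- y + runif(length(y))      # jittered labels z_i = y_i + U_i
      fit_qr_ridge(x, z, tau)
    })
    rowMeans(reps)                   # average coefficients over the jitters
  })
  # AUC of each averaged coefficient vector on the original (x, y).
  aucs <- apply(betas, 2, function(b) auc_rank(drop(cbind(1, x) %*% b), y))
  best <- which.max(aucs)
  list(beta = betas[, best], tau_best = tau_grid[best], auc = aucs)
}

set.seed(1)
n <- 200
x <- matrix(rnorm(n * 2), n, 2)
y <- as.numeric(x[, 1] + rnorm(n) > 1.3)   # imbalanced toy labels
res <- sketch_quanda(x, y, tau_grid = seq(0.1, 0.3, by = 0.05), n_rep = 3)
res$tau_best
```

The returned list mirrors the beta, tau_best, and auc elements listed under Value, but the numbers it produces will not match the package, since the fitter and tuning differ.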

Examples

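library(QuanDA)
# Load the example data; assumed to provide predictors X and labels y
data(breast)
X <- as.matrix(X)
y <- as.numeric(as.character(y))
y[y == -1] <- 0  # recode labels from {-1, 1} to {0, 1}
fit <- quanda(X, y)
# Predict for the last six rows of X
pred <- predict(fit, tail(X))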