cv.roclearn: Cross-validation for linear models

Description

Performs k-fold cross-validation over a sequence of regularization parameters \(\lambda\) and selects the optimal model based on the cross-validated AUC.
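For reference, the AUC of a vector of scores can be computed as the Mann-Whitney statistic: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counting one half. The helper `auc_mw()` below is hypothetical and not part of roclearn. The package's own AUC computation may differ in details such as tie handling.

```r
# Hypothetical helper (not part of roclearn): empirical AUC as the
# Mann-Whitney statistic over all positive/negative score pairs.
auc_mw <- function(score, y) {
  pos <- score[y == 1]
  neg <- score[y == -1]
  # Compare every positive score with every negative score
  diffs <- outer(pos, neg, "-")
  # Correctly ordered pairs count 1, tied pairs count 1/2
  (sum(diffs > 0) + 0.5 * sum(diffs == 0)) / length(diffs)
}

score <- c(0.9, 0.8, 0.3, 0.4, 0.1)
y     <- c(1,   -1,   1,  -1,  -1)
auc_mw(score, y)  # 4 of the 6 pairs are correctly ordered
```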

Usage

cv.roclearn(
  X,
  y,
  lambda.vec = NULL,
  lambda.length = 30,
  penalty = "ridge",
  param.penalty = NULL,
  loss = "hinge",
  approx = NULL,
  intercept = TRUE,
  nfolds = 10,
  target.perf = list(),
  param.convergence = list()
)

Value

An object of class "cv.roclearn" with:

optimal.lambda — selected \(\lambda\).
optimal.fit — model refit on the full data at optimal.lambda.
lambda.vec — grid of penalty values considered.
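auc.mean, auc.sd — mean and standard deviation of the cross-validated AUC for each \(\lambda\).
auc.result — fold-by-lambda matrix of validation AUC values.
time.mean, time.sd — mean and standard deviation of the training time for each \(\lambda\).
time.result — fold-by-lambda matrix of training times.
nfolds, loss, penalty — settings used for the cross-validation.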
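Assuming that auc.result stores folds in rows and \(\lambda\) values in columns (as "fold-by-lambda" suggests), and that the selected \(\lambda\) is the one with the largest mean AUC, the summary components relate to that matrix roughly as in this sketch. The toy matrix is invented for illustration, and `which.max()` picks the first maximum when several \(\lambda\) values tie, which may not match the package's tie-breaking rule.

```r
# Toy fold-by-lambda AUC matrix: 3 folds (rows) x 4 lambda values (columns).
# Values are invented for illustration only.
lambda.vec <- c(1, 0.3, 0.1, 0.03)
auc.result <- rbind(
  c(0.80, 0.85, 0.83, 0.82),
  c(0.78, 0.86, 0.84, 0.80),
  c(0.82, 0.84, 0.86, 0.81)
)

# Column-wise summaries across folds
auc.mean <- colMeans(auc.result)
auc.sd   <- apply(auc.result, 2, sd)

# Assumed selection rule: lambda with the highest mean cross-validated AUC
optimal.lambda <- lambda.vec[which.max(auc.mean)]
optimal.lambda
```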

Arguments

X

Predictor matrix or data.frame (categorical variables are automatically one-hot encoded).
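The exact encoding used by cv.roclearn is not documented on this page. As a rough illustration of what one-hot encoding means for a data.frame, the hypothetical helper `one_hot()` below expands each factor or character column into one 0/1 indicator column per level and leaves numeric columns unchanged.

```r
# Hypothetical helper (not part of roclearn): one-hot encode the factor and
# character columns of a data.frame, keeping numeric columns as they are.
one_hot <- function(df) {
  cols <- lapply(names(df), function(nm) {
    col <- df[[nm]]
    if (is.character(col)) col <- factor(col)
    if (is.factor(col)) {
      # One indicator column per level (no reference level is dropped)
      m <- model.matrix(~ col - 1)
      colnames(m) <- paste0(nm, levels(col))
    } else {
      m <- matrix(as.numeric(col), ncol = 1, dimnames = list(NULL, nm))
    }
    m
  })
  do.call(cbind, cols)
}

df <- data.frame(x = c(1.5, 2, 3), g = c("a", "b", "a"))
one_hot(df)
```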

y

Response vector with class labels in {-1, 1}. Labels given as {0, 1} or as a two-level factor/character are automatically converted to this format.
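The label conversion can be pictured with the hypothetical helper `to_pm1()` below. Which level of a two-level factor becomes +1 is not stated on this page; the sketch assumes, for illustration only, that the second level (in `levels()` order) is the positive class.

```r
# Hypothetical helper (not part of roclearn): map labels to {-1, 1}.
# Assumption: for a two-level factor/character, the second level is +1.
to_pm1 <- function(y) {
  if (is.character(y)) y <- factor(y)
  if (is.factor(y)) {
    stopifnot(nlevels(y) == 2)
    return(ifelse(y == levels(y)[2], 1, -1))
  }
  y <- as.numeric(y)
  # {0, 1} coding: 0 -> -1, 1 -> 1
  if (all(y %in% c(0, 1))) return(2 * y - 1)
  stopifnot(all(y %in% c(-1, 1)))
  y
}

to_pm1(c(0, 1, 1))
to_pm1(c("no", "yes", "no"))
```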

lambda.vec

Optional numeric vector of regularization parameters (lambda values). If NULL (default), a decreasing sequence is generated automatically.

lambda.length

Number of \(\lambda\) values to generate if lambda.vec is NULL. Default is 30.
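The rule cv.roclearn uses to build its automatic grid is not described on this page. One common construction for such decreasing sequences, shown here only as an assumption, is a log-spaced grid running from a largest value down to a small fraction of it; `lambda_grid()`, `lambda.max` and `ratio` are hypothetical names.

```r
# Hypothetical helper (not part of roclearn): decreasing, log-spaced grid of
# lambda.length values from lambda.max down to ratio * lambda.max.
lambda_grid <- function(lambda.max, lambda.length = 30, ratio = 1e-3) {
  exp(seq(log(lambda.max), log(ratio * lambda.max), length.out = lambda.length))
}

grid <- lambda_grid(lambda.max = 5, lambda.length = 30)
head(grid)
```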

penalty

Regularization penalty type: "ridge" (default), "lasso", "elastic", "alasso", "scad", or "mcp".

param.penalty

Penalty-specific parameter:

Ignored for "ridge" and "lasso".
Mixing parameter \(\alpha \in (0,1)\) for "elastic". Default is 0.5.
Adaptive weight exponent \(\gamma > 0\) for "alasso". Default is 1.
Tuning parameter (default 3.7) for "scad" and "mcp".
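To make the role of each penalty parameter concrete, the sketch below evaluates the penalties coefficient by coefficient under common textbook conventions (ridge as \(\lambda\beta^2\), a glmnet-style elastic net in which \(\alpha\) weights the L1 part, adaptive lasso weights \(1/|\tilde\beta|^\gamma\) from an initial estimate, and the standard SCAD and MCP forms). These scalings are assumptions; roclearn may use different constants. `penalty_value()` and `beta.init` are hypothetical names.

```r
# Hypothetical helper (not part of roclearn): elementwise penalty value for
# coefficients `beta`, using common textbook conventions (assumed).
penalty_value <- function(beta, penalty, lambda, param = NULL, beta.init = NULL) {
  b <- abs(beta)
  switch(penalty,
    ridge = lambda * beta^2,
    lasso = lambda * b,
    elastic = {
      alpha <- if (is.null(param)) 0.5 else param
      # alpha weights the L1 part, (1 - alpha) the L2 part
      lambda * (alpha * b + (1 - alpha) * beta^2 / 2)
    },
    alasso = {
      gamma <- if (is.null(param)) 1 else param
      # Adaptive weights from an initial estimate beta.init
      lambda * b / abs(beta.init)^gamma
    },
    scad = {
      a <- if (is.null(param)) 3.7 else param
      ifelse(b <= lambda, lambda * b,
        ifelse(b <= a * lambda,
               (2 * a * lambda * b - b^2 - lambda^2) / (2 * (a - 1)),
               lambda^2 * (a + 1) / 2))
    },
    mcp = {
      g <- if (is.null(param)) 3.7 else param
      ifelse(b <= g * lambda, lambda * b - b^2 / (2 * g), g * lambda^2 / 2)
    },
    stop("unknown penalty: ", penalty)
  )
}

# SCAD and MCP flatten out for large coefficients, unlike the lasso
penalty_value(c(0.5, 2, 10), "scad", lambda = 1)
penalty_value(c(0.5, 2, 10), "lasso", lambda = 1)
```

The flat tail of SCAD and MCP is what reduces the shrinkage of large coefficients relative to the lasso.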

loss

Surrogate loss function type. One of: "hinge" (default), "hinge2" (squared hinge), "logistic", or "exponential".
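The four surrogate losses have standard forms as functions of a margin \(u\), where larger \(u\) means a better-ordered prediction. How roclearn defines the margin (for example, as a score difference between a positive and a negative example, which is common in AUC-based learning) is not stated on this page, so treat that part as an assumption. `surrogate_loss()` is a hypothetical name.

```r
# Hypothetical helper (not part of roclearn): standard surrogate losses
# evaluated at a margin u.
surrogate_loss <- function(u, loss = "hinge") {
  switch(loss,
    hinge       = pmax(0, 1 - u),
    hinge2      = pmax(0, 1 - u)^2,   # squared hinge
    logistic    = log1p(exp(-u)),     # log(1 + exp(-u))
    exponential = exp(-u),
    stop("unknown loss: ", loss)
  )
}

u <- c(-1, 0, 2)
sapply(c("hinge", "hinge2", "logistic", "exponential"),
       function(l) surrogate_loss(u, l))
```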

approx

Logical; whether to use a scalable approximation to accelerate training. If NULL (default), it is set to TRUE when nrow(X) >= 1000 and FALSE otherwise. See the Details section of roclearn for how the approximation is applied.
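The documented default for approx amounts to a simple rule on the sample size. The hypothetical helper `resolve_approx()` below only restates that rule; it is not the package's internal code.

```r
# Hypothetical helper mirroring the documented default for `approx`:
# NULL means "decide from the number of rows".
resolve_approx <- function(approx, X) {
  if (is.null(approx)) nrow(X) >= 1000 else isTRUE(approx)
}

resolve_approx(NULL, matrix(0, nrow = 1000, ncol = 2))
resolve_approx(NULL, matrix(0, nrow = 999, ncol = 2))
```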

intercept

Logical; include an intercept in the model (default TRUE).

nfolds

Number of cross-validation folds (default 10).
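Each validation fold needs at least one positive and one negative example for its AUC to be defined, which matters with imbalanced classes. Whether cv.roclearn stratifies its folds is not stated here; the hypothetical helper `make_folds()` below shows stratified fold assignment as one way to guarantee balanced folds.

```r
# Hypothetical helper (not part of roclearn): stratified fold labels, so each
# fold receives roughly the same share of positive and negative examples.
make_folds <- function(y, nfolds = 10) {
  folds <- integer(length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)
    # Cycle through fold ids, then shuffle them within this class
    ids <- rep_len(seq_len(nfolds), length(idx))
    folds[idx] <- ids[sample.int(length(idx))]
  }
  folds
}

set.seed(123)
y <- c(rep(-1, 80), rep(1, 20))
table(fold = make_folds(y, nfolds = 10), y)
```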

target.perf

List with target sensitivity and specificity used when estimating the intercept (defaults to 0.9 each).

param.convergence

List of convergence controls, such as maxiter (maximum number of iterations) and eps (convergence tolerance). Defaults are maxiter = 5e4 and eps = 1e-4.
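The usage shows param.convergence = list() while the documented defaults are maxiter = 5e4 and eps = 1e-4. One plausible reading, shown here as an assumption, is that user-supplied entries override those defaults one by one, as `utils::modifyList()` does. `resolve_convergence()` is a hypothetical name.

```r
# Hypothetical helper (not part of roclearn): fill in the documented defaults
# for any convergence controls the user did not supply.
resolve_convergence <- function(param.convergence = list()) {
  defaults <- list(maxiter = 5e4, eps = 1e-4)
  utils::modifyList(defaults, param.convergence)
}

str(resolve_convergence(list(eps = 1e-6)))
```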

Examples


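# Simulate an imbalanced two-class data set: 20% positives, 80% negatives
set.seed(123)
n <- 100
n_pos <- round(0.2 * n)
n_neg <- n - n_pos

# Two predictors; class means are shifted to -1 (negatives) and +1 (positives)
X <- rbind(
  matrix(rnorm(2 * n_neg, mean = -1), ncol = 2),
  matrix(rnorm(2 * n_pos, mean =  1), ncol = 2)
)
y <- c(rep(-1, n_neg), rep(1, n_pos))

# Cross-validate over a small log-spaced grid of 3 lambda values with
# 2 folds (kept small so the example runs quickly)
cvfit <- cv.roclearn(
  X, y,
  lambda.vec = exp(seq(log(0.01), log(5), length.out = 3)),
  approx = TRUE, nfolds = 2
)

# Selected penalty value
cvfit$optimal.lambda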

