kroclearn: Fit a kernel model

Description

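Fits a kernel model by minimizing a regularized surrogate loss defined over pairs of observations (a U-statistic), with an optional scalable approximation for large data sets.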

Usage

kroclearn(
  X,
  y,
  lambda,
  kernel = "radial",
  param.kernel = NULL,
  loss = "hinge",
  approx = NULL,
  intercept = TRUE,
  target.perf = list(),
  param.convergence = list()
)

Value

An object of class "kroclearn", a list containing:

theta.hat — estimated dual coefficient vector.
intercept — fitted intercept (if applicable).
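lambda, kernel, param.kernel, loss — the regularization parameter, kernel settings, and loss function used in the fit.
approx — whether the approximation was used.
B — number of sampled pairs (if the approximation was used).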
time — training time (seconds).
nobs, p — number of observations and predictors.
converged, n.iter — convergence information.
kfunc — kernel function object.
nystrom — low-rank kernel approximation details (if used).
X — training data (post-preprocessing).
preprocessing — details on categorical variables, removed columns, and column names.
call — the function call.

Arguments

X

Predictor matrix or data.frame (categorical variables are automatically one-hot encoded).

y

Response vector with class labels in {-1, 1}. Labels given as {0, 1} or as a two-level factor/character are automatically converted to this format.
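The label conversion described above can be sketched as follows. This is a hypothetical helper, `to_pm1()`, not a package function. Treating 1 as the positive class for 0/1 input, and the second factor level as the positive class for factor or character input, is an assumption; the package's own rule may differ.

```r
# Hypothetical helper (not part of the package): map binary labels to {-1, 1}.
# Assumption: for 0/1 input, 1 is positive; for factor/character input, the
# second factor level is positive. The package's own rule may differ.
to_pm1 <- function(y) {
  if (is.numeric(y) && all(y %in% c(-1, 1))) return(y)   # already coded
  if (is.numeric(y) && all(y %in% c(0, 1))) return(2 * y - 1)
  f <- factor(y)
  if (nlevels(f) != 2) stop("y must have exactly two classes")
  ifelse(f == levels(f)[2], 1, -1)
}

to_pm1(c(0, 1, 1, 0))           # -1  1  1 -1
to_pm1(c("no", "yes", "yes"))   # -1  1  1
```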

lambda

Positive scalar regularization parameter.

kernel

Kernel type: "radial" (default), "polynomial", "linear", or "laplace".

param.kernel

Kernel-specific parameter:

\(\sigma\) for "radial" and "laplace" kernels (default \(1/p\), where \(p\) is the number of predictors after preprocessing, i.e., after categorical variables are one-hot encoded).
Degree for the "polynomial" kernel (default 2).
Ignored for the "linear" kernel.
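To make the kernel choices concrete, the sketch below computes each kernel matrix with the documented defaults (sigma = 1/p, degree = 2). The helper `kernel_matrix()` is hypothetical, and the formulas are an assumption following the common kernlab-style parameterization; the package may define the kernels differently (for example, with or without an offset in the polynomial kernel).

```r
# Hypothetical sketch of the four kernels, assuming kernlab-style forms:
#   radial:     exp(-sigma * ||x - z||^2)
#   laplace:    exp(-sigma * ||x - z||)
#   polynomial: (<x, z> + 1)^degree
#   linear:     <x, z>
kernel_matrix <- function(X, Z = X, kernel = "radial", param.kernel = NULL) {
  X <- as.matrix(X)
  Z <- as.matrix(Z)
  p <- ncol(X)  # number of predictors (after any preprocessing)
  # Squared Euclidean distances between the rows of X and the rows of Z
  sq_dist <- function() {
    d2 <- outer(rowSums(X^2), rowSums(Z^2), "+") - 2 * X %*% t(Z)
    d2[d2 < 0] <- 0  # guard against tiny negative values from rounding
    d2
  }
  switch(kernel,
    radial = {
      sigma <- if (is.null(param.kernel)) 1 / p else param.kernel
      exp(-sigma * sq_dist())
    },
    laplace = {
      sigma <- if (is.null(param.kernel)) 1 / p else param.kernel
      exp(-sigma * sqrt(sq_dist()))
    },
    polynomial = {
      degree <- if (is.null(param.kernel)) 2 else param.kernel
      (X %*% t(Z) + 1)^degree
    },
    linear = X %*% t(Z),
    stop("unknown kernel: ", kernel)
  )
}

X0 <- rbind(c(0, 0), c(1, 0), c(0, 1))
round(kernel_matrix(X0), 3)   # radial kernel with sigma = 1/p = 0.5
```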

loss

Surrogate loss function type. One of: "hinge" (default), "hinge2" (squared hinge), "logistic", or "exponential".
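The four surrogate losses are usually written as functions of a margin m, where a larger m is better. In a pairwise (U-statistic) loss, m would typically be the difference between the scores of a positive and a negative observation. The helper `surrogate_loss()` below is hypothetical and uses the textbook forms, which may differ from the package's scaling.

```r
# Textbook forms of the surrogate losses as functions of a margin m.
# Assumption: the package may scale or parameterize these differently.
surrogate_loss <- function(m, loss = "hinge") {
  switch(loss,
    hinge       = pmax(0, 1 - m),     # max(0, 1 - m)
    hinge2      = pmax(0, 1 - m)^2,   # squared hinge
    logistic    = log1p(exp(-m)),     # log(1 + exp(-m))
    exponential = exp(-m),
    stop("unknown loss: ", loss)
  )
}

m <- c(-1, 0, 1, 2)
surrogate_loss(m, "hinge")    # 2 1 0 0
surrogate_loss(m, "hinge2")   # 4 1 0 0
```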

approx

Logical; whether to use a scalable approximation to accelerate training. If NULL (the default), it is set to TRUE when nrow(X) >= 1000 and FALSE otherwise. See Details for how the approximation is applied.
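The documented default for approx (the usage shows approx = NULL) amounts to the rule below. The name `resolve_approx()` is hypothetical and used only for illustration.

```r
# Hypothetical illustration of the documented default for `approx`:
# NULL is resolved from the number of rows; an explicit value is kept.
resolve_approx <- function(X, approx = NULL) {
  if (is.null(approx)) nrow(as.matrix(X)) >= 1000 else isTRUE(approx)
}

resolve_approx(matrix(0, 500, 2))                  # FALSE
resolve_approx(matrix(0, 2000, 2))                 # TRUE
resolve_approx(matrix(0, 500, 2), approx = TRUE)   # TRUE
```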

intercept

Logical; include an intercept in the model (default TRUE).

target.perf

List with target sensitivity and specificity used when estimating the intercept (defaults to 0.9 each).

param.convergence

List of convergence controls (e.g., maxiter, the maximum number of iterations, and eps, the convergence tolerance). Defaults: maxiter = 5e4, eps = 1e-4.

Details

For large-scale data, fitting the model is computationally prohibitive because its loss is a U-statistic: a double summation over pairs of observations, whose cost grows quadratically with the sample size. To reduce this burden, the package uses an efficient algorithm based on an incomplete U-statistic, which approximates the loss by a single summation over B randomly sampled pairs. For kernel models, a Nyström low-rank approximation is additionally used to compute the kernel matrix efficiently. Together, these approximations substantially reduce computational cost and training time while largely preserving accuracy, making the model feasible for large data sets. They are applied when approx = TRUE.
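The difference between the complete and the incomplete U-statistic can be sketched in base R. This assumes the loss averages a surrogate (here the hinge) over all (positive, negative) pairs of scores and that pairs are sampled uniformly with replacement; the helper names are hypothetical, and the package's actual sampling scheme and choice of B may differ.

```r
# Minimal sketch, assuming the loss averages a surrogate phi over all
# (positive, negative) pairs of scores f; all names are hypothetical.
hinge <- function(m) pmax(0, 1 - m)

# Complete U-statistic: double sum over all n_pos * n_neg pairs
full_pairwise_loss <- function(f, y, phi = hinge) {
  d <- outer(f[y == 1], f[y == -1], "-")   # score differences f_i - f_j
  mean(phi(d))
}

# Incomplete U-statistic: single sum over B pairs sampled with replacement
incomplete_pairwise_loss <- function(f, y, B, phi = hinge) {
  pos <- which(y == 1)
  neg <- which(y == -1)
  i <- pos[sample.int(length(pos), B, replace = TRUE)]
  j <- neg[sample.int(length(neg), B, replace = TRUE)]
  mean(phi(f[i] - f[j]))
}

set.seed(1)
n <- 2000
y <- ifelse(runif(n) < 0.5, 1, -1)
f <- rnorm(n, mean = 0.5 * y)                       # toy scores, higher for positives
L_full <- full_pairwise_loss(f, y)                  # about n^2 / 4 pairs
L_inc  <- incomplete_pairwise_loss(f, y, B = 5000)  # 5000 sampled pairs
c(full = L_full, incomplete = L_inc)
```

With B fixed, the cost of the incomplete version grows with B rather than with the number of pairs, and its value fluctuates around the complete one.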
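The Nyström idea can be sketched as follows: pick m landmark observations, compute the n x m kernel block C and the m x m landmark block W, and use C W^+ C^T as a rank-at-most-m stand-in for the full n x n kernel matrix. The helpers `rbf()` and `nystrom_approx()` are hypothetical; uniform random landmarks and the radial form exp(-sigma * ||x - z||^2) are assumptions, not a description of the package's internals.

```r
# Minimal Nystrom sketch (general technique, not the package's internals).
# Radial kernel, assumed form exp(-sigma * ||a - z||^2)
rbf <- function(A, Z, sigma) {
  d2 <- outer(rowSums(A^2), rowSums(Z^2), "+") - 2 * A %*% t(Z)
  d2[d2 < 0] <- 0                                # rounding guard
  exp(-sigma * d2)
}

nystrom_approx <- function(X, m, sigma) {
  idx <- sample.int(nrow(X), m)                  # random landmark rows
  C <- rbf(X, X[idx, , drop = FALSE], sigma)     # n x m cross-kernel block
  W <- C[idx, , drop = FALSE]                    # m x m landmark block
  e <- eigen(W, symmetric = TRUE)
  keep <- e$values > 1e-10 * max(e$values)       # drop numerically null directions
  # Feature map Phi with Phi %*% t(Phi) = C W^+ C^T (on the kept directions)
  Phi <- C %*% e$vectors[, keep, drop = FALSE] %*%
    diag(1 / sqrt(e$values[keep]), nrow = sum(keep))
  list(Phi = Phi, landmarks = idx)
}

set.seed(2)
X <- matrix(rnorm(500 * 2), ncol = 2)
K <- rbf(X, X, sigma = 0.5)                      # full 500 x 500 kernel matrix
ny <- nystrom_approx(X, m = 50, sigma = 0.5)
K_hat <- ny$Phi %*% t(ny$Phi)                    # rank <= 50 approximation
sqrt(sum((K - K_hat)^2)) / sqrt(sum(K^2))        # relative Frobenius error
```

In a fitting routine, the feature map Phi (n x r, with r <= m) would stand in for the full kernel matrix, so memory and time scale with n * m rather than n^2.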

Examples


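set.seed(123)
n <- 100
# Simulate points in the unit disk (radius r, angle theta)
r <- sqrt(runif(n, 0.05, 1))
theta <- runif(n, 0, 2 * pi)
X <- cbind(r * cos(theta), r * sin(theta))
# Inner disk (r < 0.5) is the positive class; the outer ring is negative
y <- ifelse(r < 0.5, 1, -1)

# Radial kernel with the default sigma = 1/p; approx = TRUE forces the
# approximation even though nrow(X) < 1000
fit <- kroclearn(X, y, lambda = 0.1, kernel = "radial", approx = TRUE)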
