
FastKRR (version 0.1.2)

fastkrr: Fit kernel ridge regression using exact or approximate methods

Description

This function performs kernel ridge regression (KRR) in high-dimensional settings. The regularization parameter \(\lambda\) can be selected via the CVST (Cross-Validation via Sequential Testing) procedure. For scalability, three kernel approximation strategies are supported (Nyström approximation, pivoted Cholesky decomposition, and random Fourier features (RFF)), and the kernel matrix can be computed with either of two kernels (Gaussian or Laplace).

Usage

fastkrr(
  x,
  y,
  kernel = "gaussian",
  opt = "exact",
  m = NULL,
  eps = 1e-06,
  rho = 1,
  lambda = NULL,
  fastcv = FALSE,
  n_threads = 4,
  verbose = TRUE
)

Value

An S3 object of class "fastkrr", which is a list containing the results of the fitted Kernel Ridge Regression model.

  • coefficients: Estimated coefficient vector in \(\mathbb{R}^{n}\). Accessible via model$coefficients.

  • fitted.values: Fitted values in \(\mathbb{R}^{n}\). Accessible via model$fitted.values.

  • opt: Kernel approximation option. One of "exact", "pivoted", "nystrom", "rff".

  • kernel: Kernel used ("gaussian" or "laplace").

  • x: Input design matrix.

  • y: Response vector.

  • lambda: Regularization parameter. If NULL, tuned by cross-validation via CVST.

  • rho: Additional user-specified hyperparameter.

  • n_threads: Number of threads used for parallelization.

Additional components depend on the value of opt:

opt = "exact"

  • K: The full kernel matrix.

opt = "nystrom"

  • K: Exact kernel matrix \(K \in \mathbb{R}^{n \times n}\).

  • m: Kernel approximation degree.

  • R: Low-rank factor \(R \in \mathbb{R}^{n \times m}\) obtained via the Nyström approximation; satisfies \(K \approx R R^\top\).

opt = "pivoted"

  • K: Exact kernel matrix \(K \in \mathbb{R}^{n \times n}\).

  • m: Kernel approximation degree.

  • PR: Low-rank factor \(PR \in \mathbb{R}^{n \times m}\) obtained via the pivoted Cholesky decomposition; satisfies \(K \approx PR\,(PR)^\top\).

  • eps: Numerical tolerance used for early stopping in the pivoted Cholesky decomposition.

opt = "rff"

  • m: Number of random features.

  • Z: Random Fourier Feature matrix \(Z \in \mathbb{R}^{n \times m}\) with \(Z_{ij} = z_j(x_i) = \sqrt{2/m}\cos(\omega_j^\top x_i + b_j), \quad j = 1, \cdots, m,\) so that \(K \approx Z Z^\top\).

  • W: Random frequency matrix \(W \in \mathbb{R}^{m \times d}\) (row \(j\) is \(\omega_j^\top \in \mathbb{R}^d\)), with rows drawn i.i.d. from the spectral density of the chosen kernel:

    • Gaussian: \(\omega_{jk} \sim \mathcal{N}(0, 2\gamma)\) (e.g., \(\gamma=1/\ell^2\)).

    • Laplace: \(\omega_{jk} \sim \mathrm{Cauchy}(0, 1/\sigma)\) i.i.d.

  • b: Random phase vector \(b \in \mathbb{R}^m\), with entries i.i.d. \(\mathrm{Unif}[0,\,2\pi]\) (see the sketch after this list).
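
The relationships above can be checked numerically from a fitted object. Below is a minimal sketch, assuming the package is loaded, a Gaussian kernel with rho = 1, and that the W, b, and Z components are laid out exactly as described:

# Fit a small model with random Fourier features (Gaussian kernel, rho = 1)
set.seed(1)
X = matrix(runif(100), nrow = 50, ncol = 2)
y = rnorm(50)
model = fastkrr(X, y, kernel = "gaussian", opt = "rff", rho = 1, lambda = 1e-4)

# Rebuild Z from W and b: Z[i, j] = sqrt(2/m) * cos(w_j' x_i + b_j)
m = model$m
Z_rebuilt = sqrt(2 / m) * cos(X %*% t(model$W) +
                              matrix(model$b, nrow(X), m, byrow = TRUE))
all.equal(Z_rebuilt, model$Z)               # should be TRUE under the layout above

# Compare the implicit approximation Z Z' with the exact Gaussian kernel
K_exact = exp(-as.matrix(dist(X))^2)        # exp(-rho * ||x - x'||_2^2), rho = 1
max(abs(K_exact - model$Z %*% t(model$Z)))  # approximation error shrinks as m grows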

Arguments

x

Design matrix \(X \in \mathbb{R}^{n\times d}\).

y

Response variable \(y \in \mathbb{R}^{n}\).

kernel

Kernel type, either "gaussian" or "laplace".

opt

Method for constructing or approximating the kernel matrix:

"exact"

Construct the full kernel matrix \(K \in \mathbb{R}^{n\times n}\) from the design matrix \(X\).

"nystrom"

Construct a low-rank approximation of the kernel matrix \(K \in \mathbb{R}^{n \times n}\) using the Nyström approximation.

"pivoted"

Construct a low-rank approximation of the kernel matrix \(K \in \mathbb{R}^{n \times n}\) using Pivoted Cholesky decomposition.

"rff"

Use random Fourier features to construct a feature map \(Z \in \mathbb{R}^{n \times m}\), where \(m\) is the number of random features, so that \(K \approx Z Z^\top\).

m

Approximation rank (number of random features for "rff") used for the low-rank kernel approximation. If not provided by the user, it defaults to $$\lceil n \cdot \frac{\log(d + 5)}{10} \rceil,$$ where \(n = nrow(X)\) and \(d = ncol(X)\).
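
For instance, the default value of m can be computed directly from the dimensions of the design matrix (a small illustration of the formula above):

# Default approximation rank for a 500 x 20 design matrix
X = matrix(rnorm(500 * 20), nrow = 500, ncol = 20)
n = nrow(X); d = ncol(X)
ceiling(n * log(d + 5) / 10)   # ceiling(500 * log(25) / 10) = 161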

eps

Tolerance parameter, used only when opt = "pivoted", for the stopping criterion of the pivoted Cholesky decomposition.

rho

Scaling parameter of the kernel (\(\rho\)), specified by the user. Defaults to 1. $$\text{Gaussian kernel: } \mathcal{K}(x, x') = \exp(-\rho \| x - x'\|^2_2)$$ $$\text{Laplace kernel: } \mathcal{K}(x, x') = \exp(-\rho \| x - x'\|_1)$$
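
Written out in base R for a single pair of observations, the two kernels are (a small sketch of the formulas only, not of the package's internal computation):

rho = 1
x1 = c(0.2, 0.7)
x2 = c(0.5, 0.1)
exp(-rho * sum((x1 - x2)^2))    # Gaussian kernel: exp(-rho * ||x - x'||_2^2)
exp(-rho * sum(abs(x1 - x2)))   # Laplace kernel:  exp(-rho * ||x - x'||_1)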

lambda

Regularization parameter. If NULL, the penalty parameter is chosen automatically via the CVST package, using a kernel-specific default grid of 100 values: \([10^{-10}, 10^{-3}]\) for the Gaussian kernel and \([10^{-5}, 10^{-2}]\) for the Laplace kernel.
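
As described in Details, lambda can be passed as a single value, as a tuning grid, or left NULL. A sketch of the three call patterns (assuming the package is loaded; the grid below is an illustrative log-spaced grid, not the package's internal default):

set.seed(1)
X = matrix(runif(50), ncol = 1)
y = sin(2 * pi * X[, 1]) + rnorm(50, 0, 0.1)

# 1. Single positive value: fit with this lambda directly
fit1 = fastkrr(X, y, kernel = "gaussian", opt = "exact", lambda = 1e-4)

# 2. Grid of values (length >= 3): lambda tuned by CVST cross-validation
grid = 10^seq(-10, -3, length.out = 100)
fit2 = fastkrr(X, y, kernel = "gaussian", opt = "exact", lambda = grid, fastcv = TRUE)

# 3. NULL (default): internal kernel-specific grid, tuned via CVST
fit3 = fastkrr(X, y, kernel = "gaussian", opt = "exact")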

fastcv

If TRUE, accelerated cross-validation is performed via sequential testing (early stopping) as implemented in the CVST package. The default is FALSE.

n_threads

Number of parallel threads. The default is 4. If the system does not support 4 threads, it automatically falls back to 1 thread. Parallelization (implemented in C++) is one of the main advantages of this package and is applied only for opt = "nystrom" or opt = "rff", and for the Laplace kernel (kernel = "laplace").

verbose

If TRUE, detailed progress and cross-validation results are printed to the console. If FALSE, suppresses intermediate output and only returns the final result.

Details

The function performs several input checks and automatic adjustments:

  • If x is a vector, it is converted to a one-column matrix; otherwise x must be a matrix, or an error is thrown.

  • y must be a vector, and its length must match nrow(x).

  • kernel must be either "gaussian" or "laplace".

  • opt must be one of "exact", "pivoted", "nystrom", or "rff".

  • If m is NULL, it defaults to $$\lceil n \cdot \log(d + 5) / 10 \rceil$$ where \(n = nrow(X)\) and \(d = ncol(X)\). Otherwise, m must be a positive integer.

  • rho must be a positive real number (default is 1).

  • lambda can be specified in three ways:

    1. A positive numeric scalar, in which case the model is fitted with this single value.

    2. A numeric vector (length >= 3) of positive values used as a tuning grid; selection is performed by CVST cross-validation (sequential testing if fastcv = TRUE).

    3. NULL: use a default grid (internal setting) and tune lambda via CVST cross-validation (sequential testing if fastcv = TRUE).

  • n_threads: Number of threads for parallel computation. Default is 4. If the system has <= 3 available processors, it uses 1.

Examples

# Data setting
set.seed(1)
lambda = 1e-4
d = 1
rho = 1
n = 50
X = matrix(runif(n*d, 0, 1), nrow = n, ncol = d)
y = as.vector(sin(2*pi*rowMeans(X)^3) + rnorm(n, 0, 0.1))

# Example: pivoted Cholesky
model = fastkrr(X, y, kernel = "gaussian", opt = "pivoted", rho = rho, lambda = 1e-4)

# Example: nystrom
model = fastkrr(X, y, kernel = "gaussian", opt = "nystrom", rho = rho, lambda = 1e-4)

# Example: random Fourier features
model = fastkrr(X, y, kernel = "gaussian", opt = "rff", rho = rho, lambda = 1e-4)

# Example: Laplace kernel
model = fastkrr(X, y, kernel = "laplace", opt = "nystrom", n_threads = 1, rho = rho)
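
The components described under Value can then be inspected from any of the fitted objects; for example, continuing with the last model above:

# Inspect the fitted object
model$opt                          # "nystrom"
model$lambda                       # regularization parameter (tuned via CVST when not supplied)

# In-sample fit from the stored fitted values
mean((y - model$fitted.values)^2)  # training mean squared error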
