This function performs kernel ridge regression (KRR) in high-dimensional settings. The regularization parameter \(\lambda\) can be selected via the CVST (Cross-Validation via Sequential Testing) procedure. For scalability, three kernel approximation strategies are supported (Nyström approximation, pivoted Cholesky decomposition, and Random Fourier Features (RFF)), and the kernel matrix can be computed with two kernel types (Gaussian kernel, Laplace kernel).
fastkrr(
x,
y,
kernel = "gaussian",
opt = "exact",
m = NULL,
eps = 1e-06,
rho = 1,
lambda = NULL,
fastcv = FALSE,
n_threads = 4,
verbose = TRUE
)
An S3 object of class "fastkrr", which is a list containing the results of the fitted Kernel Ridge Regression model.
coefficients: Estimated coefficient vector in \(\mathbb{R}^{n}\). Accessible via model$coefficients.
fitted.values: Fitted values in \(\mathbb{R}^{n}\). Accessible via model$fitted.values.
opt: Kernel approximation option. One of "exact", "pivoted", "nystrom", "rff".
kernel: Kernel used ("gaussian" or "laplace").
x: Input design matrix.
y: Response vector.
lambda: Regularization parameter. If NULL, tuned by cross-validation via CVST.
rho: Additional user-specified hyperparameter.
n_threads: Number of threads used for parallelization.
Additional components depend on the value of opt:
For opt = "exact":
K: The full (exact) kernel matrix \(K \in \mathbb{R}^{n \times n}\).
For opt = "nystrom":
K: Exact kernel matrix \(K \in \mathbb{R}^{n \times n}\).
m: Kernel approximation degree.
R: Low-rank factor \(R \in \mathbb{R}^{n \times m}\) obtained via the Nyström approximation; satisfies \(K \approx R R^\top\).
For opt = "pivoted":
K: Exact kernel matrix \(K \in \mathbb{R}^{n \times n}\).
m: Kernel approximation degree.
PR: Low-rank factor \(PR \in \mathbb{R}^{n \times m}\) obtained via the pivoted Cholesky decomposition; satisfies \(K \approx PR\,(PR)^\top\).
eps: Numerical tolerance used for early stopping in the pivoted Cholesky decomposition.
For opt = "rff":
m: Number of random features.
Z: Random Fourier Feature matrix \(Z \in \mathbb{R}^{n \times m}\) with \(Z_{ij} = z_j(x_i) = \sqrt{2/m}\cos(\omega_j^\top x_i + b_j), \quad j = 1, \cdots, m,\) so that \(K \approx Z Z^\top\).
W: Random frequency matrix \(W \in \mathbb{R}^{m \times d}\) (row \(j\) is \(\omega_j^\top \in \mathbb{R}^d\)), drawn i.i.d. from the spectral density of the chosen kernel: Gaussian: \(\omega_{jk} \sim \mathcal{N}(0, 2\gamma)\) (e.g., \(\gamma = 1/\ell^2\)); Laplace: \(\omega_{jk} \sim \mathrm{Cauchy}(0, 1/\sigma)\).
b: Random phase vector \(b \in \mathbb{R}^m\), i.i.d. \(\mathrm{Unif}[0,\,2\pi]\).
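For illustration, a minimal base-R sketch of this Random Fourier Feature construction for the Gaussian kernel follows; it is not package code, and the helper name rff_features and the choice \(\gamma = \rho\) are assumptions made for this sketch.
# Sketch: RFF map for the Gaussian kernel exp(-rho * ||x - x'||_2^2),
# assuming gamma = rho so that omega_jk ~ N(0, 2 * rho).
rff_features <- function(X, m, rho = 1) {
  X <- as.matrix(X)
  W <- matrix(rnorm(m * ncol(X), mean = 0, sd = sqrt(2 * rho)), nrow = m)  # random frequencies
  b <- runif(m, 0, 2 * pi)                                                 # random phases, Unif[0, 2*pi]
  Z <- sqrt(2 / m) * cos(X %*% t(W) + matrix(b, nrow(X), m, byrow = TRUE)) # n x m feature matrix
  list(Z = Z, W = W, b = b)
}
set.seed(1)
X <- matrix(runif(100), 50, 2)
feat <- rff_features(X, m = 500, rho = 1)
K_exact <- exp(-1 * as.matrix(dist(X))^2)        # exact Gaussian kernel with rho = 1
max(abs(K_exact - feat$Z %*% t(feat$Z)))         # Monte Carlo error; shrinks as m grows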
x: Design matrix \(X \in \mathbb{R}^{n\times d}\).
y: Response variable \(y \in \mathbb{R}^{n}\).
kernel: Kernel type, either "gaussian" or "laplace".
opt: Method for constructing or approximating the kernel matrix:
"exact": Construct the full kernel matrix \(K \in \mathbb{R}^{n\times n}\) using the design matrix \(X\).
"nystrom": Construct a low-rank approximation of the kernel matrix \(K \in \mathbb{R}^{n \times n}\) using the Nyström approximation.
"pivoted": Construct a low-rank approximation of the kernel matrix \(K \in \mathbb{R}^{n \times n}\) using the pivoted Cholesky decomposition.
"rff": Use Random Fourier Features to construct a feature map \(Z \in \mathbb{R}^{n \times m}\) (with \(m\) random features) so that \(K \approx Z Z^\top\).
m: Approximation rank (number of random features for opt = "rff") used for the low-rank kernel approximation. If not provided by the user, it defaults to $$\lceil n \cdot \frac{\log(d + 5)}{10} \rceil,$$ where \(n = nrow(X)\) and \(d = ncol(X)\).
eps: Tolerance parameter used only for opt = "pivoted" as the stopping criterion of the pivoted Cholesky decomposition.
rho: Scaling parameter of the kernel (\(\rho\)), specified by the user. Defaults to 1.
$$\text{Gaussian kernel: } \mathcal{K}(x, x') = \exp(-\rho \| x - x'\|^2_2)$$
$$\text{Laplace kernel: } \mathcal{K}(x, x') = \exp(-\rho \| x - x'\|_1)$$
(A base-R sketch of these kernel definitions follows the argument descriptions below.)
lambda: Regularization parameter. If NULL, the penalty parameter is chosen automatically via the CVST package over a kernel-specific grid of 100 values: \([10^{-10}, 10^{-3}]\) for the Gaussian kernel and \([10^{-5}, 10^{-2}]\) for the Laplace kernel.
fastcv: If TRUE, accelerated cross-validation is performed via sequential testing (early stopping) as implemented in the CVST package. The default is FALSE.
n_threads: Number of parallel threads. The default is 4. If the system does not support 4 threads, it automatically falls back to 1 thread. Parallelization (implemented in C++) is one of the main advantages of this package and is applied only for opt = "nystrom" or opt = "rff", and for the Laplace kernel (kernel = "laplace").
verbose: If TRUE, detailed progress and cross-validation results are printed to the console. If FALSE, intermediate output is suppressed and only the final result is returned.
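As referenced under the rho argument above, a minimal base-R sketch of the two kernel definitions is given here; it is illustrative only and not package code.
# Sketch: exact kernel matrices as defined for rho above (illustrative only).
gaussian_kernel <- function(X, rho = 1) exp(-rho * as.matrix(dist(X, method = "euclidean"))^2)
laplace_kernel  <- function(X, rho = 1) exp(-rho * as.matrix(dist(X, method = "manhattan")))
set.seed(1)
X <- matrix(runif(20), 10, 2)
K_gauss   <- gaussian_kernel(X, rho = 1)   # exp(-rho * ||x - x'||_2^2)
K_laplace <- laplace_kernel(X, rho = 1)    # exp(-rho * ||x - x'||_1)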
The function performs several input checks and automatic adjustments:
If x is a vector, it is converted to a one-column matrix.
Otherwise, x must be a matrix; if not, an error is thrown.
y must be a vector, and its length must match nrow(x).
kernel must be either "gaussian" or "laplace".
opt must be one of "exact", "pivoted",
"nystrom", or "rff".
If m is NULL, it defaults to
$$\lceil n \cdot \log(d + 5) / 10 \rceil$$
where \(n = nrow(X)\) and \(d = ncol(X)\).
Otherwise, m must be a positive integer.
rho must be a positive real number (default is 1).
lambda can be specified in three ways:
A positive numeric scalar, in which case the model is fitted with this single value.
A numeric vector (length >= 3) of positive values used as a tuning grid;
selection is performed by CVST cross-validation (sequential testing if
fastcv = TRUE).
NULL: use a default grid (internal setting) and tune lambda
via CVST cross-validation (sequential testing if fastcv = TRUE).
n_threads: Number of threads for parallel computation.
Default is 4. If the system has <= 3 available processors,
it uses 1.
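To make these conventions concrete, a hedged sketch follows. The calls are illustrative, assume the package providing fastkrr is attached, and leave all other arguments at their defaults.
# Default approximation degree, computed by hand for n = 50, d = 1
n <- 50; d <- 1
ceiling(n * log(d + 5) / 10)                                                      # default m
# The three ways lambda can be specified
X <- matrix(runif(n * d), n, d)
y <- sin(2 * pi * rowMeans(X)^3) + rnorm(n, 0, 0.1)
fit1 <- fastkrr(X, y, opt = "nystrom", lambda = 1e-4)                             # single positive value: used as-is
fit2 <- fastkrr(X, y, opt = "nystrom", lambda = 10^seq(-6, -2, length.out = 10))  # grid (length >= 3): tuned via CVST
fit3 <- fastkrr(X, y, opt = "nystrom", lambda = NULL, fastcv = TRUE)              # default grid, sequential testing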
# Data setting
set.seed(1)
lambda = 1e-4
d = 1
rho = 1
n = 50
X = matrix(runif(n*d, 0, 1), nrow = n, ncol = d)
y = as.vector(sin(2*pi*rowMeans(X)^3) + rnorm(n, 0, 0.1))
# Example: pivoted Cholesky
model = fastkrr(X, y, kernel = "gaussian", opt = "pivoted", rho = rho, lambda = 1e-4)
# Example: nystrom
model = fastkrr(X, y, kernel = "gaussian", opt = "nystrom", rho = rho, lambda = 1e-4)
# Example: random fourier features
model = fastkrr(X, y, kernel = "gaussian", opt = "rff", rho = rho, lambda = 1e-4)
# Example: Laplace kernel
model = fastkrr(X, y, kernel = "laplace", opt = "nystrom", n_threads = 1, rho = rho)