Nystrom_kernel: Nystrom kernel approximation

Description

An implementation of the Nystrom kernel approximation method.

Usage

Nystrom_kernel(
  Data,
  c,
  l,
  s,
  gamma = NULL,
  max_neighbors = 32,
  DIR_output = tempfile(),
  DIR_save = tempfile(),
  ncores = 2,
  ncores_svd = 1,
  distance_type = "W1",
  kernel_type = "Gaussian",
  verbose = FALSE
)

Arguments

Data

A Filebacked Big Matrix of size n x N. Data vectors are stored in the matrix columns.

c

Number of columns selected for the approximation.

l

An intermediate rank l < c.

s

A target rank s < l.

gamma

Kernel parameter. If it is NULL (default), the parameter is estimated using gamma_estimation.

max_neighbors

Number of neighbors selected for the parameter estimation.

DIR_output

A directory for intermediate computations.

DIR_save

A directory to save the result.

ncores

Number of cores. Default is 2.

ncores_svd

Number of cores used for the SVD computation. Using 1 core (default) is recommended.

distance_type

Distance function type. The available types are Wasserstein-1 ('W1') and Euclidean ('Euclide'). The default value is 'W1'.

kernel_type

Kernel function type. The available types are 'Gaussian' and 'Laplacian'. The default value is 'Gaussian'.

verbose

Logical indicating whether to display the processing steps. Default is FALSE.

Value

A list with the following attributes:

K_W1 is the Filebacked Big Matrix of the Nystrom kernel approximation.
gamma is the estimated kernel parameter.
RandomSample contains the indices of the data vectors selected for the Nystrom approximation.

Details

The Nystrom method approximates the kernel matrix \(K\) by \(C W^{-1} C^{\top}\), where \(C \in \mathbb{R}^{N \times c}\) is obtained from \(K\) by randomly selecting c columns and \(W \in \mathbb{R}^{c \times c}\) is obtained from \(C\) by selecting the c corresponding rows. The kernel function, based on the distance metric, is \(k(x_i,x_j) = e^{-\gamma \cdot d^p(x_i,x_j)}\), where \(p = 1\) for the 'Laplacian' kernel and \(p = 2\) for the 'Gaussian' kernel, and \(d(x_i,x_j)\) is the distance between data vectors \(x_i\) and \(x_j\).
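The formulas above can be illustrated with a minimal base-R sketch. It is not the package's implementation: it assumes the Euclidean distance, a hand-picked gamma of 0.05, and a small in-memory matrix, and it replaces \(W^{-1}\) with a pseudo-inverse for numerical safety. The variable names (n_cols, K_hat, rel_err) are hypothetical, and the intermediate rank l and target rank s are not used here.

```r
# Minimal sketch of the Nystrom approximation in base R (illustration only).
set.seed(1)
n <- 20; N <- 100
X <- matrix(rnorm(n * N), nrow = n, ncol = N)  # data vectors in the columns

gamma  <- 0.05  # assumed value, chosen by hand for this sketch
p      <- 2     # 2 for the 'Gaussian' kernel, 1 for the 'Laplacian' kernel
n_cols <- 10    # number of sampled columns (plays the role of argument c)

# Pairwise Euclidean distances between the columns of X
D <- as.matrix(dist(t(X)))

# Full kernel matrix, computed here only to measure the approximation error
K <- exp(-gamma * D^p)

# Randomly select column indices, then build C (N x c) and W (c x c)
idx <- sort(sample(N, n_cols))
C <- K[, idx, drop = FALSE]
W <- K[idx, idx, drop = FALSE]

# Pseudo-inverse of W through its eigendecomposition
eig  <- eigen(W, symmetric = TRUE)
keep <- eig$values > 1e-10 * max(eig$values)
V    <- eig$vectors[, keep, drop = FALSE]
W_pinv <- V %*% (t(V) / eig$values[keep])

# Nystrom approximation of K
K_hat <- C %*% W_pinv %*% t(C)

# Relative Frobenius error of the approximation
rel_err <- norm(K - K_hat, "F") / norm(K, "F")
print(rel_err)
```

In this sketch the approximation reproduces the sampled block exactly, so K_hat[idx, idx] matches W up to rounding. The error on the remaining entries depends on gamma and on the number of sampled columns.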

Examples

# NOT RUN {
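# Simulate 100 data vectors of dimension 20, stored in the matrix columns
X = matrix(rnorm(2000), ncol=100, nrow = 20)
# Convert the matrix to a Filebacked Big Matrix
X_FBM = bigstatsr::FBM(init = X, ncol=100, nrow = 20)

# Select c = 10 columns, with intermediate rank l = 7 and target rank s = 5
output = Nystrom_kernel(Data = X_FBM, c = 10, l = 7, s = 5,
                        max_neighbors = 3, ncores = 2)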
# }
