chickn (version 1.2.3)

Nystrom_kernel: Nystrom kernel approximation

Description

An implementation of the Nystrom kernel approximation method.

Usage

Nystrom_kernel(
  Data,
  c,
  l,
  s,
  gamma = NULL,
  max_neighbors = 32,
  DIR_output = tempfile(),
  DIR_save = tempfile(),
  ncores = 2,
  ncores_svd = 1,
  distance_type = "W1",
  kernel_type = "Gaussian",
  verbose = FALSE
)

Arguments

Data

A Filebacked Big Matrix n x N. Data vectors are stored in the matrix columns.

c

Number of columns selected for the approximation.

l

An intermediate rank l < c.

s

A target rank s < l.

gamma

Kernel parameter. If it is NULL (default), the parameter is estimated using gamma_estimation.

max_neighbors

Number of neighbors selected for the parameter estimation.

DIR_output

A directory for intermediate computations.

DIR_save

A directory to save the result.

ncores

Number of cores. Default is 2.

ncores_svd

Number of cores used for the SVD computation. It is recommended to use 1 core (default).

distance_type

Distance function type. The available types are Wasserstein-1 ('W1') and Euclidean ('Euclide'). The default value is 'W1'.

kernel_type

Kernel function type, either 'Gaussian' or 'Laplacian'. The default value is 'Gaussian'.

verbose

Logical that indicates whether to display the processing steps.

Value

A list with the following attributes:

  • K_W1 is the Filebacked Big Matrix of the Nystrom kernel approximation.

  • gamma is the estimated kernel parameter.

  • RandomSample contains the indices of the data vectors selected for the Nystrom approximation.

Details

The Nystrom method approximates the kernel matrix \(K\) by \( C W^{-1} C^{\top}\), where \(C \in \mathbb{R}^{N \times c}\) is obtained from \(K\) by randomly selecting c columns and \(W \in \mathbb{R}^{c \times c}\) is obtained from \(C\) by selecting the corresponding c rows. The kernel function is based on the distance metric: \(k(x_i,x_j) = e^{- \gamma \cdot d^p(x_i,x_j)}\), where \(d(x_i,x_j)\) is the distance between data vectors \(x_i\) and \(x_j\), and \(p\) equals 1 for the 'Laplacian' kernel and 2 for the 'Gaussian' kernel.
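
The construction above can be illustrated with a small base-R sketch. This is not the chickn implementation. It works on an ordinary dense matrix rather than a Filebacked Big Matrix, uses the Euclidean distance from stats::dist, fixes gamma by hand instead of estimating it, and ignores the intermediate and target ranks l and s. It also replaces \(W^{-1}\) with an eigenvalue-based pseudo-inverse for numerical safety, which is an assumption of this sketch. All variable names (X, idx, W_pinv, K_approx, rel_err) are illustrative only.

```r
# Illustrative Nystrom approximation in base R (assumed setup, not chickn code).
set.seed(1)
n <- 5                                  # dimension of each data vector
N <- 60                                 # number of data vectors (matrix columns)
X <- matrix(rnorm(n * N), nrow = n, ncol = N)

gamma <- 0.1                            # kernel parameter, chosen by hand here
p <- 2                                  # 2 = 'Gaussian', 1 = 'Laplacian'

# Exact kernel matrix K from pairwise Euclidean distances between columns
D <- as.matrix(dist(t(X)))
K <- exp(-gamma * D^p)

# Randomly select c columns of K, then the corresponding c rows of C
c_cols <- 20
idx <- sample(N, c_cols)
C <- K[, idx, drop = FALSE]             # N x c
W <- C[idx, , drop = FALSE]             # c x c

# Pseudo-inverse of the symmetric matrix W via its eigen-decomposition
eig <- eigen(W, symmetric = TRUE)
keep <- eig$values > 1e-10 * max(eig$values)
V <- eig$vectors[, keep, drop = FALSE]
W_pinv <- V %*% (t(V) / eig$values[keep])

# Nystrom approximation and its relative Frobenius error
K_approx <- C %*% W_pinv %*% t(C)
rel_err <- norm(K - K_approx, "F") / norm(K, "F")
```

In this sketch the block of K_approx indexed by idx reproduces W up to rounding, and rel_err gives a rough idea of how the approximation quality changes as c_cols grows.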

See Also

W1_parallel, gamma_estimation, big_randomSVD, cumsum_parallel.

Examples

# NOT RUN {
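library(chickn)

# 20 x 100 matrix: 100 data vectors of dimension 20, stored in the columns
X <- matrix(rnorm(2000), ncol = 100, nrow = 20)
X_FBM <- bigstatsr::FBM(init = X, ncol = 100, nrow = 20)

# Select c = 10 columns, with intermediate rank l = 7 and target rank s = 5
output <- Nystrom_kernel(Data = X_FBM, c = 10, l = 7, s = 5,
                         max_neighbors = 3, ncores = 2)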
# }
