COMP_initialization: COMP dictionary initialization

Description

Dictionary initialization using the Compressive Orthogonal Matching Pursuit (COMP) method

Usage

COMP_initialization(
  K,
  Data,
  SK_Data = NULL,
  Frequencies = NULL,
  lower = -Inf,
  upper = Inf,
  maxIter = 1500,
  HardThreshold = FALSE,
  print_level = 0,
  ncores = 1,
  m = nrow(Frequencies),
  ...
)

Arguments

K

is the dictionary size.

Data

is an \(s \times N\) Filebacked Big Matrix with the data vectors stored in its columns.

SK_Data

is the data sketch. It is a \(2m\)-dimensional vector that encodes \(m\) complex values: the first \(m\) coordinates hold the real parts and the last \(m\) coordinates hold the imaginary parts. If NULL, the sketch is computed using the Sketch function of the chickn package.

Frequencies

is an \(m \times s\) frequency matrix with the frequency vectors in its rows. If NULL, the frequencies are generated using the GenerateFrequencies function of the chickn package.

lower

is a lower boundary. It is an \(s\)-dimensional vector.

upper

is an upper boundary. It is an \(s\)-dimensional vector.

maxIter

is the maximum number of iterations used to compute each new dictionary element. The default value is 1500.

HardThreshold

indicates whether to execute the hard thresholding step. The default is FALSE.

print_level

controls how much output is shown during the optimization process. Possible values:

0: no output (default value)
1: show the iteration number and the value of the objective function
2: as for 1, plus the values of the weights

ncores

is the number of cores. The default value is 1.

m

is the number of frequency vectors. The default value is nrow(Frequencies).

...

are additional parameters passed to GenerateFrequencies function.
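
The layout described for SK_Data and Frequencies can be illustrated in base R. This is only a sketch under stated assumptions: the sign convention \(\exp(-i\, w^T x)\) and the averaging over data columns are assumed here, and chickn::Sketch may use a different sign or scaling. All variable names below are hypothetical.

```r
# Illustrative only: builds a 2m-dimensional vector with the layout described
# for SK_Data (real parts first, imaginary parts last) in base R.
set.seed(1)
s <- 3; N <- 50; m <- 4
X <- matrix(rnorm(s * N), nrow = s, ncol = N)  # data vectors in columns (s x N)
W <- matrix(rnorm(m * s), nrow = m, ncol = s)  # frequency vectors in rows (m x s)

# ASSUMPTION: z_k = mean over j of exp(-1i * <w_k, x_j>).
# The real part is mean(cos(.)), the imaginary part is -mean(sin(.)).
P  <- W %*% X                                  # m x N matrix of projections
sk <- c(rowMeans(cos(P)), -rowMeans(sin(P)))   # length 2m, real-valued

# Recover the m complex values from the stacked vector
z <- complex(real = sk[1:m], imaginary = sk[(m + 1):(2 * m)])
```

Storing the real and imaginary parts in one real vector lets the fit be handled with ordinary real-valued linear algebra.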

Value

A list with the following components:

D is the obtained dictionary,
weights is the resulting weights,
ObjF is the objective function values computed at each iteration,
Sketch is the data sketch,
Frequencies is the frequency matrix.

Details

The initialization routine is based on the Compressive Orthogonal Matching Pursuit (COMP) algorithm. COMP is an iterative greedy method that builds a dictionary from a compressed version of the data (a.k.a. the data sketch). It alternates between two steps: expanding the dictionary \(D\) with a new element \(d_i\) whose sketch \(SK(d_i)\) is most correlated with the residual, and computing the weights \(w_1, \dots, w_K\) of the dictionary elements by minimizing the difference between the data sketch \(SK(Data)\) and a linear combination of the dictionary sketches, i.e. \(\|SK(Data) - \sum_{i=1}^K w_i \cdot SK(d_i)\|\). Unlike COMP, the implemented dictionary initialization routine does not perform an additional global optimization with respect to both variables, the weights and the dictionary elements.
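
The weight-fitting step can be sketched in base R as a least-squares problem over stacked real and imaginary parts. This is a minimal illustration, not the package's implementation: the helper sketch_point, its sign convention, and the use of unconstrained least squares via qr.solve are all assumptions, and the routine itself may constrain or scale the weights differently.

```r
# Hypothetical helper: sketch of a single point d as a 2m-vector
# (real parts first, imaginary parts last). The sign convention is assumed.
sketch_point <- function(d, W) {
  p <- as.vector(W %*% d)
  c(cos(p), -sin(p))
}

set.seed(2)
s <- 3; m <- 20; K <- 4
W <- matrix(rnorm(m * s), nrow = m)        # frequency vectors in rows
D <- matrix(rnorm(s * K), nrow = s)        # dictionary elements in columns
w_true <- c(0.4, 0.3, 0.2, 0.1)

# Columns of A are the sketches SK(d_1), ..., SK(d_K)
A <- apply(D, 2, sketch_point, W = W)      # 2m x K matrix

# Synthetic, noiseless "data sketch" built from known weights
sk_data <- as.vector(A %*% w_true)

# ASSUMPTION: plain least squares for  min_w || sk_data - A w ||
w_hat    <- as.vector(qr.solve(A, sk_data))
residual <- sk_data - as.vector(A %*% w_hat)
```

In the greedy loop described above, the residual would be what the next dictionary element is matched against. Here the sketch is noiseless by construction, so the fit should recover w_true up to numerical error.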

Examples


# NOT RUN {
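# Simulate a 10 x 100 non-negative data matrix (data vectors in columns)
X = matrix(abs(rnorm(n = 1000)), ncol = 100, nrow = 10)
# Per-coordinate lower and upper bounds taken from the data
lb = apply(X, 1, min)
ub = apply(X, 1, max)
# Store the data as a Filebacked Big Matrix
X_fbm = bigstatsr::FBM(init = X, ncol = ncol(X), nrow = nrow(X))
# Generate m = 64 frequency vectors and compute the data sketch
m = 64
W = chickn::GenerateFrequencies(Data = X_fbm, m = m, N0 = ncol(X_fbm),
                                ncores = 1, niter = 3, nblocks = 2, sigma_start = 0.001)$W
SK = chickn::Sketch(X_fbm, W)
# Initialize a dictionary with K = 10 elements from the sketch
D0 = COMP_initialization(K = 10, Data = X_fbm, SK_Data = SK, Frequencies = W,
                        lower = lb, upper = ub)$Dictionary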
# }
