
kerntools (version 1.2.0)

cLinear: Compositional kernels

Description

`cLinear()` is the compositional-linear kernel, which is suited to compositional data (relative frequencies or proportions). `Aitchison()` is the analogue of the RBF kernel for this type of data. The expected input for both kernels is therefore a matrix or data.frame of non-negative or, preferably, strictly positive numbers. This input has dimension NxD, with N>1 samples (rows) and D>1 compositional features (columns).
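A minimal base-R sketch of what a compositional-linear kernel computes, assuming (this page does not confirm it) that `cLinear()` applies the centred log-ratio (clr) transform to each row and then takes the ordinary linear kernel, and that `cos.norm = TRUE` applies the usual cosine normalization K[i, j] / sqrt(K[i, i] * K[j, j]). The helpers `clr_sketch()` and `clinear_sketch()` are hypothetical and are not part of kerntools.

```r
# Hypothetical helper, not part of kerntools: row-wise centred log-ratio (clr).
clr_sketch <- function(X) {
  X <- as.matrix(X)
  if (any(X <= 0)) stop("the clr transform needs strictly positive entries")
  L <- log(X)
  # Subtracting each row's mean log divides the row by its geometric mean
  L - rowMeans(L)
}

# Hypothetical sketch of a compositional-linear kernel: linear kernel on clr data.
clinear_sketch <- function(X, cos.norm = FALSE) {
  Z <- clr_sketch(X)
  K <- tcrossprod(Z)          # Z %*% t(Z): N x N matrix of clr inner products
  if (cos.norm) {
    d <- sqrt(diag(K))
    K <- K / outer(d, d)      # K[i, j] / sqrt(K[i, i] * K[j, j])
  }
  K
}

# Three samples (rows) with three compositional features each
X <- matrix(c(0.2, 0.3, 0.5,
              0.1, 0.6, 0.3,
              0.4, 0.4, 0.2), nrow = 3, byrow = TRUE)
K    <- clinear_sketch(X)
Kcos <- clinear_sketch(X, cos.norm = TRUE)
print(round(K, 3))
print(round(Kcos, 3))
```

Because every clr row sums to zero, a perfectly uniform composition maps to the zero vector; in this sketch its cosine-normalized entries would be undefined (0/0).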

Usage

cLinear(X, cos.norm = FALSE, feat_space = FALSE, zeros = "none")

Aitchison(X, g = NULL, zeros = "none")

Value

Kernel matrix (dimension: NxN).

Arguments

X

Matrix or data.frame that contains the compositional data.

cos.norm

Should the resulting kernel matrix be cosine normalized? (Default: FALSE).

feat_space

If FALSE, only the kernel matrix is returned. Otherwise, the feature space is also returned. (Default: FALSE).

zeros

"none" asserts that X contains no zeroes; "pseudo" replaces zeroes with a pseudocount. (Default: "none").

g

Gamma hyperparameter. If g=0 or NULL, the matrix of squared Aitchison distances is returned instead of the Aitchison kernel matrix. (Default: NULL).
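The role of `g` can be sketched in base R, assuming (not confirmed by this page) that the squared Aitchison distance between two samples is the squared Euclidean distance between their clr vectors, and that the kernel is the RBF-style exp(-g * d^2). The sketch mirrors the documented behaviour for `g = NULL` or `g = 0`. `aitchison_sketch()` and `clr_sketch()` are hypothetical helpers, not the kerntools functions.

```r
# Hypothetical helper: row-wise centred log-ratio (expects positive entries)
clr_sketch <- function(X) {
  L <- log(as.matrix(X))
  L - rowMeans(L)
}

# Hypothetical sketch: squared Aitchison distances, or an RBF-type kernel on them
aitchison_sketch <- function(X, g = NULL) {
  Z <- clr_sketch(X)
  D2 <- as.matrix(dist(Z))^2             # squared Euclidean distances between clr rows
  dimnames(D2) <- NULL
  if (is.null(g) || g == 0) return(D2)   # distances, as documented for g = 0 or NULL
  exp(-g * D2)                           # kernel values in (0, 1], equal to 1 on the diagonal
}

X <- matrix(c(0.2, 0.3, 0.5,
              0.1, 0.6, 0.3,
              0.4, 0.4, 0.2), nrow = 3, byrow = TRUE)
D2   <- aitchison_sketch(X)
Kait <- aitchison_sketch(X, g = 0.5)

# The same squared distance written through a clr linear kernel: K[i,i] + K[j,j] - 2*K[i,j]
Kl <- tcrossprod(clr_sketch(X))
```

Small values of `g` make the kernel change slowly with distance: as `g` approaches 0, every entry approaches 1.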

Details

In compositional data, the samples (rows) sum to an arbitrary or irrelevant total. This is clearest with relative frequencies, where every sample adds up to 1 (or 100, or another uninformative value). Zeroes are a typical challenge for compositional approaches. They are ambiguous because they can have several causes: a zero may signal a true absence, or a value so small that it falls below the detection threshold of an instrument. A simple way to deal with zeroes is to replace them with a pseudocount. More sophisticated approaches are reviewed elsewhere; see, for instance, the R package `zCompositions`.
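Both points of the paragraph above can be illustrated in base R: the sketch replaces zeroes with a pseudocount and then checks that a clr-based kernel does not depend on the row totals. The pseudocount rule used here (half of the smallest non-zero entry) is an illustrative assumption, not necessarily the value used by `zeros = "pseudo"`; `replace_zeros_sketch()` and `clr_sketch()` are hypothetical helpers.

```r
# Hypothetical: replace zeroes by a pseudocount (default: half the smallest non-zero value)
replace_zeros_sketch <- function(X, pc = NULL) {
  X <- as.matrix(X)
  if (any(X < 0)) stop("compositional data must be non-negative")
  if (is.null(pc)) pc <- min(X[X > 0]) / 2
  X[X == 0] <- pc
  X
}

# Hypothetical helper: row-wise centred log-ratio
clr_sketch <- function(X) {
  L <- log(as.matrix(X))
  L - rowMeans(L)
}

# Raw counts with zeroes; the row totals (100 and 50) carry no information
counts <- matrix(c(10, 0, 30, 60,
                    0, 5,  5, 40), nrow = 2, byrow = TRUE)
filled <- replace_zeros_sketch(counts)   # zeroes become 2.5
props  <- filled / rowSums(filled)       # closed to proportions summing to 1
pct    <- 100 * props                    # same data expressed as percentages

K_filled <- tcrossprod(clr_sketch(filled))
K_props  <- tcrossprod(clr_sketch(props))
K_pct    <- tcrossprod(clr_sketch(pct))
isTRUE(all.equal(K_filled, K_props)) && isTRUE(all.equal(K_props, K_pct))  # TRUE
```

Rescaling a row leaves its clr vector unchanged, so closing the data before or after the zero replacement does not matter here. The choice of pseudocount, however, does change the log-ratios, which is one reason more careful zero-handling methods exist.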

References

Ramon, E., Belanche-Muñoz, L., et al. (2021). kernInt: A kernel framework for integrating supervised and unsupervised analyses in spatio-temporal metagenomic datasets. Frontiers in Microbiology, 12, 609048. doi: 10.3389/fmicb.2021.609048

Examples

library(kerntools)
data <- soil$abund

## These data are sparse and contain many zeroes, which we replace with pseudocounts:
Kclin <- cLinear(data, zeros = "pseudo")
Kclin[1:5, 1:5]

## Also returning the feature space:
Kclin_fs <- cLinear(data, zeros = "pseudo", feat_space = TRUE)

## With cosine normalization:
Kcos <- cLinear(data, zeros = "pseudo", cos.norm = TRUE)
Kcos[1:5, 1:5]

## Aitchison kernel:
Kait <- Aitchison(data, g = 0.0001, zeros = "pseudo")
Kait[1:5, 1:5]
