Spectrum (version 0.2)

Spectrum: Versatile ultra-fast spectral clustering for single and multi-view data

Description

Spectrum is a fast adaptive spectral clustering method for single or multi-view data. Spectrum uses a new type of adaptive density-aware kernel that strengthens local connections in the graph. To integrate multi-view data and reduce noise, a tensor product graph data integration and diffusion procedure is used. Spectrum contains two approaches for finding the number of clusters (K): the classical eigengap method and a novel multimodality gap method. The multimodality gap method analyses the distribution of the eigenvectors of the graph Laplacian to decide K and can also be used to tune the kernel.
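
As a rough illustration of the classical eigengap idea mentioned above, the base R sketch below picks K as the position of the largest gap between consecutive eigenvalues of a normalised affinity matrix. It is not Spectrum's internal code: the toy data, the plain Gaussian kernel with a fixed bandwidth of 1, and the names x, A, L, gaps and k_hat are all assumptions made for this example.

```r
# Toy data: two well-separated 2-D groups, samples as columns.
set.seed(1)
x <- cbind(matrix(rnorm(40, mean = 0, sd = 0.3), nrow = 2),
           matrix(rnorm(40, mean = 5, sd = 0.3), nrow = 2))

# Plain Gaussian affinity with one global bandwidth (an assumption for this
# sketch; Spectrum's own kernels are adaptive).
d <- as.matrix(dist(t(x)))
A <- exp(-d^2 / 2)
diag(A) <- 0

# Symmetric normalisation D^(-1/2) A D^(-1/2); its largest eigenvalues
# correspond to the smallest eigenvalues of the normalised graph Laplacian.
dinv <- 1 / sqrt(rowSums(A))
L <- A * outer(dinv, dinv)

# Eigengap heuristic: K is where consecutive eigenvalues drop the most.
maxk <- 10
ev <- eigen(L, symmetric = TRUE)$values[1:(maxk + 1)]
gaps <- ev[1:maxk] - ev[2:(maxk + 1)]   # gaps[k] = lambda_k - lambda_(k+1)
k_hat <- which.max(gaps)
```

On data like this, with two nearly disconnected groups, the two leading eigenvalues sit close to 1 and the largest gap follows the second one.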

Usage

Spectrum(data, method = 1, silent = FALSE, showres = TRUE,
  diffusion = TRUE, kerneltype = c("density", "stsc"), maxk = 10,
  NN = 3, NN2 = 7, showpca = FALSE, showheatmap = FALSE,
  showdimred = FALSE, visualisation = c("umap", "tsne"), frac = 2,
  thresh = 7, fontsize = 18)

Arguments

data

Data frame or list of data frames: contains the data with samples as columns and features as rows. For multi-view data, supply a list of data frames with the samples in the same order in every view.
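
A minimal sketch of preparing multi-view input in this layout, using base R only. The two views, their row and column names, and the variable names (view1, view2, samples, views) are hypothetical and exist only to show how the sample order can be aligned before the list is built.

```r
# Two hypothetical views of the same four samples; features are rows and
# samples are columns. The second view lists the samples in a different order.
set.seed(1)
view1 <- as.data.frame(matrix(rnorm(5 * 4), nrow = 5,
  dimnames = list(paste0("gene", 1:5), c("s1", "s2", "s3", "s4"))))
view2 <- as.data.frame(matrix(rnorm(3 * 4), nrow = 3,
  dimnames = list(paste0("probe", 1:3), c("s3", "s1", "s4", "s2"))))

# Check that both views cover the same samples, then reorder the second view
# to match the column order of the first.
samples <- colnames(view1)
stopifnot(setequal(samples, colnames(view2)))
views <- list(view1, view2[, samples])
```

A list built this way has the shape described for the data argument: one data frame per view, with identical column order.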

method

Numerical value: 1 = default eigengap method (Gaussian clusters), 2 = multimodality gap method (Gaussian/non-linear clusters)

silent

Logical flag: whether to turn off messages

showres

Logical flag: whether to show the results on the screen

diffusion

Logical flag: whether to perform graph diffusion to reduce noise and boost performance, usually recommended

kerneltype

Character string: 'density' (default) = adaptive density aware kernel, 'stsc' = Zelnik-Manor self-tuning kernel

maxk

Numerical value: the maximum number of expected clusters (default = 10). This is data dependent - do not set excessively high.

NN

Numerical value: kernel param, the number of nearest neighbours to use for the sigma parameters (default = 3)
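
To make the role of a nearest-neighbour sigma more concrete, here is a hedged base R sketch of the locally scaled Gaussian affinity described by Zelnik-Manor and Perona, where each sample's sigma is its distance to its nn-th nearest neighbour. Whether Spectrum's 'stsc' kernel is computed exactly this way is an assumption; the function stsc_affinity and its arguments are hypothetical names for illustration.

```r
# Self-tuning (locally scaled) Gaussian affinity.
# x:  numeric matrix with samples as columns and features as rows.
# nn: which nearest neighbour defines each sample's local scale sigma_i.
stsc_affinity <- function(x, nn = 3) {
  d <- as.matrix(dist(t(x)))   # pairwise Euclidean distances between samples
  # Each sorted row starts with the zero self-distance, so the nn-th
  # nearest neighbour sits at position nn + 1.
  sigma <- apply(d, 1, function(r) sort(r)[nn + 1])
  A <- exp(-d^2 / outer(sigma, sigma))   # A_ij = exp(-d_ij^2 / (sigma_i * sigma_j))
  diag(A) <- 0                           # no self-loops
  A
}

set.seed(42)
x <- matrix(rnorm(2 * 30), nrow = 2)     # 30 toy samples with 2 features
A <- stsc_affinity(x, nn = 3)
```

Because sigma adapts per sample, points in sparse regions get a wider kernel than points in dense regions, which is the motivation for a local rather than a global bandwidth.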

NN2

Numerical value: kernel param, the number of nearest neighbours to use for the common nearest neighbours (default = 7)

showpca

Logical flag: whether to show PCA when running on one view

showheatmap

Logical flag: whether to show heatmap of affinity matrix when running on one view

showdimred

Logical flag: whether to show UMAP or t-SNE of final affinity matrix

visualisation

Character string: what kind of dimensionality reduction to run on the affinity matrix (umap or tsne)

frac

Numerical value: optk search param, fraction to find the last substantial drop (multimodality gap method param)

thresh

Numerical value: optk search param, how many points ahead to keep searching (multimodality gap method param)

fontsize

Numerical value: controls font size of the ggplot2 plots

Value

A list, containing:
1) cluster assignments, in the same order as input data columns
2) eigenvector analysis results (either eigenvalues or dip test statistics)
3) optimal K
4) final affinity matrix
5) eigenvectors and eigenvalues of graph Laplacian

Examples

# NOT RUN {
res <- Spectrum(brain[[1]][,1:50])
# }
