KODAMA (version 3.3)

KODAMA.matrix: Knowledge Discovery by Accuracy Maximization

Description

Runs KODAMA on a numeric data matrix and returns the optimized labels from each run, together with the nearest-neighbor structure used by KODAMA.visualization.

Usage

KODAMA.matrix(
  data,
  spatial = NULL,
  samples = NULL,
  M = 100,
  Tcycle = 20,
  ncomp = min(c(50, ncol(data))),
  W = NULL,
  metrics = "euclidean",
  constrain = NULL,
  fix = NULL,
  landmarks = 10000,
  splitting = ifelse(nrow(data) < 40000, 100, 300),
  spatial.resolution = 0.3,
  n.cores = 1,
  ancestry = FALSE,
  seed = 1234,
  ...
)
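
Two of the defaults in the signature above depend on the shape of data. As a minimal sketch, they can be evaluated by hand for the iris measurements (the variable names default_ncomp and default_splitting are illustrative only and are not part of the package):

```r
# Four numeric columns, 150 rows
data_mat <- as.matrix(iris[, -5])

# ncomp defaults to at most 50, capped by the number of variables
default_ncomp <- min(c(50, ncol(data_mat)))

# splitting defaults to 100 below 40000 rows, otherwise 300
default_splitting <- ifelse(nrow(data_mat) < 40000, 100, 300)
```

For this matrix the expressions give 4 components and 100 initial clusters.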

Value

A list with:

acc

Numeric vector of length M with final run accuracies.

v

Numeric matrix (M x Tcycle) of accuracy trajectories.

res

Numeric matrix (M x nrow(data)) with optimized labels from each run.

knn_Rnanoflann

List containing indices, distances, and neighbors.

data

Input data matrix.

res_constrain

Numeric matrix (M x nrow(data)) with effective constraints used in each run.

n.cores

Number of cores used by KODAMA.matrix. KODAMA.visualization reuses this value when the visualization config sets define.n.cores = FALSE.
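
The shapes listed above can be checked programmatically. The sketch below uses a hypothetical helper, check_kodama_result, which is not part of KODAMA. It is exercised on a mock list rather than real output, so it only illustrates the documented dimensions:

```r
# Hypothetical helper (not part of KODAMA): compares a result list with the
# shapes listed in the Value section and returns one logical per element.
check_kodama_result <- function(kk, M, Tcycle, n) {
  M <- as.integer(M)
  Tcycle <- as.integer(Tcycle)
  n <- as.integer(n)
  c(
    acc = length(kk$acc) == M,
    v = identical(dim(kk$v), c(M, Tcycle)),
    res = identical(dim(kk$res), c(M, n)),
    res_constrain = identical(dim(kk$res_constrain), c(M, n))
  )
}

# Mock list standing in for real KODAMA.matrix output (M = 10, Tcycle = 20,
# 150 samples); the zeros are placeholders, not results.
mock <- list(
  acc = numeric(10),
  v = matrix(0, nrow = 10, ncol = 20),
  res = matrix(0, nrow = 10, ncol = 150),
  res_constrain = matrix(0, nrow = 10, ncol = 150)
)
shape_ok <- check_kodama_result(mock, M = 10, Tcycle = 20, n = 150)
```

With a real result, the same call would take the M and Tcycle values passed to KODAMA.matrix and n = nrow(data).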

Arguments

data

Numeric matrix where rows are samples and columns are variables.

spatial

Optional numeric matrix of spatial coordinates with nrow(spatial) == nrow(data).

samples

Optional sample identifier vector used to separate multiple spatial samples on a shared coordinate axis.

M

Number of independent KODAMA optimization runs.

Tcycle

Number of optimization cycles for each run.

ncomp

Number of PLS components.

W

Optional starting labels for semi-supervised initialization.

metrics

Distance metric passed to Rnanoflann::nn.

constrain

Optional grouping constraint vector; entries with the same value are forced to share labels within each run.

fix

Optional logical vector indicating which entries in W are fixed during optimization.

landmarks

Number of landmark clusters used in each run.

splitting

Number of clusters used for initialization when W is NULL.

spatial.resolution

Fraction of landmarks used to define spatial constraint clusters.

n.cores

Number of worker processes. On Unix-like systems forked workers are used; on Windows PSOCK workers are used.

ancestry

Logical; if TRUE, ancestry-aware spatial processing is used.

seed

Random seed.

...

Ignored legacy arguments. Passing FUN is deprecated and has no effect.
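
Some of the constraints above can be verified before a long run. The following is a minimal sketch using a hypothetical helper, check_kodama_inputs, which is not part of KODAMA. It assumes that W and fix carry one entry per row of data, which the argument descriptions imply but do not state:

```r
# Hypothetical helper (not part of KODAMA): checks inputs against the
# argument descriptions before calling KODAMA.matrix.
check_kodama_inputs <- function(data, spatial = NULL, W = NULL, fix = NULL) {
  # data must be a numeric matrix (samples in rows)
  stopifnot(is.matrix(data), is.numeric(data))
  if (!is.null(spatial)) {
    # spatial coordinates need one row per sample
    stopifnot(is.matrix(spatial), is.numeric(spatial),
              nrow(spatial) == nrow(data))
  }
  # Assumption: one starting label per sample
  if (!is.null(W)) stopifnot(length(W) == nrow(data))
  # Assumption: one logical flag per sample
  if (!is.null(fix)) stopifnot(is.logical(fix), length(fix) == nrow(data))
  invisible(TRUE)
}

data_mat <- as.matrix(iris[, -5])
# Made-up coordinates, used only to exercise the check
spatial_xy <- cbind(x = seq_len(nrow(data_mat)), y = 0)
inputs_ok <- check_kodama_inputs(data_mat, spatial = spatial_xy)
```

Passing a data frame such as iris[, -5] directly would fail the first check, which is why the sketch converts it with as.matrix.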

Author

Stefano Cacciatore and Leonardo Tenori

Details

The function runs M independent KODAMA optimizations and builds a KODAMA-weighted nearest-neighbor structure. Progress bars are shown for both the optimization stage and the dissimilarity update stage.

Before each cross-validation step, corecpp automatically selects the PLS backend based on the current number of classes: "plssvd" (fast mode) when ncomp is smaller than the number of classes, and "simpls" otherwise.
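
The selection rule can be written out as a small function. This is a sketch of the rule as stated in the paragraph above, not the code inside corecpp; the name pick_pls_backend is hypothetical:

```r
# Hypothetical illustration of the stated rule; the actual choice is made
# internally and is not exposed as an R function.
pick_pls_backend <- function(ncomp, n_classes) {
  if (ncomp < n_classes) "plssvd" else "simpls"
}

# Few components relative to the number of classes: fast mode
backend_many_classes <- pick_pls_backend(ncomp = 2, n_classes = 100)

# At least as many components as classes: SIMPLS
backend_few_classes <- pick_pls_backend(ncomp = 50, n_classes = 3)
```

Because the number of classes changes during optimization, the rule as described may pick different backends at different steps of the same run.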

When n.cores > 1, Unix-like systems use fork-based parallelism, which typically reduces memory duplication through copy-on-write when worker code treats data as read-only. On Windows, socket workers are used and the input matrix is copied to workers by design.
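
One way to choose a value for n.cores is to cap a requested count at what the machine reports. The sketch below uses only base R and the parallel package. The helper pick_n_cores is hypothetical, and the backend string merely restates the platform behavior described above:

```r
# Hypothetical helper (not part of KODAMA): caps the requested number of
# workers at the number of cores reported by the system.
pick_n_cores <- function(requested = 2L) {
  available <- parallel::detectCores()
  # detectCores() can return NA when the count is unknown
  if (is.na(available)) available <- 1L
  max(1L, min(as.integer(requested), as.integer(available)))
}

n_cores <- pick_n_cores(2L)

# Restates which worker type the documentation describes for this platform
backend <- if (.Platform$OS.type == "windows") "PSOCK" else "fork"
```

The resulting n_cores could then be passed as the n.cores argument. On Windows, each extra worker receives its own copy of the input matrix, so memory use can grow with the worker count.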

See Also

KODAMA.visualization

Examples

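library(KODAMA)

# Use the four numeric iris measurements as the data matrix
data(iris)
data_mat <- as.matrix(iris[, -5])

# Run 10 KODAMA optimizations with 2 PLS components on a single core
kk <- KODAMA.matrix(data_mat, ncomp = 2, M = 10, n.cores = 1)

# Compute a t-SNE embedding from the KODAMA result and color points by species
embedding <- KODAMA.visualization(kk, "t-SNE")
plot(embedding, col = as.numeric(iris[, 5]), cex = 2)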