KODAMA.matrix: Knowledge Discovery by Accuracy Maximization

Description

Runs KODAMA on a numeric data matrix and returns the optimized labels from each run, together with the nearest-neighbor structure used by KODAMA.visualization.

Usage

KODAMA.matrix(
  data,
  spatial = NULL,
  samples = NULL,
  M = 100,
  Tcycle = 20,
  ncomp = min(c(50, ncol(data))),
  W = NULL,
  metrics = "euclidean",
  constrain = NULL,
  fix = NULL,
  landmarks = 10000,
  splitting = ifelse(nrow(data) < 40000, 100, 300),
  spatial.resolution = 0.3,
  n.cores = 1,
  ancestry = FALSE,
  seed = 1234,
  ...
)
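
The two data-dependent defaults in the signature above (ncomp and splitting) can be evaluated by hand to see what a given input would receive. A small sketch using the iris measurements; the variable names are illustrative and not part of the package:

```r
data(iris)
data_mat <- as.matrix(iris[, -5])   # 150 rows, 4 numeric columns

# Default number of PLS components: capped at 50 and at the number of columns.
default_ncomp <- min(c(50, ncol(data_mat)))

# Default number of initialization clusters: 100 below 40000 rows, 300 otherwise.
default_splitting <- ifelse(nrow(data_mat) < 40000, 100, 300)

print(default_ncomp)      # 4
print(default_splitting)  # 100
```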

Value

A list with:

acc: Numeric vector of length M with final run accuracies.
v: Numeric matrix (M x Tcycle) of accuracy trajectories.
res: Numeric matrix (M x nrow(data)) with optimized labels from each run.
knn_Rnanoflann: List containing indices, distances, and neighbors.
data: Input data matrix.
res_constrain: Numeric matrix (M x nrow(data)) with effective constraints used in each run.
n.cores: Number of cores used by KODAMA.matrix. This value is reused by KODAMA.visualization when the visualization config sets define.n.cores = FALSE.

Arguments

data: Numeric matrix where rows are samples and columns are variables.
spatial: Optional numeric matrix of spatial coordinates with nrow(spatial) == nrow(data).
samples: Optional sample identifier vector used to separate multiple spatial samples on a shared coordinate axis.
M: Number of independent KODAMA optimization runs.
Tcycle: Number of optimization cycles for each run.
ncomp: Number of PLS components.
W: Optional starting labels for semi-supervised initialization.
metrics: Distance metric passed to Rnanoflann::nn.
constrain: Optional grouping constraint vector; entries with the same value are forced to share labels within each run.
fix: Optional logical vector indicating which entries in W are fixed during optimization.
landmarks: Number of landmark clusters used in each run.
splitting: Number of clusters used for initialization when W is NULL.
spatial.resolution: Fraction of landmarks used to define spatial constraint clusters.
n.cores: Number of worker processes. On Unix-like systems forked workers are used; on Windows PSOCK workers are used.
ancestry: Logical; if TRUE, ancestry-aware spatial processing is used.
seed: Random seed.
...: Ignored legacy arguments. Passing FUN is deprecated and has no effect.
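
As an illustration of how W and fix relate, the sketch below builds both vectors for a hypothetical scenario in which the species is known for every tenth iris flower only. Using a unique placeholder label for each unlabeled row is an assumption made for this example, not a documented requirement; the names known, W, and fix are illustrative.

```r
data(iris)
data_mat <- as.matrix(iris[, -5])
n <- nrow(data_mat)

# Hypothetical scenario: the species is known for every 10th flower only.
known <- seq(1, n, by = 10)

# Starting labels: species codes (1 to 3) for the known rows and a unique
# placeholder label (4 and above) for every other row. This is an assumption
# for illustration.
W <- 3 + seq_len(n)
W[known] <- as.numeric(iris$Species)[known]

# Logical vector marking which entries of W stay fixed during optimization.
fix <- rep(FALSE, n)
fix[known] <- TRUE

# Both vectors must have one entry per row of the data matrix.
stopifnot(length(W) == n, length(fix) == n)
```

These vectors would then be supplied as KODAMA.matrix(data_mat, W = W, fix = fix, ...).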

Author

Stefano Cacciatore and Leonardo Tenori

Details

The function runs M independent KODAMA optimizations and builds a KODAMA-weighted nearest-neighbor structure. Progress bars are shown for both the optimization stage and the dissimilarity update stage.

The PLS backend is selected automatically inside corecpp before each cross-validation step from the current number of classes: "plssvd" (fast mode) when ncomp is smaller than the number of classes, otherwise "simpls".
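
The selection rule described above can be restated as a one-line R function. This is only a paraphrase of the rule as written here, not the corecpp implementation; select_pls_backend is a hypothetical helper name.

```r
# Hypothetical helper restating the documented rule; not part of KODAMA.
select_pls_backend <- function(ncomp, n_classes) {
  if (ncomp < n_classes) "plssvd" else "simpls"
}

select_pls_backend(ncomp = 2, n_classes = 3)   # "plssvd"
select_pls_backend(ncomp = 50, n_classes = 3)  # "simpls"
```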

When n.cores > 1, Unix-like systems use fork-based parallelism, which typically reduces memory duplication through copy-on-write when worker code treats data as read-only. On Windows, socket workers are used and the input matrix is copied to workers by design.
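
One way to pick a value for n.cores is to query the machine and leave a core free for the rest of the system. The sketch below uses parallel::detectCores from base R; choose_n_cores is a hypothetical helper, and reserving one core is a convention assumed here, not a KODAMA recommendation.

```r
library(parallel)

# Hypothetical helper: physical cores minus a reserve, never below 1.
choose_n_cores <- function(reserve = 1L, max_cores = Inf) {
  available <- parallel::detectCores(logical = FALSE)
  if (is.na(available)) available <- 1L   # detectCores may return NA
  as.integer(max(1L, min(available - reserve, max_cores)))
}

n_cores <- choose_n_cores()

# Worker type expected on this platform, following the description above.
backend <- if (.Platform$OS.type == "windows") "PSOCK" else "fork"

print(n_cores)
print(backend)
```

The result can be passed as KODAMA.matrix(..., n.cores = n_cores). On Windows, where the input matrix is copied to every worker, a smaller value may be preferable for large inputs.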

Examples


# \donttest{
data(iris)
data_mat <- as.matrix(iris[, -5])  # KODAMA.matrix expects a numeric matrix
kk <- KODAMA.matrix(data_mat, ncomp = 2, M = 10, n.cores = 1)
embedding <- KODAMA.visualization(kk, "t-SNE")
plot(embedding, col = as.numeric(iris[, 5]), cex = 2)
# }
