make.adjacency: A General Framework for Adjacency Matrix Construction

Description

Builds an $n \times n$ adjacency matrix from data using Euclidean or variable-weighted distances, with optional locally-adaptive scalings and optional variable weighting by shared-nearest-neighbours (SNN), similarity ranks (SIM), or both. The options reproduce a family of adjacency matrices including the variants described by Ghashti, Hare and Thompson (2025).

Usage

make.adjacency(data,
                method = "vw",
                isLocWeighted = FALSE,
                isModWeighted = FALSE,
                isSparse = FALSE,
                ModMethod = NULL,
                scale = FALSE,
                sig = 1,
                radius = NULL,
                cv.method = "cv.ls")

Value

An $n \times n$ numeric adjacency matrix $W$ with ones on the diagonal.

Arguments

data: Numeric matrix or data frame of size $n \times p$. Required.
method: Distance construction "eu" for Euclidean; "vw" for variable-weighted scaling (see Details). Defaults to "v".
isLocWeighted: Logical. If TRUE, use locally-adaptive scalings $\sigma_i$ (Zelnik–Perona) to self-tune the kernel; otherwise use a global scale sig. Defaults to FALSE.
isModWeighted: Logical. If TRUE, apply a weighting matrix $M$ based on SNN and/or SIM (see Details). Defaults to FALSE
isSparse: Logical. If TRUE and isModWeighted = TRUE, then SNN (ModMethod = "snn") and/or SIM (ModMethod = "sim") arguments creates a sparse adjacency matrix (see Details). Defaults to FALSE.
ModMethod: One of "snn", "sim", or "both" when isModWeighted=TRUE.
scale: Logical; standardize columns of data before distance construction.
sig: Positive numeric value for global kernel width, used only when isLocWeighted=FALSE.
radius: Integer $r$. If NULL, $r$ is estimated with find.radius on the constructed distance matrix.
cv.method: Bandwidth selector for method="vw" passed to np::npudensbw; one of "cv.ml" or "cv.ls".

Mapping to named adjacency matrices

Relating to the paper by Ghashti, Hare, and Thompson (2025), let "vw" denote the variable-weighted distance method="vw" and "eu" for the traditional squared Euclidean distance; "la" denotes locally-adaptive scaling when isLocWeighted=TRUE (Zelnik-Manor and Perona, 2004); "id" denotes identity for isModWeighted = FALSE, and "sim", "snn" and "simsnn" denote weightings for $M$ described above.

To reproduce results from the 2025 paper, below are a few examples of adjacency construction:

vw-id: $\exp(-D^2/\sigma^2)$ with method="vw", isLocWeighted=FALSE, isModWeighted=FALSE.
vwla-id: $\exp(-D^2/(\sigma_i\sigma_j))$ with method="vw", isLocWeighted=TRUE, isModWeighted=FALSE.
vw-sim: $0.5\,\exp(-D^2/\sigma^2)\,(1+\text{SIM})$ with method="vw", isLocWeighted=FALSE, isModWeighted=TRUE, ModMethod="sim", isSparse=FALSE.
vw-snns: $0.5\,\exp(-D^2/\sigma^2)\,(1+\text{SNN})$ with method="vw", isLocWeighted=FALSE, isModWeighted=TRUE, ModMethod="snn", isSparse=TRUE.
vw-simsnns: $0.25\,\exp(-D^2/\sigma^2)\,(1+\text{SIM})(1+\text{SNN})$ with method="vw", isLocWeighted=FALSE, isModWeighted=TRUE, ModMethod="both", isSparse=TRUE.

Also note that:

If radius is NULL, $r$ is chosen adaptively via find.radius on the constructed distance matrix.
method="vw" requires npudensbw for variable weighted bandwidths, with default np::npudensbw(data, bwmethod = cv.method, nmulti = 3) (Hayfield and Racine, 2008).

Notes

When r is determined by find.radius, we implement a modified version of the Natural Neighbors algorithm from Zhu et al. (2016).
SNN is a modified version of the Shared Nearest Neighbors algorithm from Jarvis and Patrick (1973).
More information on locally-adaptive scalings are seen in Zelnik-Manor and Perona (2004).

Details

Step 1: Distance.

method="eu": $D_{ij}=\|x_i-x_j\|_2$.
method="vw": compute product-kernel bandwidths h via np::npudensbw, set feature-weights $w_j=1/h_j^2$, rescale data as $\tilde{x}_{ij}=\sqrt{w_j}\,x_{ij}$, then $D_{ij}=\|\tilde{x}_i-\tilde{x}_j\|_2$. (Variable-weighted metric.)

Step 2:Similarity kernel.

Locally-adaptive scaling from Zelnik-Manor: if isLocWeighted=TRUE, compute $\sigma_i$ as the distance to the $r$-th neighbour with compute.sigma and set $$S_{ij}=\exp\!\Big(-D_{ij}^2 / (\sigma_i \sigma_j)\Big),\quad S_{ii}=1.$$
Global scale: if isLocWeighted=FALSE, $$S_{ij}=\exp\!\Big(-D_{ij}^2 / \sigma^2\Big),\quad S_{ii}=1.$$

Step 3: Weighting Matrix (optional).

Let $\text{SNN}_{ij}$ be the shared-$r$-NN overlap fraction (see compute.SNN) for observations $i$ and $j$, and $\rho_i$ be the $(r+1)$-largest entry of observation $i$ in matrix $S$. Define $\text{SIM}_{ij}=\sqrt{\rho_i \rho_j}$. For isModWeighted=TRUE we have the following options

ModMethod="snn": $M_{ij} = \begin{cases} 0.5\,(1+\text{SNN}_{ij}), & \text{if } \texttt{isSparse=FALSE},\\ \text{SNN}_{ij}, & \text{if } \texttt{isSparse=TRUE}. \end{cases}$
ModMethod="sim": $M_{ij} = \begin{cases} 0.5\,(1+\text{SIM}_{ij}), & \text{if } \texttt{isSparse=FALSE},\\ \text{SIM}_{ij}, & \text{if } \texttt{isSparse=TRUE}. \end{cases}$
ModMethod="both": $M_{ij} = \begin{cases} 0.25\,(1+\text{SIM}_{ij})\,(1+\text{SNN}_{ij}), & \text{if } \texttt{isSparse=FALSE},\\ \text{SIM}_{ij}\cdot \text{SNN}_{ij}, & \text{if } \texttt{isSparse=TRUE}. \end{cases}$

The returned adjacency is $W = S \circ M$ when isModWeighted = TRUE, otherwise $W=S$. These choices align with the table of named adjacency matrices (see Mapping below).

References

Ghashti, J. S., Hare, W., and J. R. J. Thompson (2025). Variable-weighted adjacency constructions for fuzzy spectral clustering. Submitted.

Hayfield, T., and J. S. Racine (2008). Nonparametric Econometrics: The np Package. Journal of Statistical Software 27(5).

Jarvis, R. A., and A. E. Patrick (1973). Clustering using a similarity measure based on shared near neighbors. IEEE Transactions on Computers, 22(11), 1025-1034.

Zelnik-Manor, L., and P. Perona (2004). Self-tuning spectral clustering. Advances in Neural Information Processing Systems, 17.

Zhu, Q., Feng, J., and J. Huang (2016). Natural neighbor: A self-adaptive neighborhood method without parameter K. Pattern Recognition Letters, 80, 30-36.

Examples

Run this code

set.seed(1)
X <- scale(matrix(rnorm(200), 100, 2))


W1 <- make.adjacency(X,
                    method = "eu",
                    isLocWeighted = TRUE) # "eula-id" named adjacency

W2 <- make.adjacency(X,
                    method = "vw",
                    isLocWeighted = TRUE) # "vwla-id" named adjacency

# compare W(xi,xj) i,j = 1,...,5 for eu/vw pair W1 and W2
W1[1:5,1:5]
W2[1:5,1:5]

W3 <- make.adjacency(X,
                    method = "eu",
                    isLocWeighted = TRUE,
                    isModWeighted = TRUE,
                    ModMethod = "snn",
                    isSparse = FALSE) # "eula-snn" named adjacency

W4 <- make.adjacency(X,
                    method = "vw",
                    isLocWeighted = TRUE,
                    isModWeighted = TRUE,
                    ModMethod = "snn",
                    isSparse = FALSE) # "vwla-snn" named adjacency

# compare W(xi,xj) i,j = 1,...,5 for eu/vw pair W3 and W4
W3[1:5,1:5]
W4[1:5,1:5]

Run the code above in your browser using DataLab