Detect spatially variable genes using the MERINGUE approach based on Moran's I spatial autocorrelation statistic.
Identifies spatially variable genes (SVGs) by computing Moran's I spatial autocorrelation statistic for each gene. Genes with significant positive spatial autocorrelation (similar expression values clustering together) are reported as SVGs.
CalSVG_MERINGUE(
  expr_matrix,
  spatial_coords,
  network_method = c("delaunay", "knn"),
  k = 10L,
  filter_dist = NA,
  alternative = c("greater", "less", "two.sided"),
  adjust_method = "BH",
  min_pct_cells = 0.05,
  n_threads = 1L,
  use_cpp = TRUE,
  verbose = TRUE
)
A data.frame with SVG detection results, sorted by significance. Columns:
gene: Gene identifier
observed: Observed Moran's I statistic, typically within [-1, 1].
Positive values indicate clustering; negative values indicate dispersion.
expected: Expected Moran's I under null (approximately -1/(n-1))
sd: Standard deviation under null hypothesis
z_score: Standardized test statistic (observed - expected) / sd
p.value: Raw p-value from normal approximation
p.adj: Adjusted p-value (multiple testing corrected)
expr_matrix: Numeric matrix of gene expression values.
Rows: genes
Columns: spatial locations (spots/cells)
Values: normalized expression (e.g., log-transformed counts)
Row names should be gene identifiers; column names should match
row names of spatial_coords.
spatial_coords: Numeric matrix of spatial coordinates.
Rows: spatial locations (must match columns of expr_matrix)
Columns: coordinate dimensions (x, y, and optionally z)
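The input layout described above (genes in rows, locations in columns, column names aligned with the coordinate row names) can be checked before calling the function. A minimal sketch on made-up data; `toy_expr` and `toy_coords` are hypothetical names and are not part of the package:

```r
# Toy inputs: 3 genes x 4 spots, plus 2-D coordinates in a different order
set.seed(1)
toy_expr <- matrix(rexp(12), nrow = 3,
                   dimnames = list(paste0("gene", 1:3), paste0("spot", 1:4)))
toy_coords <- matrix(runif(8), ncol = 2,
                     dimnames = list(paste0("spot", 4:1), c("x", "y")))

# Reorder the coordinates so their rows follow the columns of the expression matrix
toy_coords <- toy_coords[colnames(toy_expr), , drop = FALSE]

# Both checks should pass before the matrices are handed to an SVG method
stopifnot(is.numeric(toy_expr), is.numeric(toy_coords))
stopifnot(identical(colnames(toy_expr), rownames(toy_coords)))
```

Reordering by name, rather than assuming the two objects already share an order, avoids silently pairing expression values with the wrong locations.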
network_method: Character string specifying how to construct the spatial neighborhood network.
"delaunay" (default): Delaunay triangulation. Creates natural
neighbors based on geometric triangulation. Good for relatively uniform
spatial distributions.
"knn": K-nearest neighbors. Each spot connected to its k
nearest neighbors. More robust for irregular distributions.
k: Integer. Number of neighbors for KNN method. Default is 10.
Ignored when network_method = "delaunay".
Smaller k (e.g., 5-6): More local patterns, faster computation
Larger k (e.g., 15-20): Broader patterns, smoother results
filter_dist: Numeric or NA. Maximum Euclidean distance for neighbors. Pairs with distance > filter_dist are not considered neighbors. Default is NA (no filtering). Useful for:
Removing long-range spurious connections
Focusing on local spatial patterns
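To make the interaction between `k` and `filter_dist` concrete, the sketch below builds a binary KNN adjacency matrix in base R and then drops links longer than a cutoff. `knn_adjacency` is a hypothetical helper written for illustration; it is not the package's implementation (which, per the example at the end of this page, relies on RANN for KNN), and the symmetrization rule is an assumption.

```r
# Hypothetical helper: binary KNN adjacency with an optional distance cutoff
knn_adjacency <- function(coords, k, filter_dist = NA) {
  d <- as.matrix(dist(coords))          # pairwise Euclidean distances
  n <- nrow(d)
  adj <- matrix(0, n, n)
  for (i in seq_len(n)) {
    nn <- order(d[i, ])                 # nearest first; includes i itself
    nn <- nn[nn != i][seq_len(k)]       # drop self, keep the k nearest
    adj[i, nn] <- 1
  }
  adj <- pmax(adj, t(adj))              # assumption: link i and j if either lists the other
  if (!is.na(filter_dist)) adj[d > filter_dist] <- 0
  adj
}

# Five points on a line; the last one lies far from the rest
pts <- cbind(x = c(0, 1, 2, 3, 10), y = 0)
knn_adjacency(pts, k = 2)                    # point 5 is still linked to points 3 and 4
knn_adjacency(pts, k = 2, filter_dist = 2)   # point 5 loses all of its neighbors
```

KNN always assigns k neighbors, however distant, which is why an isolated spot picks up long-range links unless a distance filter removes them.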
alternative: Character string specifying the alternative hypothesis for the Moran's I test.
"greater" (default): Test for positive autocorrelation
(clustering of similar values). Most appropriate for SVG detection.
"less": Test for negative autocorrelation (dissimilar
values as neighbors).
"two.sided": Test for any autocorrelation.
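The three alternatives correspond to different tails of the normal distribution applied to the standardized statistic. The sketch below uses the usual convention for a z-test; the numbers are invented for illustration and are not output of CalSVG_MERINGUE, and it is an assumption that the package follows exactly this convention.

```r
# Illustrative values only
observed <- 0.30
expected <- -1 / (100 - 1)   # null expectation for n = 100 locations
sd_null  <- 0.06

z <- (observed - expected) / sd_null

p_greater   <- pnorm(z, lower.tail = FALSE)            # positive autocorrelation
p_less      <- pnorm(z)                                # negative autocorrelation
p_two_sided <- 2 * pnorm(abs(z), lower.tail = FALSE)   # either direction

c(greater = p_greater, less = p_less, two.sided = p_two_sided)
```

For a positive z-score, the two-sided p-value is twice the "greater" p-value, so the one-sided default is the more powerful choice when only clustering is of interest.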
adjust_method: Character string specifying p-value adjustment method
for multiple testing correction. Passed to p.adjust().
Options include: "BH" (default, Benjamini-Hochberg), "bonferroni",
"holm", "hochberg", "hommel", "BY", "fdr", "none".
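Because the adjustment is delegated to p.adjust(), its effect can be previewed directly in base R. A small sketch with invented p-values (the gene names are hypothetical):

```r
# Made-up raw p-values for five genes
p_raw <- c(geneA = 0.001, geneB = 0.01, geneC = 0.02, geneD = 0.2, geneE = 0.5)

p_bh   <- p.adjust(p_raw, method = "BH")          # controls the false discovery rate
p_bonf <- p.adjust(p_raw, method = "bonferroni")  # controls the family-wise error rate

# Side-by-side comparison of raw and adjusted values
cbind(raw = p_raw, BH = p_bh, bonferroni = p_bonf)

# Genes passing a 5% FDR threshold
names(p_bh)[p_bh < 0.05]
```

Bonferroni is never less conservative than BH, so it will generally report fewer SVGs when thousands of genes are tested.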
min_pct_cells: Numeric (0-1). Minimum fraction of cells that must contribute to the spatial pattern for a gene to be retained as an SVG. Default is 0.05 (5%). Used to filter out genes driven by only a few outlier cells. Set to 0 to disable this filter.
n_threads: Integer. Number of threads for parallel computation. Default is 1.
For large datasets: set to the number of available cores
Uses R's parallel::mclapply, which relies on forking and therefore cannot run in parallel on Windows
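The general pattern of a per-gene loop run through parallel::mclapply, with a single-core fallback on Windows, looks roughly like the sketch below. The per-gene task here (a row variance) is a stand-in chosen for illustration; it is not what the package computes internally.

```r
library(parallel)

# Forking is unavailable on Windows, so fall back to a single core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Hypothetical per-gene task: the variance of each row of a toy matrix
toy_expr <- matrix(seq_len(20), nrow = 4,
                   dimnames = list(paste0("gene", 1:4), NULL))
per_gene <- mclapply(seq_len(nrow(toy_expr)),
                     function(i) var(toy_expr[i, ]),
                     mc.cores = n_cores)
unlist(per_gene)
```

parallel::detectCores() can be used to find out how many cores are available before choosing a value for n_threads.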
use_cpp: Logical. Whether to use C++ implementation for faster computation. Default is TRUE. Falls back to R if C++ fails.
verbose: Logical. Whether to print progress messages. Default is TRUE.
Method Overview:
MERINGUE uses Moran's I, a classic measure of spatial autocorrelation: $$I = \frac{n}{W} \frac{\sum_i \sum_j w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2}$$
where:
n = number of spatial locations
W = sum of all spatial weights
w_ij = spatial weight between locations i and j
x_i = expression value at location i
\bar{x} = mean expression value across all locations
Interpretation:
I > 0: Positive autocorrelation (similar values cluster)
I ≈ 0: No spatial autocorrelation (random spatial distribution; the null expectation is -1/(n-1), which approaches 0 as n grows)
I < 0: Negative autocorrelation (checkerboard pattern)
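The formula and its interpretation can be reproduced on a toy grid in a few lines of base R. `moran_i` is a hypothetical helper that transcribes the formula above; the binary rook-adjacency weights are an assumption made for the example and may differ from the weights the package builds.

```r
# Hypothetical helper: Moran's I for one gene given a weights matrix w
moran_i <- function(x, w) {
  n <- length(x)
  z <- x - mean(x)                      # centered expression values
  (n / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# 4 x 4 grid with rook adjacency (cells sharing an edge are neighbors)
grid <- expand.grid(row = 1:4, col = 1:4)
w <- (as.matrix(dist(grid)) == 1) * 1

clustered    <- as.numeric(grid$col <= 2)               # left half high, right half low
checkerboard <- as.numeric((grid$row + grid$col) %% 2)  # alternating values

moran_i(clustered, w)      # 2/3: most neighbor pairs share the same value
moran_i(checkerboard, w)   # -1: every neighbor pair differs
```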
Statistical Testing: P-values are computed using normal approximation based on analytical formulas for the expected value and variance of Moran's I under the null hypothesis of complete spatial randomness.
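As an illustration of such a test, the sketch below uses the textbook variance of Moran's I under the normality assumption (Cliff and Ord). This is one of several analytical variants; the package may use a different one (for example, a randomization-based variance), so its numbers need not match. `moran_test_normal` is a hypothetical helper.

```r
# Hypothetical helper: one-sided z-test for Moran's I (normality assumption)
moran_test_normal <- function(x, w) {
  n  <- length(x)
  z  <- x - mean(x)
  s0 <- sum(w)                                  # total weight
  s1 <- 0.5 * sum((w + t(w))^2)
  s2 <- sum((rowSums(w) + colSums(w))^2)

  observed <- (n / s0) * sum(w * outer(z, z)) / sum(z^2)
  expected <- -1 / (n - 1)
  variance <- (n^2 * s1 - n * s2 + 3 * s0^2) / ((n^2 - 1) * s0^2) - expected^2
  sd_null  <- sqrt(variance)

  z_score <- (observed - expected) / sd_null
  c(observed = observed, expected = expected, sd = sd_null,
    z_score = z_score, p.value = pnorm(z_score, lower.tail = FALSE))
}

# 5 x 5 grid, rook adjacency, with a left-to-right expression gradient
grid <- expand.grid(row = 1:5, col = 1:5)
w <- (as.matrix(dist(grid)) == 1) * 1
moran_test_normal(grid$col, w)
```

The returned names mirror the observed, expected, sd, z_score, and p.value columns described for the result table, which makes the relationship between them explicit.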
Computational Considerations:
Time complexity: O(n^2) for network construction, O(n*m) for testing (n = spots, m = genes)
Memory: O(n^2) for storing spatial weights matrix
For n > 10,000 spots, consider using KNN with small k
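A back-of-the-envelope calculation shows why the O(n^2) weights matrix becomes the bottleneck. The sketch assumes 8 bytes per double; the sparse estimate is an assumption about what a KNN graph would need if it were held in sparse form (roughly n * k nonzero weights, ignoring index overhead) and does not describe how the package stores its weights.

```r
# Rough memory estimates in GB (8 bytes per double), illustrative only
dense_gb  <- function(n) n^2 * 8 / 1e9
sparse_gb <- function(n, k) n * k * 8 / 1e9   # assumption: ~n * k nonzero weights

dense_gb(10000)        # 0.8 GB for a dense 10,000 x 10,000 matrix
dense_gb(50000)        # 20 GB
sparse_gb(50000, 10)   # 0.004 GB
```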
Miller, B.F. et al. (2021) Characterizing spatial gene expression heterogeneity in spatially resolved single-cell transcriptomic data with nonuniform cellular densities. Genome Research.
Moran, P.A.P. (1950) Notes on Continuous Stochastic Phenomena. Biometrika.
Cliff, A.D. and Ord, J.K. (1981) Spatial Processes: Models & Applications. Pion.
CalSVG for unified interface,
buildSpatialNetwork for network construction,
moranI_test for individual gene testing
# Load example data
data(example_svg_data)
expr <- example_svg_data$logcounts[1:20, ] # Use subset for speed
coords <- example_svg_data$spatial_coords
# \donttest{
# Basic usage (requires RANN package for KNN)
if (requireNamespace("RANN", quietly = TRUE)) {
  results <- CalSVG_MERINGUE(expr, coords,
                             network_method = "knn", k = 10,
                             verbose = FALSE)
  head(results)

  # Get significant SVGs
  sig_genes <- results$gene[results$p.adj < 0.05]
}
# }