Detect spatially variable genes using the binSpect approach from Giotto. This method binarizes gene expression and tests for spatial enrichment of high-expressing cells using Fisher's exact test.
Identifies spatially variable genes by:
1. Binarizing gene expression (high/low)
2. Building a spatial neighborhood network
3. Testing whether high-expressing cells neighbor other high-expressing cells more often than expected by chance
CalSVG_binSpect(
  expr_matrix,
  spatial_coords,
  bin_method = c("kmeans", "rank"),
  rank_percent = 30,
  network_method = c("delaunay", "knn"),
  k = 10L,
  do_fisher_test = TRUE,
  adjust_method = "fdr",
  n_threads = 1L,
  verbose = TRUE
)
A data.frame with SVG detection results, sorted by significance/score. Columns:
gene: Gene identifier
estimate: Odds ratio from 2x2 contingency table.
OR > 1 indicates spatial clustering of high-expressing cells.
p.value: P-value from Fisher's exact test (if requested)
p.adj: Adjusted p-value
score: Combined score = -log10(p.value) * estimate
high_expr_count: Number of high-expressing cells
expr_matrix: Numeric matrix of gene expression values.
Rows: genes
Columns: spatial locations (spots/cells)
Values: normalized expression (e.g., log counts or normalized counts)
spatial_coords: Numeric matrix of spatial coordinates.
Rows: spatial locations (must match columns of expr_matrix)
Columns: x, y (and optionally z) coordinates
bin_method: Character string specifying binarization method.
"kmeans" (default): K-means clustering with k=2.
Automatically separates high and low expression groups.
Robust to different expression distributions.
"rank": Top percentage by expression rank.
More consistent across genes with different distributions.
Controlled by rank_percent parameter.
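The two binarization options can be sketched in base R as follows. This is only an illustration under stated assumptions: `binarize_sketch` is a hypothetical helper written for this example, not the package's `binarize_expression`, and details such as tie handling and the number of k-means restarts are guesses.

```r
# Hypothetical helper (not the package's binarize_expression()):
# returns a 0/1 vector marking "high" cells for a single gene.
binarize_sketch <- function(x, method = c("kmeans", "rank"), rank_percent = 30) {
  method <- match.arg(method)
  if (method == "kmeans") {
    # Two clusters; the cluster with the larger center is labelled "high".
    km <- kmeans(x, centers = 2, nstart = 10)
    as.integer(km$cluster == which.max(km$centers))
  } else {
    # The top rank_percent of cells by expression rank are labelled "high".
    n_high <- ceiling(length(x) * rank_percent / 100)
    as.integer(rank(-x, ties.method = "first") <= n_high)
  }
}

# Toy gene: 70 low-expressing cells followed by 30 high-expressing cells.
set.seed(1)
x <- c(rnorm(70, mean = 0.5, sd = 0.2), rnorm(30, mean = 3, sd = 0.3))

high_kmeans <- binarize_sketch(x, "kmeans")
high_rank   <- binarize_sketch(x, "rank", rank_percent = 30)
table(high_kmeans)
table(high_rank)
```

With "rank", the number of high cells is fixed by `rank_percent` regardless of the expression distribution; with "kmeans", it depends on where the two clusters separate.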
rank_percent: Numeric (0-100). For bin_method = "rank",
the percentage of cells to classify as "high expressing".
Default is 30 (top 30%).
Lower values (10-20
Higher values (40-50
network_method: Character string specifying spatial network construction.
"delaunay" (default): Delaunay triangulation
"knn": K-nearest neighbors
k: Integer. Number of neighbors for KNN network. Default is 10.
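To make the KNN option concrete, here is a small base-R sketch that turns coordinates into an undirected edge list. It is an assumption-laden illustration: `knn_edges_sketch` is a hypothetical helper, it uses a full distance matrix from `dist()` rather than the RANN package mentioned in the example below, and the package's own `buildSpatialNetwork` may treat ties and edge direction differently.

```r
# Hypothetical helper: undirected KNN edge list from a coordinate matrix.
# Uses a dense distance matrix, so it is only suitable for small inputs.
knn_edges_sketch <- function(coords, k = 10L) {
  d <- as.matrix(dist(coords))
  diag(d) <- Inf  # a cell is never its own neighbor

  # For each cell, keep the indices of its k closest cells.
  edges <- do.call(rbind, lapply(seq_len(nrow(d)), function(i) {
    cbind(from = i, to = order(d[i, ])[seq_len(k)])
  }))

  # Treat the network as undirected: order each pair, then drop duplicates.
  edges <- cbind(from = pmin(edges[, 1], edges[, 2]),
                 to   = pmax(edges[, 1], edges[, 2]))
  unique(edges)
}

# Toy layout: a 5 x 5 grid of spots.
coords <- as.matrix(expand.grid(x = 1:5, y = 1:5))
edges  <- knn_edges_sketch(coords, k = 4L)
head(edges)
nrow(edges)
```

Larger `k` gives a denser network and therefore more neighbor pairs entering the contingency table for each gene.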
do_fisher_test: Logical. Whether to perform Fisher's exact test. Default is TRUE.
TRUE: Returns p-values from Fisher's exact test
FALSE: Returns only odds ratios (faster)
adjust_method: Character string for p-value adjustment.
Default is "fdr" (Benjamini-Hochberg). See p.adjust() for options.
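As a brief illustration of the adjustment step, the snippet below applies base R's `p.adjust()` to a made-up vector of p-values (the numbers are invented for this example, not output from the function). In `p.adjust()`, "fdr" is an alias for "BH".

```r
# Made-up per-gene p-values, for illustration only.
p <- c(geneA = 0.001, geneB = 0.01, geneC = 0.02, geneD = 0.04, geneE = 0.5)

p_fdr <- p.adjust(p, method = "fdr")  # same result as method = "BH"
p_bon <- p.adjust(p, method = "bonferroni")

# Compare raw and adjusted values side by side.
data.frame(p.value = p, p.adj_fdr = p_fdr, p.adj_bonferroni = p_bon)
```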
n_threads: Integer. Number of parallel threads. Default is 1.
verbose: Logical. Print progress messages. Default is TRUE.
Method Overview:
binSpect constructs a 2x2 contingency table for each gene based on:
Cell A expression: High (1) or Low (0)
Cell B expression: High (1) or Low (0)
For all pairs of neighboring cells (edges in the spatial network):
|             | Cell B Low | Cell B High |
| Cell A Low  | n_00       | n_01        |
| Cell A High | n_10       | n_11        |
Statistical Test: Fisher's exact test is used to test whether n_11 (both neighbors high) is greater than expected under independence.
Odds Ratio Interpretation:
OR = 1: No spatial pattern
OR > 1: High-expressing cells cluster together (positive spatial pattern)
OR < 1: High-expressing cells avoid each other (negative pattern)
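The contingency table and test described above can be worked through on a toy example with base R. This is a sketch, not the package's internal code: the one-dimensional layout and edge list are invented for illustration, and using `alternative = "greater"` is an assumption based on the description of the test. Note that `fisher.test()` reports the conditional maximum likelihood estimate of the odds ratio, which differs slightly from the sample odds ratio n_00 * n_11 / (n_01 * n_10).

```r
# Toy 1-D "tissue": 20 cells in a row; each cell neighbors the next one.
# Cells 1-8 and 17-18 are high-expressing (1), the rest are low (0).
high  <- c(rep(1L, 8), rep(0L, 8), rep(1L, 2), rep(0L, 2))
edges <- cbind(from = 1:19, to = 2:20)

# Classify every edge by the binarized state of its two endpoints.
a <- factor(high[edges[, "from"]], levels = c(0, 1))
b <- factor(high[edges[, "to"]],   levels = c(0, 1))
tab <- table(cell_A = a, cell_B = b)
tab

# One-sided test (assumed): is n_11 larger than expected under independence?
ft <- fisher.test(tab, alternative = "greater")
odds_ratio <- unname(ft$estimate)
p_value    <- ft$p.value

# Combined score as defined for the `score` column.
score <- -log10(p_value) * odds_ratio
c(odds_ratio = odds_ratio, p.value = p_value, score = score)
```

Here most edges connect two high cells or two low cells, so the odds ratio is well above 1 and the gene would be flagged as spatially clustered.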
Advantages:
Fast computation (no covariance matrix inversion)
Robust to outliers through binarization
Interpretable odds ratio statistic
Considerations:
Binarization threshold affects results
K-means may produce unstable results when expression is not clearly bimodal
Rank method more stable but arbitrary threshold
Dries, R. et al. (2021) Giotto: a toolbox for integrative analysis and visualization of spatial expression data. Genome Biology.
CalSVG, binarize_expression, buildSpatialNetwork
# Load example data
data(example_svg_data)
expr <- example_svg_data$logcounts[1:20, ]
coords <- example_svg_data$spatial_coords
# Basic usage (requires RANN package)
if (requireNamespace("RANN", quietly = TRUE)) {
  results <- CalSVG_binSpect(expr, coords,
                             network_method = "knn", k = 10,
                             verbose = FALSE)
  head(results)
}