Detect spatially variable genes using SPARK-X, a non-parametric method that tests for spatial expression patterns using multiple kernels.
SPARK-X is a scalable non-parametric method for identifying spatially variable genes. It uses variance component score tests with multiple spatial kernels (projection, Gaussian, and cosine) to detect various types of spatial expression patterns.
CalSVG_SPARKX(
expr_matrix,
spatial_coords,
kernel_option = c("mixture", "single"),
adjust_method = "BY",
n_threads = 1L,
verbose = TRUE
)
A data.frame with SVG detection results. Columns:
gene: Gene identifier
p.value: Combined p-value across all kernels (ACAT method)
p.adj: Multiple testing adjusted p-value
If kernel_option = "mixture", additional columns for
individual kernel statistics and p-values (stat_*, pval_*)
expr_matrix: Numeric matrix of gene expression values.
Rows: genes
Columns: spatial locations (spots/cells)
Values: raw counts or normalized counts (NOT log-transformed)
Note: SPARK-X works best with count data, not log-transformed data.
spatial_coords: Numeric matrix of spatial coordinates.
Rows: spatial locations (must match columns of expr_matrix)
Columns: x, y coordinates
kernel_option: Character string specifying which kernels to use.
"mixture" (default): Test with all 11 kernels:
1 projection + 5 Gaussian + 5 cosine. Most comprehensive but slower.
Recommended for detecting diverse spatial patterns.
"single": Test with projection kernel only. Faster but
may miss some pattern types.
adjust_method: Character string for p-value adjustment. Default is "BY" (Benjamini-Yekutieli), which is more conservative and appropriate when tests may be correlated. Other options: "BH", "bonferroni", "holm", "none".
n_threads: Integer. Number of parallel threads. Default is 1. Higher values can substantially speed up computation on large datasets.
verbose: Logical. Print progress messages. Default is TRUE.
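To make the adjust_method choice concrete, the following sketch compares the "BY" and "BH" corrections using base R's p.adjust. The raw p-values are hypothetical and serve only as an illustration; whether CalSVG_SPARKX calls p.adjust internally is an assumption.

```r
# Hypothetical raw p-values for six genes (illustrative values only)
pvals <- c(0.0001, 0.004, 0.019, 0.03, 0.2, 0.7)

# Benjamini-Yekutieli: valid under arbitrary dependence, more conservative
p_by <- p.adjust(pvals, method = "BY")

# Benjamini-Hochberg: assumes independent or positively dependent tests
p_bh <- p.adjust(pvals, method = "BH")

# Side-by-side view: BY-adjusted values are never smaller than BH-adjusted ones
comparison <- data.frame(p.value = pvals, BY = p_by, BH = p_bh)
print(comparison)
```

Genes that pass a threshold under "BY" also pass it under "BH", so switching to "BH" can only enlarge the list of reported genes.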
Method Overview:
SPARK-X uses a variance component score test framework: $$T_g = \frac{n \cdot y_g^T K y_g}{\|y_g\|^2}$$
where:
y_g = expression vector for gene g
K = spatial kernel matrix (derived from coordinates)
n = number of spatial locations
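As a minimal sketch, the statistic above can be evaluated for one gene in base R. The data are simulated, and the kernel is assumed to be the projection onto the centered coordinates; the internals of CalSVG_SPARKX may differ.

```r
set.seed(1)
n <- 50
coords <- cbind(x = runif(n), y = runif(n))       # simulated spot coordinates
y_g <- rpois(n, lambda = 1 + 5 * coords[, "x"])   # simulated counts with an x-gradient

# Assumed projection kernel: K = X (X'X)^{-1} X' on centered coordinates
X <- scale(coords, center = TRUE, scale = FALSE)
K <- X %*% solve(crossprod(X)) %*% t(X)

# T_g = n * y_g' K y_g / ||y_g||^2
T_g <- n * as.numeric(t(y_g) %*% K %*% y_g) / sum(y_g^2)
T_g
```

Because K is a projection matrix under this assumption, T_g lies between 0 and n, with larger values indicating that more of the expression signal aligns with the coordinates.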
Kernel Types:
Projection kernel: Linear kernel based on scaled coordinates.
Detects gradients and linear spatial trends.
Gaussian kernels: Multiple bandwidth Gaussian RBF kernels.
Detect localized hotspots of different sizes.
Cosine kernels: Multiple frequency periodic kernels.
Detect periodic/oscillating spatial patterns.
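One way to obtain the three kernel families is to transform the coordinates and then reuse the projection kernel on each transformed set. The sketch below follows that idea; the helper transform_coord, the quantile-based scales, and the exact transform formulas are assumptions for illustration, not confirmed details of this package.

```r
# Hypothetical helper: transform one coordinate axis at one of five scales
transform_coord <- function(coord, level, type = c("gaussian", "cosine")) {
  type <- match.arg(type)
  coord <- coord - mean(coord)
  # Five candidate scales taken from quantiles of the absolute centered coordinate
  scales <- quantile(abs(coord), probs = seq(0.2, 1, by = 0.2))
  l <- scales[[level]]
  if (type == "gaussian") exp(-coord^2 / (2 * l^2)) else cos(2 * pi * coord / l)
}

set.seed(2)
coords <- cbind(x = runif(100), y = runif(100))   # simulated coordinates

# Five Gaussian and five cosine versions of the coordinate matrix
gaussian_coords <- lapply(1:5, function(k) {
  apply(coords, 2, transform_coord, level = k, type = "gaussian")
})
cosine_coords <- lapply(1:5, function(k) {
  apply(coords, 2, transform_coord, level = k, type = "cosine")
})

# 1 projection + 5 Gaussian + 5 cosine coordinate sets
kernel_inputs <- c(list(projection = coords), gaussian_coords, cosine_coords)
length(kernel_inputs)
```

Small scales make the Gaussian transform respond to tight hotspots and the cosine transform to rapid oscillations, while large scales capture broader structure.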
P-value Computation:
Individual kernel p-values: Davies' method for quadratic forms
Combined p-value: ACAT (Aggregated Cauchy Association Test)
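The ACAT combination can be sketched in a few lines of base R. The function name acat and the equal weighting are choices made for this illustration; the package's own ACAT_combine may handle weights and extreme p-values differently.

```r
# Equal-weight ACAT: map p-values to Cauchy variates, average, map back
acat <- function(pvals) {
  # Sketch only: no special handling for p-values of exactly 0 or 1
  stopifnot(all(pvals > 0 & pvals < 1))
  t_stat <- mean(tan((0.5 - pvals) * pi))
  0.5 - atan(t_stat) / pi
}

# Hypothetical per-kernel p-values for one gene
kernel_pvals <- c(0.8, 0.03, 0.6, 0.0004, 0.5)
acat(kernel_pvals)
```

The heavy tails of the Cauchy transform let a single very small p-value dominate the average, which is why the combined result remains sensitive when only one kernel matches the pattern.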
Advantages:
Non-parametric: No distributional assumptions
Scalable: O(n) complexity, handles millions of cells
Multiple kernels: Detects diverse pattern types
Robust: ACAT combination handles correlated tests
Computational Considerations:
mixture option: ~11x slower than single
Memory: O(n) per gene, efficient for large datasets
Parallelization provides near-linear speedup
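The O(n) cost follows from never forming the n x n kernel matrix. The sketch below checks, on simulated data and assuming a projection kernel built from centered coordinates, that the quadratic form computed from a length-2 cross-product matches the explicit kernel computation.

```r
set.seed(3)
n <- 200
X <- scale(cbind(runif(n), runif(n)), center = TRUE, scale = FALSE)  # centered coordinates
y <- rpois(n, lambda = 3)                                            # simulated counts

# Explicit route: build the n x n kernel (O(n^2) memory)
K <- X %*% solve(crossprod(X)) %*% t(X)
quad_explicit <- as.numeric(t(y) %*% K %*% y)

# Low-rank route: only a 2 x 1 vector and a 2 x 2 matrix are needed (O(n) time and memory)
Xty <- crossprod(X, y)
quad_lowrank <- as.numeric(t(Xty) %*% solve(crossprod(X), Xty))

all.equal(quad_explicit, quad_lowrank)
```

Since each gene's computation is independent of the others, this per-gene work can be distributed across threads, which is consistent with the speedup from n_threads described above.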
Zhu, J., Sun, S., & Zhou, X. (2021). SPARK-X: non-parametric modeling enables scalable and robust detection of spatial expression patterns for large spatial transcriptomic studies. Genome Biology, 22, 184.
CalSVG, ACAT_combine
# Load example data
data(example_svg_data)
expr <- example_svg_data$counts[1:20, ] # Use counts (not log)
coords <- example_svg_data$spatial_coords
# Fast mode with single kernel (no extra dependencies)
results <- CalSVG_SPARKX(expr, coords,
                         kernel_option = "single",
                         verbose = FALSE)
head(results)