Detect spatially variable genes using nnSVG, a method based on nearest-neighbor Gaussian processes for scalable spatial modeling.
nnSVG uses nearest-neighbor Gaussian processes (NNGP) to model spatial correlation structure in gene expression. It performs likelihood ratio tests comparing spatial vs. non-spatial models to identify SVGs.
CalSVG_nnSVG(
  expr_matrix,
  spatial_coords,
  X = NULL,
  n_neighbors = 10L,
  order = c("AMMD", "Sum_coords"),
  cov_model = c("exponential", "gaussian", "spherical", "matern"),
  adjust_method = "BH",
  n_threads = 1L,
  verbose = FALSE
)
A data.frame with SVG detection results. Columns:
gene: Gene identifier
sigma.sq: Spatial variance estimate (sigma^2)
tau.sq: Non-spatial variance estimate (tau^2, nugget)
phi: Range parameter estimate (controls spatial correlation decay)
prop_sv: Proportion of spatial variance = sigma.sq / (sigma.sq + tau.sq)
loglik: Log-likelihood of spatial model
loglik_lm: Log-likelihood of non-spatial model (linear model)
LR_stat: Likelihood ratio test statistic = -2 * (loglik_lm - loglik)
rank: Rank by LR statistic (1 = largest LR_stat)
p.value: P-value from a chi-squared distribution with 2 degrees of freedom
p.adj: P-value adjusted for multiple testing using adjust_method
runtime: Computation time per gene (seconds)
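The derived columns follow directly from the fitted quantities. The sketch below recomputes them in base R from a mock table; the gene names and all numbers are invented for illustration and are not output of CalSVG_nnSVG.

```r
# Mock per-gene fit output; every value here is made up for illustration.
fits <- data.frame(
  gene      = c("geneA", "geneB", "geneC"),
  sigma.sq  = c(0.80, 0.05, 0.30),
  tau.sq    = c(0.20, 0.95, 0.70),
  loglik    = c(-410.2, -498.9, -470.5),
  loglik_lm = c(-455.0, -499.4, -476.1)
)

# Proportion of spatial variance
fits$prop_sv <- fits$sigma.sq / (fits$sigma.sq + fits$tau.sq)

# Likelihood ratio statistic and its rank (1 = largest)
fits$LR_stat <- -2 * (fits$loglik_lm - fits$loglik)
fits$rank    <- rank(-fits$LR_stat)

# Upper-tail chi-squared p-value with df = 2, then BH adjustment
fits$p.value <- pchisq(fits$LR_stat, df = 2, lower.tail = FALSE)
fits$p.adj   <- p.adjust(fits$p.value, method = "BH")

print(fits)
```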
expr_matrix: Numeric matrix of gene expression values.
Rows: genes
Columns: spatial locations (spots/cells)
Values: log-normalized counts (e.g., from scran::logNormCounts)
spatial_coords: Numeric matrix of spatial coordinates.
Rows: spatial locations (must match columns of expr_matrix)
Columns: x, y coordinates
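Because the rows of the coordinate matrix must correspond to the columns of the expression matrix, it can help to align and check the two inputs before calling the function. The sketch below uses simulated inputs with hypothetical spot and gene names; whether the function itself matches by name or by position is not stated here, so aligning explicitly is the cautious option.

```r
set.seed(1)

# Toy inputs: 5 genes x 8 spots (names are hypothetical)
expr_matrix <- matrix(
  rnorm(5 * 8), nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("spot", 1:8))
)
spatial_coords <- cbind(x = runif(8), y = runif(8))
rownames(spatial_coords) <- rev(colnames(expr_matrix))  # deliberately out of order

# Reorder coordinates so that row i describes column i of expr_matrix
spatial_coords <- spatial_coords[colnames(expr_matrix), , drop = FALSE]

# Basic shape and type checks
stopifnot(
  is.numeric(expr_matrix),
  is.numeric(spatial_coords),
  ncol(expr_matrix) == nrow(spatial_coords),
  ncol(spatial_coords) == 2,
  identical(colnames(expr_matrix), rownames(spatial_coords))
)
```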
X: Optional numeric matrix of covariates to regress out.
Rows: spatial locations (same order as spatial_coords)
Columns: covariates (e.g., batch, cell type indicators)
Default is NULL (intercept-only model).
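One common way to build a numeric covariate matrix from a categorical variable is base R's model.matrix(). The sketch below uses a hypothetical batch factor; whether CalSVG_nnSVG expects the intercept column to be kept in X is an assumption to verify against the package before use.

```r
# Hypothetical batch label for 6 spots, in the same order as spatial_coords
batch <- factor(c("b1", "b1", "b2", "b2", "b3", "b3"))

# Indicator (dummy) coding; the first column is the intercept
X <- model.matrix(~ batch)

# One row per spot, one column per coefficient
dim(X)
colnames(X)
```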
n_neighbors: Integer. Number of nearest neighbors for the NNGP model. Default is 10.
5-10: Faster, captures local patterns
15-20: Better likelihood estimates, slower
Values > 15 rarely improve results but increase computation time.
order: Character string specifying the coordinate ordering scheme.
"AMMD" (default): Approximate Maximum Minimum Distance.
Better for most datasets. Requires >= 65 spots.
"Sum_coords": Order by sum of coordinates.
Use for very small datasets (< 65 spots).
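The choice between the two schemes can be made from the number of spots. The sketch below applies the 65-spot rule stated above to simulated coordinates and also shows what ordering by the sum of coordinates means; the variable names are illustrative only.

```r
set.seed(1)

# A small toy dataset with 40 spots
spatial_coords <- cbind(x = runif(40), y = runif(40))

# Fall back to "Sum_coords" when there are fewer than 65 spots
order_choice <- if (nrow(spatial_coords) >= 65) "AMMD" else "Sum_coords"

# "Sum_coords" visits locations by increasing x + y
ord <- order(rowSums(spatial_coords))
head(spatial_coords[ord, ])
```

The value of order_choice could then be passed as the order argument.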
cov_model: Character string specifying the covariance function. Default is "exponential".
"exponential": Most commonly used, computationally stable
"gaussian": Smoother patterns, requires stabilization
"spherical": Finite range correlation
"matern": Flexible smoothness (includes additional nu parameter)
adjust_method: Character string naming the p-value adjustment method. Default is "BH" (Benjamini-Hochberg).
n_threads: Integer. Number of parallel threads. Default is 1. Set to the number of available cores for faster computation.
verbose: Logical. Whether to print progress messages. Default is FALSE.
Method Overview:
nnSVG models gene expression as a Gaussian process: $$y = X\beta + \omega + \epsilon$$
where:
y = expression vector
X = covariate matrix, beta = coefficients
omega ~ GP(0, sigma^2 * C(phi)) = spatial random effect
epsilon ~ N(0, tau^2) = non-spatial noise
C(phi) = covariance function with range phi
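The four cov_model options correspond to different shapes of C(phi). The sketch below writes them as base R functions of distance d, assuming one common parametrization in which phi acts as a decay rate; the exact parametrization used internally may differ, so treat these forms as illustrative.

```r
# Correlation as a function of distance d (assumed parametrization)
cov_exponential <- function(d, phi) exp(-phi * d)

cov_gaussian <- function(d, phi) exp(-(phi * d)^2)

# Drops to exactly zero once phi * d reaches 1 (finite range)
cov_spherical <- function(d, phi) {
  r <- phi * d
  ifelse(r < 1, 1 - 1.5 * r + 0.5 * r^3, 0)
}

# nu controls smoothness; nu = 0.5 recovers the exponential form
cov_matern <- function(d, phi, nu) {
  r <- phi * d
  out <- 2^(1 - nu) / gamma(nu) * r^nu * besselK(r, nu)
  out[r == 0] <- 1  # limit at zero distance
  out
}

d <- seq(0, 1, by = 0.25)
round(cbind(
  d,
  exponential = cov_exponential(d, phi = 4),
  gaussian    = cov_gaussian(d, phi = 4),
  spherical   = cov_spherical(d, phi = 4),
  matern_1.5  = cov_matern(d, phi = 4, nu = 1.5)
), 3)
```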
Nearest-Neighbor Approximation: A full GP has O(n^3) complexity. NNGP conditions each location on only its k nearest neighbors, reducing complexity to O(n * k^3), which is linear in n for a fixed k.
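To make the neighbor structure concrete, the sketch below builds NNGP-style neighbor sets in base R: after ordering the locations, each one is assigned up to k nearest neighbors among the locations that precede it. This is a simplified illustration using a brute-force distance matrix and the sum-of-coordinates ordering, not the BRISC implementation.

```r
set.seed(1)
n <- 100
k <- 10

# Simulated coordinates, ordered by the sum of coordinates
coords <- cbind(runif(n), runif(n))
coords <- coords[order(rowSums(coords)), ]

# Brute-force pairwise distances (fine for a small illustration)
D <- as.matrix(dist(coords))

# Neighbor set of location i: up to k nearest among locations 1..(i - 1)
neighbors <- lapply(seq_len(n), function(i) {
  if (i == 1) return(integer(0))
  prev <- seq_len(i - 1)
  prev[order(D[i, prev])][seq_len(min(k, i - 1))]
})

# Each location contributes one small system of size at most k x k
table(lengths(neighbors))
```

Since every conditional distribution involves at most k neighbors, the per-location cost is bounded by O(k^3), and the total cost grows linearly with n.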
Statistical Test: Likelihood ratio test comparing:
H0 (null): y = X*beta + epsilon (no spatial effect)
H1 (alternative): y = X*beta + omega + epsilon (with spatial effect)
Under H0, the LR statistic is compared against a chi-squared distribution with df = 2 (the spatial model adds two parameters, sigma.sq and phi).
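The test can be reproduced by hand on a small simulated gene. The sketch below swaps the NNGP approximation for the exact GP likelihood with an exponential covariance, maximized with optim(), which is only practical for small n. The simulation settings and starting values are arbitrary choices, and the result will not match BRISC output exactly.

```r
set.seed(7)
n <- 60
coords <- cbind(runif(n), runif(n))
D <- as.matrix(dist(coords))

# Simulate one spatially structured gene (true values chosen arbitrarily)
L <- t(chol(exp(-5 * D) + 1e-8 * diag(n)))
y <- 1 + drop(L %*% rnorm(n)) + rnorm(n, sd = 0.3)

# Negative log-likelihood of the spatial model (H1), parameters on log scale
neg_loglik <- function(par) {
  sigma_sq <- exp(par[1]); tau_sq <- exp(par[2]); phi <- exp(par[3])
  V <- sigma_sq * exp(-phi * D) + tau_sq * diag(n)
  U <- tryCatch(chol(V), error = function(e) NULL)
  if (is.null(U)) return(1e10)
  Vinv <- chol2inv(U)
  beta <- sum(Vinv %*% y) / sum(Vinv)  # GLS estimate of the intercept
  r <- y - beta
  quad <- drop(crossprod(r, Vinv %*% r))
  val <- 0.5 * (n * log(2 * pi) + 2 * sum(log(diag(U))) + quad)
  if (is.finite(val)) val else 1e10
}

fit <- optim(c(0, 0, log(5)), neg_loglik)
loglik <- -fit$value

# Non-spatial model (H0): intercept-only linear model
loglik_lm <- as.numeric(logLik(lm(y ~ 1)))

LR_stat <- -2 * (loglik_lm - loglik)
p_value <- pchisq(LR_stat, df = 2, lower.tail = FALSE)
c(LR_stat = LR_stat, p_value = p_value)
```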
Effect Size: Proportion of spatial variance (prop_sv) measures effect size:
prop_sv near 1: Strong spatial pattern
prop_sv near 0: Little spatial structure
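Significance and effect size can be combined when selecting genes. The sketch below filters a mock results table by adjusted p-value and prop_sv; the table values and the 0.3 cutoff are hypothetical and not a recommendation from the method's authors.

```r
# Mock results table; values are invented for illustration
results <- data.frame(
  gene    = paste0("gene", 1:5),
  prop_sv = c(0.85, 0.02, 0.40, 0.10, 0.65),
  p.adj   = c(1e-12, 0.70, 0.003, 0.04, 1e-6)
)

min_prop_sv <- 0.3  # hypothetical effect-size cutoff

# Keep genes that are both significant and show a sizeable spatial component
svgs <- subset(results, p.adj < 0.05 & prop_sv >= min_prop_sv)
svgs <- svgs[order(-svgs$prop_sv), ]
svgs
```

Here gene4 passes the significance threshold but is dropped because only a small share of its variance is spatial.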
Computational Notes:
Requires BRISC package for NNGP fitting
O(n) complexity per gene with the NNGP approximation, for a fixed n_neighbors
Parallelization over genes provides good speedup
Memory: O(n * k) per gene
Weber, L.M. et al. (2023) nnSVG for the scalable identification of spatially variable genes using nearest-neighbor Gaussian processes. Nature Communications.
Datta, A. et al. (2016) Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets. JASA.
CalSVG, BRISC package documentation
# Load example data
data(example_svg_data)
expr <- example_svg_data$logcounts[1:10, ] # Small subset
coords <- example_svg_data$spatial_coords
# \donttest{
# Basic usage (requires BRISC package)
if (requireNamespace("BRISC", quietly = TRUE)) {
results <- CalSVG_nnSVG(expr, coords, verbose = FALSE)
head(results)
}
# }