Numbat

Numbat is a haplotype-aware copy number variation (CNV) caller for single-cell and spatial transcriptomics data. It integrates signals from gene expression, allelic ratio, and population-derived haplotype information to accurately infer allele-specific CNVs in single cells and reconstruct their lineage relationships.

Numbat can be used to:

  1. Detect allele-specific copy number variations from scRNA-seq and spatial transcriptomics
  2. Differentiate tumor versus normal cells in the tumor microenvironment
  3. Infer the clonal architecture and evolutionary history of profiled tumors

Numbat does not require paired DNA or genotype data and operates solely on the donor scRNA-seq data (for example, 10x Cell Ranger output). For details of the method, please check out our paper:

Teng Gao, Ruslan Soldatov, Hirak Sarkar, Adam Kurkiewicz, Evan Biederstedt, Po-Ru Loh, Peter Kharchenko. Haplotype-aware analysis of somatic copy number variations from single-cell transcriptomes. Nature Biotechnology (2022).

Numbat-multiome

Numbat was later extended to multi-modality (single-cell RNA and ATAC) data. Check out the vignette and paper below:

Ruitong Li, Jean-Baptiste Alberge, Tina Keshavarzian, Junko Tsuji, Johan Gustafsson, Mahshid Rahmat, Elizabeth D Lightbody, Stephanie L Deng, Santiago Riviero, Mendy Miller, F Naz Cemre Kalayci, Adrian Wiestner, Clare Sun, Mathieu Lupien, Irene Ghobrial, Erin Parry, Teng Gao, Gad Getz. Numbat-multiome: inferring copy number variations by combining RNA and chromatin accessibility information from single-cell data. Briefings in Bioinformatics (2025).

User Guide

For a complete guide, please see Numbat User Guide.

Questions?

We appreciate your feedback! Please raise a GitHub issue for bugs, questions, and new feature requests. For bug reports, please attach the full log, error message, input parameters, and, if possible, a reproducible example.

Install

install.packages('numbat')

Monthly Downloads

416

Version

1.5.2

License

MIT + file LICENSE

Maintainer

Teng Gao

Last Published

February 4th, 2026

Functions in numbat (1.5.2)

detect_clonal_loh

Call clonal LOH using SNP density. Recommended for cell lines or tumor samples with no normal cells.
calc_phi_mle_lnpois

Calculate the MLE of expression fold change phi
expand_states

expand multi-allelic CNVs into separate entries in the single-cell posterior dataframe
gaps_hg19

genome gap regions (hg19)
gaps_hg38

genome gap regions (hg38)
plot_mut_history

Plot mutational history
gtf_hg19

gene model (hg19)
get_internal_nodes

Helper function to get the internal nodes of a dendrogram and the leaves in each subtree
get_joint_post

get joint posteriors
gtf_hg38

gene model (hg38)
get_snps

process VCFs into SNP dataframe
get_segs_neu

get neutral segments from multiple pseudobulks
get_gtree

Get a tidygraph tree with simplified mutational history.
check_exp_noise

check noise level
get_exp_sc

get the single cell expression dataframe
get_exp_likelihoods

get the single cell expression likelihoods
get_move_cost

Get the cost of a mutation reassignment
filter_genes

filter for mutually expressed genes
count_mat_example

example gene expression count matrix
get_allele_bulk

Aggregate into pseudobulk allele profile
count_mat_ref

example reference count matrix
genotype

Genotyping main function
get_exp_post

compute single-cell expression posteriors
gtf_mm10

gene model (mm10)
log_message

Log a message
hc_example

example hclust tree
check_exp_ref

check the format of lambdas_ref
get_clone_post

Map cells to the phylogeny (or genotypes) based on CNV posteriors
plot_phylo_heatmap

Plot single-cell CNV calls along with the clonal phylogeny
plot_psbulk

Plot a pseudobulk HMM profile
fit_gamma

fit gamma maximum likelihood
contract_nodes

Merge adjacent set of nodes
get_move_opt

Get the least costly mutation reassignment
generate_postfix

Generate alphabetical postfixes
compute_posterior

Do Bayesian averaging to get posteriors
plot_sc_tree

Plot single-cell smoothed expression magnitude heatmap
df_allele_example

example allele count dataframe
fit_bbinom

fit a Beta-Binomial model by maximum likelihood
get_lambdas_bar

Get average reference expression profile based on single-cell ref choices
fit_ref_sse

Fit a reference profile from multiple references using constrained least squares
find_common_diploid

Find the common diploid region in a group of pseudobulks
plot_exp_roll

Plot single-cell smoothed expression magnitude heatmap
retest_bulks

retest consensus segments on pseudobulks
fit_snp_rate

negative binomial model
pre_likelihood_hmm

HMM object for unit tests
fill_neu_segs

Fill neutral regions into consensus segments
resolve_cnvs

Get unique CNVs from set of segments
phi_hat_seg

Estimate of expression fold change phi in a segment
theta_hat_roll

Rolling estimate of imbalance level theta
get_allele_hmm

Get an allele HMM
log_mem

Log memory usage
retest_cnv

retest CNVs in a pseudobulk
label_genotype

Label the genotypes on a mutation graph
get_bulk

Aggregate single-cell data into combined bulk expression and allele profile
upgma

UPGMA and WPGMA clustering
plot_consensus

Plot consensus CNVs
ref_hca_counts

reference expression counts from HCA
get_segs_consensus

Extract consensus CNV segments
return_missing_columns

Check the format of a given file
theta_hat_seg

Estimate of imbalance level theta in a segment
joint_post_example

example joint single-cell cnv posterior dataframe
simes_p

Calculate Simes' p
get_allele_post

get CNV allele posteriors
get_haplotype_post

Get phased haplotypes
get_inter_cm

Helper function to get inter-SNP distance
phi_hat_roll

Rolling estimate of expression fold change phi
phylogeny_example

example single-cell phylogeny
smooth_segs

Smooth the segments after HMM decoding
get_ordered_tips

Get ordered tips from a tree
test_multi_allelic

test for multi-allelic CNVs
get_nodes_celltree

Get the internal nodes of a dendrogram and the leaves in each subtree
gexp_roll_example

example smoothed gene expression dataframe
label_edges

Annotate the direct upstream or downstream mutations on the edges
get_exp_bulk

Aggregate into bulk expression profile
make_group_bulks

Make a group of pseudobulks
get_tree_post

Find maximum likelihood assignment of mutations on a tree
smooth_expression

filtering, normalization and capping
relevel_chrom

Relevel chromosome column
simplify_history

Simplify the mutational history based on likelihood evidence
t_test_pval

T-test wrapper, handles error for insufficient observations
pnorm.range.log

Get the total probability from a region of a normal pdf
mut_graph_example

example mutation graph
mark_tumor_lineage

Mark the tumor lineage of a phylogeny
switch_prob_cm

predict phase switch probability as a function of genetic distance
segs_example

example CNV segments dataframe
transfer_links

Annotate the direct upstream or downstream node on the edges
plot_bulks

Plot a group of pseudobulk HMM profiles
run_group_hmms

Run multiple HMMs
ref_hca

reference expression magnitudes from HCA
viterbi_loh

Viterbi for clonal LOH detection
run_numbat

Run workflow to decompose tumor subclones
preprocess_allele

Preprocess allele data
vcf_meta

example VCF header
annot_haplo_segs

Annotate haplotype segments after HMM decoding
analyze_bulk

Call CNVs in a pseudobulk profile using the Numbat joint HMM
aggregate_counts

Utility function to make reference gene expression profiles
Numbat

Numbat R6 class
annotate_genes

Annotate genes on allele dataframe
check_gtf_input

Check and format the GTF input
calc_allele_lik

Calculate allele likelihoods
annot_segs

Annotate copy number segments after HMM decoding
approx_phi_post

Laplace approximation of the posterior of expression fold change phi
approx_theta_post

Laplace approximation of the posterior of allelic imbalance theta
bulk_example

example pseudobulk dataframe
calc_exp_LLR

Calculate LLR for an expression HMM
binary_entropy

calculate entropy for a binary variable
classify_alleles

classify alleles using viterbi and forward-backward
check_allele_df

Check the format of an allele dataframe
acen_hg19

centromere regions (hg19)
choose_ref_cor

choose best reference for each cell based on correlation
check_segs_loh

Check the format of a given clonal LOH segment dataframe
annot_ref

example reference cell annotation
calc_allele_LLR

Calculate LLR for an allele HMM
check_segs_fix

check the format of a given consensus segment dataframe
chrom_sizes_hg38

chromosome sizes (hg38)
calc_cluster_dist

Calculate expression distance matrix between cell populations
acen_hg38

centromere regions (hg38)
annot_consensus

Annotate a set of segments on a pseudobulk dataframe
check_matrix

Check the format of a count matrix
annot_theta_mle

Annotate the theta parameter for each segment
Modes

Get the modes of a vector
annot_theta_roll

Annotate rolling estimate of imbalance level theta
exp_hclust

Run smoothed expression-based hclust
chrom_sizes_hg19

chromosome sizes (hg19)
check_contam

check inter-individual contamination
cnv_heatmap

Plot CNV heatmap
combine_bulk

Combine allele and expression pseudobulks
fit_lnpois

fit a PLN model by maximum likelihood
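
Two of the statistical helpers listed above, binary_entropy and simes_p, correspond to standard quantities that are easy to illustrate in base R. The sketch below uses the textbook definitions only and makes no claim about how numbat implements them; the function names binary_entropy_sketch and simes_p_sketch are hypothetical and chosen so they cannot be confused with the package's own functions.

```r
# Textbook definitions only; these are NOT numbat's implementations.
# The function names below are hypothetical and not part of the package.

# Shannon entropy (in bits) of a Bernoulli(p) variable.
# By convention 0 * log2(0) is taken to be 0.
binary_entropy_sketch <- function(p) {
  stopifnot(all(p >= 0 & p <= 1))
  h <- -p * log2(p) - (1 - p) * log2(1 - p)
  h[p == 0 | p == 1] <- 0  # replace the NaN produced by 0 * log2(0)
  h
}

# Simes' combined p-value: the minimum over i of n * p_(i) / i,
# where p_(i) is the i-th smallest of the n input p-values.
simes_p_sketch <- function(p_vals) {
  n <- length(p_vals)
  stopifnot(n > 0)
  min(n * sort(p_vals) / seq_len(n))
}

binary_entropy_sketch(c(0, 0.5, 1))   # 0 1 0
simes_p_sketch(c(0.01, 0.04, 0.5))    # 0.03
```

Entropy is largest (1 bit) at p = 0.5 and falls to zero when the outcome is certain. In the Simes example the smallest p-value dominates: 3 * 0.01 / 1 = 0.03 is lower than the candidates from the other two ranks (0.06 and 0.5).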