Numbat

Numbat is a haplotype-aware CNV caller for single-cell and spatial transcriptomics data. It integrates signals from gene expression, allelic ratio, and population-derived haplotype information to accurately infer allele-specific CNVs in single cells and reconstruct their lineage relationships.

Numbat can be used to:

  1. Detect allele-specific copy number variations from scRNA-seq and spatial transcriptomics
  2. Differentiate tumor versus normal cells in the tumor microenvironment
  3. Infer the clonal architecture and evolutionary history of profiled tumors

Numbat does not require paired DNA or genotype data and operates solely on the donor scRNA-seq data (for example, 10x Cell Ranger output). For details of the method, please check out our paper:

Teng Gao, Ruslan Soldatov, Hirak Sarkar, Adam Kurkiewicz, Evan Biederstedt, Po-Ru Loh, Peter Kharchenko. Haplotype-aware analysis of somatic copy number variations from single-cell transcriptomes. Nature Biotechnology (2022).

Numbat-multiome

Numbat was later extended to multi-modality (single-cell RNA and ATAC) data. Check out the vignette and paper below:

Ruitong Li, Jean-Baptiste Alberge, Tina Keshavarzian, Junko Tsuji, Johan Gustafsson, Mahshid Rahmat, Elizabeth D Lightbody, Stephanie L Deng, Santiago Riviero, Mendy Miller, F Naz Cemre Kalayci, Adrian Wiestner, Clare Sun, Mathieu Lupien, Irene Ghobrial, Erin Parry, Teng Gao, Gad Getz. Numbat-multiome: inferring copy number variations by combining RNA and chromatin accessibility information from single-cell data. Briefings in Bioinformatics (2025).

User Guide

For a complete guide, please see Numbat User Guide.

Questions?

We appreciate your feedback! Please raise a GitHub issue for bugs, questions, and new feature requests. For bug reports, please attach the full log, error message, input parameters, and, if possible, a reproducible example.

Install

install.packages('numbat')
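
Once the package is installed, a first run might look like the sketch below. It is a minimal illustration rather than the documented workflow: `run_numbat_demo` is a hypothetical wrapper name, the inputs are the example datasets listed in the function index (`count_mat_example`, `ref_hca`, `df_allele_example`), and whether those small bundled datasets are sufficient for a complete run is an assumption this page does not confirm. Consult the Numbat User Guide for the supported procedure.

```r
# Hypothetical wrapper (not part of numbat). It calls the run_numbat workflow
# only when the package is available and returns NULL otherwise.
run_numbat_demo <- function(out_dir = file.path(tempdir(), "numbat_demo")) {
  if (!requireNamespace("numbat", quietly = TRUE)) {
    message("numbat is not installed; run install.packages('numbat') first")
    return(invisible(NULL))
  }
  numbat::run_numbat(
    count_mat   = numbat::count_mat_example,  # example gene expression count matrix
    lambdas_ref = numbat::ref_hca,            # reference expression magnitudes from HCA
    df_allele   = numbat::df_allele_example,  # example allele count dataframe
    genome      = "hg38",
    ncores      = 1,
    plot        = FALSE,
    out_dir     = out_dir                     # results are written to this directory
  )
}
```

Calling `run_numbat_demo()` starts the run and writes its results under a temporary directory. For real data, the three inputs would be replaced by your own count matrix, reference profile, and allele dataframe.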

Monthly Downloads

422

Version

1.5.1

License

MIT + file LICENSE

Maintainer

Teng Gao

Last Published

October 21st, 2025

Functions in numbat (1.5.1)

check_segs_loh

Check the format of a given clonal LOH segment dataframe
chrom_sizes_hg38

chromosome sizes (hg38)
fit_gamma

fit a gamma distribution by maximum likelihood
check_contam

check inter-individual contamination
get_allele_hmm

Get an allele HMM
check_allele_df

Check the format of an allele dataframe
fit_lnpois

fit a PLN model by maximum likelihood
get_allele_bulk

Aggregate into pseudobulk allele profile
expand_states

expand multi-allelic CNVs into separate entries in the single-cell posterior dataframe
chrom_sizes_hg19

chromosome sizes (hg19)
genotype

Genotyping main function
exp_hclust

Run smoothed expression-based hclust
generate_postfix

Generate alphabetical postfixes
fit_snp_rate

negative binomial model
choose_ref_cor

choose best reference for each cell based on correlation
get_haplotype_post

Get phased haplotypes
approx_theta_post

Laplace approximation of the posterior of allelic imbalance theta
get_inter_cm

Helper function to get inter-SNP distance
binary_entropy

calculate entropy for a binary variable
classify_alleles

classify alleles using Viterbi and forward-backward
annotate_genes

Annotate genes on allele dataframe
fit_ref_sse

Fit a reference profile from multiple references using constrained least square
calc_exp_LLR

Calculate LLR for an expression HMM
calc_phi_mle_lnpois

Calculate the MLE of expression fold change phi
get_tree_post

Find maximum likelihood assignment of mutations on a tree
get_internal_nodes

Helper function to get the internal nodes of a dendrogram and the leafs in each subtree
get_joint_post

get joint posteriors
gexp_roll_example

example smoothed gene expression dataframe
approx_phi_post

Laplace approximation of the posterior of expression fold change phi
count_mat_example

example gene expression count matrix
count_mat_ref

example reference count matrix
fit_bbinom

fit a Beta-Binomial model by maximum likelihood
gaps_hg19

genome gap regions (hg19)
find_common_diploid

Find the common diploid region in a group of pseudobulks
gaps_hg38

genome gap regions (hg38)
plot_psbulk

Plot a pseudobulk HMM profile
gtf_hg19

gene model (hg19)
get_exp_post

compute single-cell expression posteriors
gtf_mm10

gene model (mm10)
check_matrix

Check the format of a count matrix
get_exp_likelihoods

get the single cell expression likelihoods
check_gtf_input

Check and format the GTF input
get_lambdas_bar

Get average reference expression profile based on single-cell ref choices
mark_tumor_lineage

Mark the tumor lineage of a phylogeny
get_move_opt

Get the least costly mutation reassignment
mut_graph_example

example mutation graph
hc_example

example hclust tree
gtf_hg38

gene model (hg38)
get_nodes_celltree

Get the internal nodes of a dendrogram and the leafs in each subtree
cnv_heatmap

Plot CNV heatmap
phylogeny_example

example single-cell phylogeny
label_genotype

Label the genotypes on a mutation graph
plot_consensus

Plot consensus CNVs
get_move_cost

Get the cost of a mutation reassignment
log_mem

Log memory usage
plot_sc_tree

Plot single-cell smoothed expression magnitude heatmap
switch_prob_cm

predict phase switch probability as a function of genetic distance
smooth_segs

Smooth the segments after HMM decoding
plot_bulks

Plot a group of pseudobulk HMM profiles
plot_exp_roll

Plot single-cell smoothed expression magnitude heatmap
get_bulk

Aggregate single-cell data into combined bulk expression and allele profile
return_missing_columns

Check the format of a given file
ref_hca_counts

reference expression counts from HCA
theta_hat_roll

Rolling estimate of imbalance level theta
retest_cnv

retest CNVs in a pseudobulk
filter_genes

filter for mutually expressed genes
get_allele_post

get CNV allele posteriors
combine_bulk

Combine allele and expression pseudobulks
theta_hat_seg

Estimate of imbalance level theta in a segment
fill_neu_segs

Fill neutral regions into consensus segments
get_clone_post

Map cells to the phylogeny (or genotypes) based on CNV posteriors
get_exp_bulk

Aggregate into bulk expression profile
preprocess_allele

Preprocess allele data
get_ordered_tips

Get ordered tips from a tree
relevel_chrom

Relevel chromosome column
get_segs_consensus

Extract consensus CNV segments
log_message

Log a message
compute_posterior

Do Bayesian averaging to get posteriors
contract_nodes

Merge adjacent set of nodes
detect_clonal_loh

Call clonal LOH using SNP density. Recommended for cell lines or tumor samples with no normal cells.
df_allele_example

example allele count dataframe
ref_hca

reference expression magnitudes from HCA
test_multi_allelic

test for multi-allelic CNVs
run_numbat

Run workflow to decompose tumor subclones
run_group_hmms

Run multiple HMMs
t_test_pval

T-test wrapper, handles error for insufficient observations
make_group_bulks

Make a group of pseudobulks
get_gtree

Get a tidygraph tree with simplified mutational history.
get_exp_sc

get the single cell expression dataframe
get_segs_neu

get neutral segments from multiple pseudobulks
get_snps

process VCFs into SNP dataframe
pre_likelihood_hmm

HMM object for unit tests
simes_p

Calculate Simes' p-value
phi_hat_seg

Estimate of expression fold change phi in a segment
segs_example

example CNV segments dataframe
phi_hat_roll

Rolling estimate of expression fold change phi
pnorm.range.log

Get the total probability from a region of a normal pdf
label_edges

Annotate the direct upstream or downstream mutations on the edges
viterbi_loh

Viterbi for clonal LOH detection
joint_post_example

example joint single-cell cnv posterior dataframe
vcf_meta

example VCF header
plot_mut_history

Plot mutational history
plot_phylo_heatmap

Plot single-cell CNV calls along with the clonal phylogeny
simplify_history

Simplify the mutational history based on likelihood evidence
retest_bulks

retest consensus segments on pseudobulks
smooth_expression

filtering, normalization and capping
resolve_cnvs

Get unique CNVs from set of segments
transfer_links

Annotate the direct upstream or downstream node on the edges
upgma

UPGMA and WPGMA clustering
analyze_bulk

Call CNVs in a pseudobulk profile using the Numbat joint HMM
annot_haplo_segs

Annotate haplotype segments after HMM decoding
aggregate_counts

Utility function to make reference gene expression profiles
Numbat

Numbat R6 class
Modes

Get the modes of a vector
annot_consensus

Annotate a set of segments on a pseudobulk dataframe
bulk_example

example pseudobulk dataframe
acen_hg38

centromere regions (hg38)
calc_allele_lik

Calculate allele likelihoods
annot_theta_mle

Annotate the theta parameter for each segment
annot_theta_roll

Annotate rolling estimate of imbalance level theta
check_exp_noise

check noise level
annot_ref

example reference cell annotation
check_exp_ref

check the format of lambdas_ref
acen_hg19

centromere regions (hg19)
check_segs_fix

check the format of a given consensus segment dataframe
annot_segs

Annotate copy number segments after HMM decoding
calc_allele_LLR

Calculate LLR for an allele HMM
calc_cluster_dist

Calculate expression distance matrix between cell populations
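
Two of the small helpers in the index above, binary_entropy and simes_p, are named after standard statistical formulas. The base-R sketch below shows those textbook formulas for orientation only. The `_sketch` functions are hypothetical names, not the package's code, and the choice of log base 2 and the argument names are assumptions.

```r
# Hypothetical re-implementations for illustration; these are NOT numbat's code.

# Entropy (in bits) of a binary variable with success probability p.
# By convention, 0 * log2(0) is treated as 0.
binary_entropy_sketch <- function(p) {
  h <- -p * log2(p) - (1 - p) * log2(1 - p)
  h[p == 0 | p == 1] <- 0
  h
}

# Simes' combined p-value: the minimum over i of n * p_(i) / i,
# where p_(1) <= ... <= p_(n) are the sorted input p-values.
simes_p_sketch <- function(p_vals) {
  n <- length(p_vals)
  min(n * sort(p_vals) / seq_len(n))
}

entropy_demo <- binary_entropy_sketch(c(0, 0.5, 1))
simes_demo <- simes_p_sketch(c(0.01, 0.04, 0.03))
```

Entropy is highest at p = 0.5 and zero when the outcome is certain, which is why it is a natural measure of how ambiguous a binary call is. Simes' method combines several p-values into one without assuming they are independent.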