Note: a newer version (1.5.2) of this package is available.

Numbat

Numbat is a haplotype-aware CNV caller from single-cell and spatial transcriptomics data. It integrates signals from gene expression, allelic ratio, and population-derived haplotype information to accurately infer allele-specific CNVs in single cells and reconstruct their lineage relationship.

Numbat can be used to:

Detect allele-specific copy number variations from scRNA-seq and spatial transcriptomics
Differentiate tumor versus normal cells in the tumor microenvironment
Infer the clonal architecture and evolutionary history of profiled tumors

Numbat does not require paired DNA or genotype data and operates solely on the donor scRNA-seq data (for example, 10x Cell Ranger output). For details of the method, please check out our paper:

Teng Gao, Ruslan Soldatov, Hirak Sarkar, Adam Kurkiewicz, Evan Biederstedt, Po-Ru Loh, Peter Kharchenko. Haplotype-aware analysis of somatic copy number variations from single-cell transcriptomes. Nature Biotechnology (2022).

Numbat-multiome

Numbat was later extended to multi-modality (single-cell RNA and ATAC) data. Check out the vignette and paper below:

Ruitong Li, Jean-Baptiste Alberge, Tina Keshavarzian, Junko Tsuji, Johan Gustafsson, Mahshid Rahmat, Elizabeth D Lightbody, Stephanie L Deng, Santiago Riviero, Mendy Miller, F Naz Cemre Kalayci, Adrian Wiestner, Clare Sun, Mathieu Lupien, Irene Ghobrial, Erin Parry, Teng Gao, Gad Getz. Numbat-multiome: inferring copy number variations by combining RNA and chromatin accessibility information from single-cell data. Briefings in Bioinformatics (2025).

User Guide

For a complete guide, please see Numbat User Guide.

Questions?

We appreciate your feedback! Please raise a GitHub issue for bugs, questions, and new feature requests. For bug reports, please attach the full log, error message, input parameters, and ideally a reproducible example.

Install

install.packages('numbat')
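
As a minimal sketch, the install step can be made slightly more defensive by checking whether the package is already present before installing, and by reporting the installed version so it can be compared with the release documented on this page. The variable name `numbat_available` is illustrative and not part of the package; the install call itself is left commented out because it needs network access.

```r
# Check whether numbat is already installed; this does not attach the package.
numbat_available <- requireNamespace("numbat", quietly = TRUE)

if (!numbat_available) {
  # Uncomment the next line to install from CRAN (requires network access).
  # install.packages("numbat")
  message("numbat is not installed; run install.packages('numbat')")
} else {
  # Report the installed version for comparison with the documented release.
  message("numbat version: ", as.character(packageVersion("numbat")))
}
```

Once the package is installed, `library(numbat)` attaches it for use in the current session.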

Monthly Downloads

672

Version

1.5.1

License

MIT + file LICENSE

Repository

https://github.com/kharchenkolab/numbat/

Homepage

https://kharchenkolab.github.io/numbat/

Maintainer

Teng Gao

Last Published

October 21st, 2025

Functions in numbat (1.5.1)

Check the format of a given clonal LOH segment dataframe

chrom_sizes_hg38

chromosome sizes (hg38)

fit gamma maximum likelihood

check inter-individual contamination

Get an allele HMM

check_allele_df

Check the format of an allele dataframe

fit a PLN model by maximum likelihood

get_allele_bulk

Aggregate into pseudobulk allele profile

expand multi-allelic CNVs into separate entries in the single-cell posterior dataframe

chrom_sizes_hg19

chromosome sizes (hg19)

Genotyping main function

Run smoothed expression-based hclust

generate_postfix

Generate alphabetical postfixes

negative binomial model

choose best reference for each cell based on correlation

get_haplotype_post

Get phased haplotypes

approx_theta_post

Laplace approximation of the posterior of allelic imbalance theta

Helper function to get inter-SNP distance

calculate entropy for a binary variable

classify_alleles

classify alleles using Viterbi and forward-backward

Annotate genes on allele dataframe

Fit a reference profile from multiple references using constrained least square

Calculate LLR for an expression HMM

calc_phi_mle_lnpois

Calculate the MLE of expression fold change phi

Find maximum likelihood assignment of mutations on a tree

get_internal_nodes

Helper function to get the internal nodes of a dendrogram and the leaves in each subtree

get joint posteriors

gexp_roll_example

example smoothed gene expression dataframe

approx_phi_post

Laplace approximation of the posterior of expression fold change phi

count_mat_example

example gene expression count matrix

example reference count matrix

fit a Beta-Binomial model by maximum likelihood

genome gap regions (hg19)

find_common_diploid

Find the common diploid region in a group of pseudobulks

genome gap regions (hg38)

Plot a pseudobulk HMM profile

gene model (hg19)

compute single-cell expression posteriors

gene model (mm10)

Check the format of a count matrix

get_exp_likelihoods

get the single cell expression likelihoods

check_gtf_input

Check and format the GTF input

get_lambdas_bar

Get average reference expression profile based on single-cell ref choices

mark_tumor_lineage

Mark the tumor lineage of a phylogeny

Get the least costly mutation reassignment

mut_graph_example

example mutation graph

example hclust tree

gene model (hg38)

get_nodes_celltree

Get the internal nodes of a dendrogram and the leaves in each subtree

Plot CNV heatmap

phylogeny_example

example single-cell phylogeny

Label the genotypes on a mutation graph

Plot consensus CNVs

Get the cost of a mutation reassignment

Log memory usage

Plot single-cell smoothed expression magnitude heatmap

predict phase switch probability as a function of genetic distance

Smooth the segments after HMM decoding

Plot a group of pseudobulk HMM profiles

Plot single-cell smoothed expression magnitude heatmap

Aggregate single-cell data into combined bulk expression and allele profile

return_missing_columns

Check the format of a given file

reference expression counts from HCA

Rolling estimate of imbalance level theta

retest CNVs in a pseudobulk

filter for mutually expressed genes

get_allele_post

get CNV allele posteriors

Combine allele and expression pseudobulks

Estimate of imbalance level theta in a segment

Fill neutral regions into consensus segments

Map cells to the phylogeny (or genotypes) based on CNV posteriors

Aggregate into bulk expression profile

preprocess_allele

Preprocess allele data

get_ordered_tips

Get ordered tips from a tree

Relevel chromosome column

get_segs_consensus

Extract consensus CNV segments

compute_posterior

Do Bayesian averaging to get posteriors

Merge adjacent set of nodes

detect_clonal_loh

Call clonal LOH using SNP density. Recommended for cell lines or tumor samples with no normal cells.

df_allele_example

example allele count dataframe

reference expression magnitudes from HCA

test_multi_allelic

test for multi-allelic CNVs

Run workflow to decompose tumor subclones

Run multiple HMMs

T-test wrapper, handles error for insufficient observations

make_group_bulks

Make a group of pseudobulks

Get a tidygraph tree with simplified mutational history.

get the single cell expression dataframe

get neutral segments from multiple pseudobulks

process VCFs into SNP dataframe

pre_likelihood_hmm

HMM object for unit tests

Calculate Simes' p

Estimate of expression fold change phi in a segment

example CNV segments dataframe

Rolling estimate of expression fold change phi

pnorm.range.log

Get the total probability from a region of a normal pdf

Annotate the direct upstream or downstream mutations on the edges

Viterbi for clonal LOH detection

joint_post_example

example joint single-cell CNV posterior dataframe

example VCF header

plot_mut_history

Plot mutational history

plot_phylo_heatmap

Plot single-cell CNV calls along with the clonal phylogeny

simplify_history

Simplify the mutational history based on likelihood evidence

retest consensus segments on pseudobulks

smooth_expression

filtering, normalization and capping

Get unique CNVs from set of segments

Annotate the direct upstream or downstream node on the edges

UPGMA and WPGMA clustering

Call CNVs in a pseudobulk profile using the Numbat joint HMM

annot_haplo_segs

Annotate haplotype segments after HMM decoding

aggregate_counts

Utility function to make reference gene expression profiles

Numbat R6 class

Get the modes of a vector

annot_consensus

Annotate a set of segments on a pseudobulk dataframe

example pseudobulk dataframe

centromere regions (hg38)

calc_allele_lik

Calculate allele likelihoods

annot_theta_mle

Annotate the theta parameter for each segment

annot_theta_roll

Annotate rolling estimate of imbalance level theta

check_exp_noise

check noise level

example reference cell annotation

check the format of lambdas_ref

centromere regions (hg19)

check the format of a given consensus segment dataframe

Annotate copy number segments after HMM decoding

calc_allele_LLR

Calculate LLR for an allele HMM

calc_cluster_dist

Calculate expression distance matrix between cell populations