Numbat

Numbat is a haplotype-aware CNV caller for single-cell and spatial transcriptomics data. It integrates signals from gene expression, allelic ratio, and population-derived haplotype information to accurately infer allele-specific CNVs in single cells and reconstruct their lineage relationships.

Numbat can be used to:

  1. Detect allele-specific copy number variations from scRNA-seq and spatial transcriptomics
  2. Differentiate tumor versus normal cells in the tumor microenvironment
  3. Infer the clonal architecture and evolutionary history of profiled tumors.

Numbat does not require paired DNA or genotype data and operates solely on the donor scRNA-seq data (for example, 10x Cell Ranger output). For details of the method, please check out our paper:

Teng Gao, Ruslan Soldatov, Hirak Sarkar, Adam Kurkiewicz, Evan Biederstedt, Po-Ru Loh, Peter Kharchenko. Haplotype-aware analysis of somatic copy number variations from single-cell transcriptomes. Nature Biotechnology (2022).

User Guide

For a complete guide, please see Numbat User Guide.

Questions?

We appreciate your feedback! Please raise a GitHub issue for bugs, questions, and new feature requests. For bug reports, please attach the full log, error message, input parameters, and, if possible, a reproducible example.

Install

install.packages('numbat')

Monthly Downloads

458

Version

1.3.2-1

License

MIT + file LICENSE

Maintainer

Teng Gao

Last Published

June 17th, 2023

Functions in numbat (1.3.2-1)

annot_ref

example reference cell annotation
aggregate_counts

Utility function to make reference gene expression profiles
analyze_bulk

Call CNVs in a pseudobulk profile using the Numbat joint HMM
Numbat

Numbat R6 class
acen_hg19

centromere regions (hg19)
annot_consensus

Annotate consensus segments on a pseudobulk dataframe
Modes

Get the modes of a vector
annot_segs

Annotate copy number segments after HMM decoding
acen_hg38

centromere regions (hg38)
annot_haplo_segs

Annotate haplotype segments after HMM decoding
bulk_example

example pseudobulk dataframe
binary_entropy

calculate entropy for a binary variable
approx_theta_post

Laplace approximation of the posterior of allelic imbalance theta
approx_phi_post

Laplace approximation of the posterior of expression fold change phi
annotate_genes

Annotate genes on allele dataframe
calc_allele_lik

Calculate allele likelihoods
check_segs_fix

check the format of a given consensus segment dataframe
annot_theta_roll

Annotate rolling estimate of imbalance level theta
annot_theta_mle

Annotate the theta parameter for each segment
check_segs_loh

Check the format of a given clonal LOH segment dataframe
check_exp_ref

check the format of lambdas_ref
calc_cluster_dist

Calculate expression distance matrix between cell populations
check_contam

check inter-individual contamination
check_exp_noise

check noise level
calc_trans_mat

Calculate the transition matrix for joint HMM
check_allele_df

Check the format of an allele dataframe
calc_allele_LLR

Calculate LLR for an allele HMM
cnv_heatmap

Plot CNV heatmap
combine_bulk

Combine allele and expression pseudobulks
choose_ref_cor

choose best reference for each cell based on correlation
fit_ref_sse

Fit a reference profile from multiple references using constrained least square
chrom_sizes_hg19

chromosome sizes (hg19)
check_matrix

Check the format of a count matrix
chrom_sizes_hg38

chromosome sizes (hg38)
classify_alleles

classify alleles using viterbi and forward-backward
get_exp_bulk

Aggregate into bulk expression profile
get_exp_likelihoods

get the single cell expression likelihoods
fit_snp_rate

negative binomial model
compute_posterior

Do bayesian averaging to get posteriors
calc_phi_mle_lnpois

Calculate the MLE of expression fold change phi
calc_exp_LLR

Calculate LLR for an expression HMM
find_common_diploid

Find the common diploid region in a group of pseudobulks
gaps_hg38

genome gap regions (hg38)
filter_genes

filter for mutually expressed genes
get_bulk

Aggregate single-cell data into combined bulk expression and allele profile
generate_postfix

Generate alphabetical postfixes
count_mat_ref

example reference count matrix
count_mat_example

example gene expression count matrix
contract_nodes

Merge adjacent set of nodes
df_allele_example

example allele count dataframe
dpoilog

Returns the density for the Poisson lognormal distribution with parameters mu and sig
exp_hclust

Run smoothed expression-based hclust
dgpois

Density function for a gamma-poisson distribution
fill_neu_segs

Fill neutral regions into consensus segments
expand_states

expand multi-allelic CNVs into separate entries in the single-cell posterior dataframe
detect_clonal_loh

Call clonal LOH using SNP density. Recommended for cell lines or tumor samples with no normal cells.
dbbinom

Beta-binomial distribution density function. A distribution is beta-binomial if p, the probability of success in a binomial distribution, follows a beta distribution with shape parameters alpha > 0 and beta > 0. For more details, see extraDistr::dbbinom.
fit_gamma

fit gamma maximum likelihood
get_exp_post

compute single-cell expression posteriors
fit_bbinom

fit a Beta-Binomial model by maximum likelihood
get_gtree

Get a tidygraph tree with simplified mutational history.
get_move_opt

Get the least costly mutation reassignment
get_move_cost

Get the cost of a mutation reassignment
get_exp_sc

get the single cell expression dataframe
gtf_mm10

gene model (mm10)
get_haplotype_post

Get phased haplotypes
get_allele_hmm

Get an allele HMM
hc_example

example hclust tree
get_segs_consensus

Extract consensus CNV segments
gaps_hg19

genome gap regions (hg19)
forward_back_allele

Forward-backward algorithm for allele HMM
likelihood_allele

Only compute total log likelihood from an allele HMM
get_allele_post

get CNV allele posteriors
get_joint_post

get joint posteriors
get_lambdas_bar

Get average reference expression profile based on single-cell ref choices
phi_hat_roll

Rolling estimate of expression fold change phi
phi_hat_seg

Estimate of expression fold change phi in a segment
log_mem

Log memory usage
get_clone_post

Map cells to the phylogeny (or genotypes) based on CNV posteriors
get_segs_neu

get neutral segments from multiple pseudobulks
gtf_hg19

gene model (hg19)
gtf_hg38

gene model (hg38)
plot_sc_tree

Plot single-cell smoothed expression magnitude heatmap
plot_psbulk

Plot a pseudobulk HMM profile
segs_example

example CNV segments dataframe
t_test_pval

T-test wrapper, handles error for insufficient observations
genotype

Genotyping main function
simes_p

Calculate Simes' p
test_multi_allelic

test for multi-allelic CNVs
fit_gpois

fit a Gamma-Poisson model by maximum likelihood
fit_lnpois

fit a PLN model by maximum likelihood
get_allele_bulk

Aggregate into pseudobulk allele profile
get_inter_cm

Helper function to get inter-SNP distance
mark_tumor_lineage

Mark the tumor lineage of a phylogeny
label_edges

Annotate the direct upstream or downstream mutations on the edges
label_genotype

Label the genotypes on a mutation graph
retest_cnv

retest CNVs in a pseudobulk
mut_graph_example

example mutation graph
retest_bulks

retest consensus segments on pseudobulks
return_missing_columns

Check the format of a given file
resolve_cnvs

Get unique CNVs from set of segments
get_nodes_celltree

Get the internal nodes of a dendrogram and the leafs in each subtree
smooth_segs

Smooth the segments after HMM decoding
plot_phylo_heatmap

Plot single-cell CNV calls along with the clonal phylogeny
plot_mut_history

Plot mutational history
get_internal_nodes

Helper function to get the internal nodes of a dendrogram and the leafs in each subtree
switch_prob_cm

predict phase switch probability as a function of genetic distance
plot_bulks

Plot a group of pseudobulk HMM profiles
run_allele_hmm

allele-only HMM
log_message

Log a message
phylogeny_example

example single-cell phylogeny
make_group_bulks

Make a group of pseudobulks
ref_hca_counts

reference expression counts from HCA
upgma

UPGMA and WPGMA clustering
transfer_links

Annotate the direct upstream or downstream node on the edges
run_group_hmms

Run multiple HMMs
relevel_chrom

Relevel chromosome column
run_numbat

Run workflow to decompose tumor subclones
theta_hat_seg

Estimate of imbalance level theta in a segment
theta_hat_roll

Rolling estimate of imbalance level theta
get_ordered_tips

Get ordered tips from a tree
gexp_roll_example

example smoothed gene expression dataframe
run_joint_hmm

Run joint HMM on a pseudobulk profile
viterbi_joint

Generalized viterbi algorithm for joint HMM
get_tree_post

Find maximum likelihood assignment of mutations on a tree
viterbi_loh

Viterbi for clonal LOH detection
joint_post_example

example joint single-cell cnv posterior dataframe
preprocess_allele

Preprocess allele data
ref_hca

reference expression magnitudes from HCA
vcf_meta

example VCF header
l_bbinom

calculate joint likelihood of allele data
get_snps

process VCFs into SNP dataframe
get_trans_probs

Helper function to calculate transition probabilities. cn/phase are scalars; only p_s is vectorized.
viterbi_allele

Viterbi algorithm for allele HMM
l_lnpois

calculate joint likelihood of a PLN model
l_gpois

calculate joint likelihood of a gamma-poisson model
plot_consensus

Plot consensus CNVs
plot_exp_roll

Plot single-cell smoothed expression magnitude heatmap
simplify_history

Simplify the mutational history based on likelihood evidence
pre_likelihood_hmm

HMM object for unit tests
pnorm.range.log

Get the total probability from a region of a normal pdf
smooth_expression

filtering, normalization and capping
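
The dbbinom entry in the index above describes the beta-binomial density in words. As an illustration only, the same density can be written in base R from its standard closed form, P(X = x) = C(n, x) B(x + alpha, n - x + beta) / B(alpha, beta). The function name dbbinom_sketch and its argument names are hypothetical and chosen for this example; this is not the package's implementation, and its signature may differ from numbat's dbbinom.

```r
# A minimal base-R sketch of the beta-binomial density.
# NOTE: dbbinom_sketch is a hypothetical name for illustration,
# not the function exported by numbat or extraDistr.
dbbinom_sketch <- function(x, size, alpha, beta, log = FALSE) {
    stopifnot(all(alpha > 0), all(beta > 0))
    # Work in log space for numerical stability:
    # log P(X = x) = log C(size, x)
    #              + log B(x + alpha, size - x + beta)
    #              - log B(alpha, beta)
    logp <- lchoose(size, x) +
        lbeta(x + alpha, size - x + beta) -
        lbeta(alpha, beta)
    if (log) logp else exp(logp)
}

# Evaluate the density over the full support 0..size
probs <- dbbinom_sketch(0:10, size = 10, alpha = 2, beta = 3)
total <- sum(probs)

# With alpha = beta = 1 the success probability is uniform on (0, 1),
# so every count from 0 to size is equally likely
p_uniform <- dbbinom_sketch(3, size = 10, alpha = 1, beta = 1)
```

Computing on the log scale with lchoose and lbeta avoids overflow for large counts, which matters when allele counts per SNP or segment are high.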