PurBayes: Bayesian Estimation of Tumor Purity and Clonality

Description

PurBayes is an iterative Bayesian algorithm that simultaneously estimates tumor purity and clonality with finite mixture models, using the MCMC software JAGS to draw posterior samples for inference. Guided by a penalized deviance criterion, PurBayes iteratively fits models with an increasing number of variant populations until an optimal fit is achieved.
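The iterative procedure can be sketched as a loop over the number of variant populations $J$. This is a minimal Python sketch and not the package's R implementation. `select_population_count` and the `fit_ped` callback are hypothetical: `fit_ped` stands in for fitting a $J$-population model in JAGS and returning its penalized expected deviance (PED). The rule of stopping as soon as PED fails to improve on the best model so far is an assumption about what "an optimal fit" means.

```python
from typing import Callable, List, Tuple


def select_population_count(
    fit_ped: Callable[[int], float],  # hypothetical: fit a J-population model, return its PED
    pop_max: int = 5,
) -> Tuple[int, List[float]]:
    """Fit J = 1, 2, ... populations; stop when PED no longer improves or pop_max is hit.

    Returns the selected J (lowest PED seen) and the PED of every model fitted.
    """
    peds: List[float] = []
    best_j = 1
    for j in range(1, pop_max + 1):
        peds.append(fit_ped(j))
        if peds[-1] < peds[best_j - 1]:
            best_j = j          # more populations gave a better (lower) PED
        elif j > 1:
            break               # no improvement over the best model so far: stop adding populations
    return best_j, peds
```

A greedy stop like this avoids fitting large models needlessly. If PED is not expected to decrease and then increase in $J$, fitting every $J$ up to pop.max and taking the overall minimum is the safer alternative.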

Usage

PurBayes(N, Y, M=NULL, Z=NULL, pop.max=5, prior=NULL, burn.in=50000, n.post = 10000, fn.jags = "PB.jags", plot = FALSE)

Arguments

N

Numeric vector of total reads for each somatic mutation from the tumor tissue NGS data.

Y

Numeric vector of mutant-allele-supporting read counts for each somatic mutation from the tumor tissue NGS data.

M

Optional numeric vector of total reads for germline heterozygous variants. PurBayes uses these to estimate the non-reference allele mapping rate, which accounts for mapping bias.

Z

Optional numeric vector of alternate-allele read counts for germline heterozygous variants, corresponding to M.
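As a rough illustration of what M and Z carry: at an unbiased germline heterozygous site about half the reads should support the alternate allele. A pooled alternate-read fraction below 0.5 therefore suggests reference-biased mapping. The Python function `alt_mapping_rate` below is a hypothetical sketch of that pooled estimate only. It does not show how PurBayes incorporates the rate into its model.

```python
from typing import Sequence


def alt_mapping_rate(M: Sequence[int], Z: Sequence[int]) -> float:
    """Pooled alternate-allele read fraction across germline heterozygous sites.

    M: total reads per germline het site; Z: alternate-allele reads per site.
    A value near 0.5 indicates no detectable mapping bias.
    """
    if len(M) != len(Z):
        raise ValueError("M and Z must have the same length")
    if any(z < 0 or z > m for m, z in zip(M, Z)):
        raise ValueError("each Z[i] must lie between 0 and M[i]")
    total = sum(M)
    if total == 0:
        raise ValueError("M contains no reads")
    return sum(Z) / total


# Example: 138 alternate reads out of 300 total
print(alt_mapping_rate([100, 80, 120], [45, 38, 55]))  # 0.46
```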

pop.max

Maximum number of variant populations allowed in the iterative modeling procedure. Defaults to 5.

prior

Optional prior distribution for $\lambda_J$ under the homogeneous tumor model. If NULL, defaults to Uniform(0,1). WARNING: this must be supplied as a character string written in the JAGS modeling language.

burn.in

Number of initial MCMC draws discarded as burn-in. Defaults to 50000.

n.post

Number of MCMC draws that are sampled for posterior inference. Defaults to 10000.
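A few lines of Python can sketch how burn.in and n.post partition a chain. This sketch assumes a single chain stored as a plain list of draws. PurBayes itself returns an mcmc.list, and JAGS discards burn-in draws while updating rather than storing them. The function name and the simple floor-based interval are illustrative choices.

```python
from statistics import fmean
from typing import Sequence, Tuple


def summarize_chain(draws: Sequence[float], burn_in: int, n_post: int,
                    level: float = 0.95) -> Tuple[float, float, float]:
    """Drop the first burn_in draws, keep the next n_post, and return
    (posterior mean, lower bound, upper bound) of an equal-tailed interval."""
    if len(draws) < burn_in + n_post:
        raise ValueError("chain is shorter than burn_in + n_post")
    kept = sorted(draws[burn_in:burn_in + n_post])
    tail = (1.0 - level) / 2.0
    last = len(kept) - 1
    lower = kept[int(tail * last)]          # floor-based rank, a simple approximation
    upper = kept[int((1.0 - tail) * last)]
    return fmean(kept), lower, upper
```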

fn.jags

Path and file name to which write.PB writes the generated JAGS model file. Defaults to 'PB.jags' in the current working directory.

plot

If plot=TRUE, then plot.PurBayes is called to generate a visual representation of the data along with the model fit by PurBayes. Defaults to FALSE.

Value

n.pop: Numeric scalar giving the number of variant populations detected by PurBayes.
PB.post: mcmc.list object containing posterior samples of the PurBayes model parameters. This always includes pur, the tumor purity. If n.pop>1, posterior samples of $\kappa_j$ and $\lambda_j$ for $j = 1,\ldots,J$ are also included.
dev.mat: A matrix of the penalized expected deviance (PED) results from the model selection procedure: the PED of each fitted model, the difference in PED from the reference model, and the standard error of that difference.
which.ref: Indicates which fitted model is the reference model in the PED analysis. This will be the fitted model with the minimal PED.
jag.fits: List of the JAGS models (object class jags) fitted during the model selection process.
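The difference and standard-error columns of dev.mat can be illustrated with per-observation PED contributions. In this hypothetical Python sketch, `ped_difference` assumes each model's PED is available as a list of per-variant contributions. It estimates the standard error of the summed difference as $\sqrt{n}$ times the sample standard deviation of the per-variant differences. This is a common convention and not a confirmed description of PurBayes internals.

```python
from math import sqrt
from statistics import stdev
from typing import Sequence, Tuple


def ped_difference(ped_model: Sequence[float],
                   ped_ref: Sequence[float]) -> Tuple[float, float]:
    """Return (PED difference from the reference model, its standard error).

    Both inputs are per-variant PED contributions for the same variants.
    """
    if len(ped_model) != len(ped_ref) or len(ped_model) < 2:
        raise ValueError("need two equal-length vectors with at least 2 entries")
    delta = [a - b for a, b in zip(ped_model, ped_ref)]
    return sum(delta), sqrt(len(delta)) * stdev(delta)
```

A difference that is small relative to its standard error suggests the two models fit about equally well.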

Details

For a given tumor purity level $\lambda$, PurBayes assumes a binomial-binomial mixture model for the tumor sequence reads that support the alternate allele, $Y_i^t \sim Bin(N_i,\lambda/2)$. This model is fit to the data under the assumption of tumor homogeneity. PurBayes also supports intra-tumor heterogeneity, in which the tumor tissue contains additional subclonal variant populations, each with its own 'purity' $\lambda_j < \lambda$ for $j = 1,\ldots,J-1$, while $\lambda_J \equiv \lambda$.
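The homogeneous model can be illustrated without MCMC. The Python sketch below replaces JAGS sampling with a simple grid approximation. Under the default Uniform(0,1) prior, the posterior for $\lambda$ is proportional to $\prod_i Bin(Y_i \mid N_i, \lambda/2)$. Evaluating that likelihood on a fine grid of $\lambda$ values and normalizing therefore gives the posterior mean. The function names are hypothetical.

```python
from math import comb, exp, log
from typing import Sequence


def binom_logpmf(y: int, n: int, p: float) -> float:
    """log Bin(y | n, p) for 0 < p < 1."""
    return log(comb(n, y)) + y * log(p) + (n - y) * log(1.0 - p)


def purity_posterior_mean(N: Sequence[int], Y: Sequence[int],
                          grid_size: int = 999) -> float:
    """Posterior mean of lambda under Y_i ~ Bin(N_i, lambda/2), lambda ~ Uniform(0,1)."""
    grid = [(k + 0.5) / grid_size for k in range(grid_size)]   # midpoints in (0, 1)
    # Flat prior: the log posterior equals the log likelihood up to a constant
    log_post = [sum(binom_logpmf(y, n, lam / 2.0) for n, y in zip(N, Y))
                for lam in grid]
    top = max(log_post)
    weights = [exp(lp - top) for lp in log_post]                # subtract max to avoid underflow
    return sum(lam * w for lam, w in zip(grid, weights)) / sum(weights)
```

With real data the function would be called on the N and Y vectors described under Arguments.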

The probability that a given variant belongs to the $j^{th}$ population is $\kappa_j$. For a given number of variant populations $J$, $\bm{\kappa}=(\kappa_1,\ldots,\kappa_J)$ follows a Dirichlet prior, $\pi(\bm{\kappa}) \sim Dirichlet(\alpha_1,\ldots,\alpha_J)$. PurBayes applies a diffuse prior on $\bm{\kappa}$, setting $\alpha_1=\ldots=\alpha_J=1$. The user may specify a particular prior for $\lambda$ under a homogeneous tumor; otherwise PurBayes defaults to $\pi(\lambda_j) \sim Uniform(0,1)$ for all $j$. It uses a sort function to avoid label switching.
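Under $J$ populations, each variant's read count is modeled as a mixture, $\sum_j \kappa_j\, Bin(Y_i \mid N_i, \lambda_j/2)$. The hypothetical Python helpers below evaluate that mixture log-likelihood, using log-sum-exp for numerical stability. They also relabel a parameter draw by sorting the $\lambda_j$ in increasing order, which places the largest value (the clonal purity) in position $J$, matching $\lambda_J \equiv \lambda$. This is one simple way to remove label-switching ambiguity. PurBayes applies its sort inside the JAGS model, which this sketch does not reproduce.

```python
from math import comb, exp, log
from typing import List, Sequence, Tuple


def mixture_loglik(N: Sequence[int], Y: Sequence[int],
                   kappa: Sequence[float], lam: Sequence[float]) -> float:
    """Sum over variants of log sum_j kappa_j * Bin(Y_i | N_i, lam_j / 2).

    Assumes every kappa_j > 0 and 0 < lam_j < 1 (so 0 < lam_j / 2 < 0.5).
    """
    total = 0.0
    for n, y in zip(N, Y):
        terms = [log(k) + log(comb(n, y)) + y * log(l / 2.0) + (n - y) * log(1.0 - l / 2.0)
                 for k, l in zip(kappa, lam)]
        top = max(terms)
        total += top + log(sum(exp(t - top) for t in terms))   # log-sum-exp
    return total


def sort_labels(kappa: Sequence[float],
                lam: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Reorder (kappa_j, lam_j) pairs so that lam is increasing."""
    order = sorted(range(len(lam)), key=lambda j: lam[j])
    return [kappa[j] for j in order], [lam[j] for j in order]
```

Because the mixture likelihood is invariant to permuting the $(\kappa_j, \lambda_j)$ pairs, sorting changes only the labels and leaves the fit unchanged.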

The optimality criterion used to select $J$ is the penalized expected deviance (PED; Plummer, 2008), which is the posterior expected deviance plus a penalty term called the optimism. When the optimism cannot be determined, it is approximated by twice the pD value, and a warning reports that this approximation is being used.
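The two ways of computing PED can be written out directly. PED is $\bar{D} + p_{opt}$, and the fallback replaces the optimism $p_{opt}$ with $2p_D$. In this hypothetical Python sketch, $p_D$ is computed with the Spiegelhalter et al. definition, $\bar{D} - D(\bar{\theta})$, as a stand-in. JAGS estimates pD differently, so the numbers will not match PurBayes output exactly.

```python
from statistics import fmean
from typing import Sequence


def ped(mean_deviance: float, optimism: float) -> float:
    """Penalized expected deviance: expected deviance plus the optimism penalty."""
    return mean_deviance + optimism


def ped_fallback(deviance_draws: Sequence[float], deviance_at_mean: float) -> float:
    """Approximate PED with the optimism replaced by 2 * pD.

    deviance_draws: deviance evaluated at each posterior draw.
    deviance_at_mean: deviance evaluated at the posterior mean of the parameters.
    """
    d_bar = fmean(deviance_draws)
    p_d = d_bar - deviance_at_mean          # Spiegelhalter-style effective number of parameters
    return ped(d_bar, 2.0 * p_d)
```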

References

Plummer, M. (2008). Penalized loss functions for Bayesian model comparison. Biostatistics. doi:10.1093/biostatistics/kxm049

Examples

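# Homogeneous tumor example
N.var <- 20                                        # number of somatic variants
N <- round(runif(N.var, 20, 200))                  # total read depth per variant
lambda <- 0.75                                     # tumor purity
Y <- rbinom(N.var, N, lambda/2)                    # mutant reads: Y_i ~ Bin(N_i, lambda/2)
## Not run (requires a JAGS installation):
# PB.hom <- PurBayes(N, Y)

# Heterogeneous tumor example - 1 subclonal population
N.var <- 20
N <- round(runif(N.var, 20, 200))
lambda.1 <- 0.75                                   # clonal population (tumor purity)
lambda.2 <- 0.25                                   # subclonal population
lambda <- c(rep(lambda.1, 10), rep(lambda.2, 10))  # 10 variants in each population
Y <- rbinom(N.var, N, lambda/2)
## Not run (requires a JAGS installation):
# PB.het <- PurBayes(N, Y)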