baf.model.fit: Model fitting using Bayesian inference

Description

Computes the log-posterior probability distribution for the specified range of cellularity and ploidy parameters

Usage

mufreq.model.fit(cellularity = seq(0.3, 1, by = 0.01),
                   ploidy = seq(1, 7, by = 0.1),
                   mc.cores = getOption("mc.cores", 2L), ...)
  baf.model.fit(cellularity = seq(0.3, 1, by = 0.01),
                ploidy = seq(1, 7, by = 0.1),
                mc.cores = getOption("mc.cores", 2L), ...)

Arguments

cellularity

vector of cellularity parameter values to be tested.

ploidy

vector of ploidy parameter values to be tested.

mc.cores

number of cores to use, defined as in mclapply.

...

any argument accepted by mufreq.bayes or baf.bayes.

Value

A list of three items:

ploidy

tested values of the ploidy parameter

cellularity

tested values of the cellularity parameter

lpp

log-posterior probability of each pair of cellularity/ploidy parameters.

Details

baf.model.fit uses the function baf.bayes to infer the log-posterior probability of the model fit using the possible combinations of cellularity and ploidy values provided in the arguments. Similarly mufreq.model.fit fits the mutation/depth ratio model using the function mufreq.bayes. baf.model.fit is generally used to infer cellularity and ploidy on segmented chromosomes, while mufreq.model.fit can also be used to estimate these parameters from mutations, and in addition can be useful to give a rough estimate of sub-clonal fractions. Be aware that the mutation frequency is more sensitive to sampling bias, and is generally more noisy compared to the segmented B-allele frequency.

Examples

Run this code

# NOT RUN {
  
# }
# NOT RUN {
data.file <-  system.file("data", "example.seqz.txt.gz", package = "sequenza")
# read all the chromosomes:
seqz.data  <- read.seqz(data.file)
# Gather genome wide GC-stats from raw file:
gc.stats <- gc.sample.stats(data.file)
gc.vect  <- setNames(gc.stats$raw.mean, gc.stats$gc.values)
# Read only one chromosome:
seqz.data  <- read.seqz(data.file, chr.name = 1)

# Correct the coverage of the loaded chromosome:
seqz.data$adjusted.ratio <- seqz.data$depth.ratio /
                           gc.vect[as.character(seqz.data$GC.percent)]
# Select the heterozygous positions
seqz.hom  <- seqz.data$zygosity.normal == 'hom'
seqz.het  <- seqz.data[!seqz.hom, ]
# Detect breakpoints
breaks <- find.breaks(seqz.het, gamma = 80, kmin = 10, baf.thres = c(0, 0.5))
# use heterozygous and homozygous position to measure segment values
seg.s1 <- segment.breaks(seqz.data, breaks = breaks)

# filter out small ambiguous segments, and conveniently weight the segments by size:
seg.filtered <- seg.s1[(seg.s1$end.pos - seg.s1$start.pos) > 10e6, ]
weights.seg  <- 150 + round((seg.filtered$end.pos -
                             seg.filtered$start.pos) / 1e6, 0)
# get the genome wide mean of the normalized depth ratio:
avg.depth.ratio <- mean(gc.stats$adj[,2])
# run the BAF model fit

CP <- baf.model.fit(Bf = seg.filtered$Bf, depth.ratio = seg.filtered$depth.ratio,
                    weight.ratio = weights.seg,
                    weight.Bf = weights.seg,
                    avg.depth.ratio = avg.depth.ratio,
                    cellularity = seq(0.1,1,0.01),
                    ploidy = seq(0.5,3,0.05))

confint <- get.ci(CP)
ploidy   <- confint$max.ploidy
cellularity <- confint$max.cellularity

   
# }