Learn R Programming

ctDNAtools (version 0.4.0)

create_background_panel: Creates a background panel from a list of bam files

Description

This function scans the targets regions in the list of bam files, and reports the number of reference, non-reference reads for each loci in addition to the non-reference (VAF) allele frequency. Loci with VAF higher than vaf_threshold are masked with NA.

Usage

create_background_panel(bam_list, targets, reference,
  vaf_threshold = 0.05, bam_list_tags = rep("", length(bam_list)),
  min_base_quality = 10, max_depth = 1e+05, min_mapq = 20,
  substitution_specific = TRUE)

Arguments

bam_list

A character vector of paths to bam files.

targets

The targets data frame must have the columns chr, start and end.

reference

The reference genome as BSgenome object.

vaf_threshold

Loci with the fraction of non-reference reads above this value are masked with NA.

bam_list_tags

RG tags for the list of bam files. By default, the whole bam file will be used.

min_base_quality

The minimum base quality to count a read for a loci.

max_depth

Maximum depth for the pileup

min_mapq

The minimum mapping quality to count a read for a loci

substitution_specific

logical, whether to have the loci by substitutions.

Value

A named list having depth, alt and vaf data frames. Each has the same order of loci in rows and the input samples in columns.

Details

Extracts the depth, variant allele counts and variant allele frequency (VAF) for each genomic position in the input targets across a panel of bam files (e.g. from healthy samples to represent only technical noise). The extracted information can be fed to create_black_list in order to extract a black listed loci according to defined criteria

The function support two modes, either loci-specific regardless of the basepair substitution, or substitution-specific where each substitution class (e.g. C>T, C>G) are quantified separately. This behavior is controlled by the substitution_specific parameter.

VAF above vaf_threshold parameters are masked with NA, to exclude real SNPs/mutations.

Since this function can take a long time when the bam_list comprises a large number of bam files, the function supports multi-threading using the furrr and future R packages. All you need to do is call 'plan(multiprocess)' or other multi-threading strategies before calling this function.

See Also

create_black_list test_ctDNA

Examples

Run this code
# NOT RUN {
## Load example data
data("targets", package = "ctDNAtools")
bamN1 <- system.file("extdata", "N1.bam", package = "ctDNAtools")
bamN2 <- system.file("extdata", "N2.bam", package = "ctDNAtools")
bamN3 <- system.file("extdata", "N3.bam", package = "ctDNAtools")

## Use human reference genome from BSgenome.Hsapiens.UCSC.hg19 library
suppressMessages(library(BSgenome.Hsapiens.UCSC.hg19))

## Use a black list based on loci
bg_panel <- create_background_panel(
  bam_list = c(bamN1, bamN2, bamN3),
  targets = targets, reference = BSgenome.Hsapiens.UCSC.hg19,
  substitution_specific = FALSE
)

bl1 <- create_black_list(bg_panel,
  mean_vaf_quantile = 0.99,
  min_samples_one_read = 2, min_samples_two_reads = 1
)

## Use a substitution-specific black list
bg_panel <- create_background_panel(
  bam_list = c(bamN1, bamN2, bamN3),
  targets = targets, reference = BSgenome.Hsapiens.UCSC.hg19,
  substitution_specific = TRUE
)

bl2 <- create_black_list(bg_panel,
  mean_vaf_quantile = 0.99,
  min_samples_one_read = 2, min_samples_two_reads = 1
)

## Multi-threading (commented)
## library(furrr)
## plan(multiprocess)
## plan(multiprocess, workers = 3)
bg_panel <- create_background_panel(
  bam_list = c(bamN1, bamN2, bamN3),
  targets = targets, reference = BSgenome.Hsapiens.UCSC.hg19,
  substitution_specific = TRUE
)
# }
# NOT RUN {
# }

Run the code above in your browser using DataLab