mutation.table: Identify mutations

Description

This function extracts positions from an seqz file that differ from the normal genome, applying various filters.

Usage

mutation.table(seqz.tab, mufreq.treshold = 0.15, min.reads = 40, min.reads.normal = 10,
               max.mut.types = 3, min.type.freq = 0.9, min.fw.freq = 0, segments = NULL)

Arguments

seqz.tab

an seqz table, as output from read.seqz.

mufreq.treshold

mutation frequency threshold.

min.reads

minimum number of reads above the quality threshold to accept the mutation call.

min.reads.normal

minimum number of reads used to determine the genotype in the normal sample.

max.mut.types

maximum number of different base substitutions per position. Integer from 1 to 3 (since there are only 4 different bases). Default is 3, to accept “noisy” mutation calls.

min.type.freq

minimum frequency of aberrant types.

min.fw.freq

minimum frequency of variant reads detected in the forward strand. Setting it to 0, all the variant calls with strand frequency in the interval outside 0 and 1, margin not comprised, would be discarded.

segments

if specified, the values of depth ratio would be taken from the segments rather than from the raw data.

Value

A data frame, which in addition to some of the columns of the seqz table, contains the following two columns:

the mutation frequency

mutation

a character representation of the mutation. For example, a mutation from A in the normal to G in the tumor is annotated as A>G.

Details

Calling mutations in impure tumor samples is a difficult task, because the degree of contamination by normal cells affects the measured mutation frequency. In highly impure samples, where the normal cells comprise the major component of the sample, mutations can be so diluted that it can be difficult to distinguish them from sequencing errors.

The function mutation.table tries to separate true mutations from sequencing errors, based on the given threshold. In samples with low contamination, it should even be possible to catch sub-clonal mutations using this function.

This function identifes only those mutations occuring in positions that are homozygous in the normal genome.

Examples

Run this code

# NOT RUN {
   
# }
# NOT RUN {
data.file <-  system.file("extdata", "example.seqz.txt.gz", package = "sequenza")
seqz.data  <- read.seqz(data.file)

## Normalize coverage by GC-content
gc.stats <- gc.norm(x = seqz.data$depth.ratio,
                    gc = seqz.data$GC.percent)
gc.vect  <- setNames(gc.stats$raw.mean, gc.stats$gc.values)
seqz.data$adjusted.ratio <- seqz.data$depth.ratio /
                           gc.vect[as.character(seqz.data$GC.percent)]

## Extract mutations
mut.tab   <- mutation.table(seqz.data, mufreq.treshold = 0.15,
                            min.reads = 40, max.mut.types = 1,
                            min.type.freq = 0.9)
mut.tab <- na.exclude(mut.tab)
   
# }

Run the code above in your browser using DataLab