sequenza (version 1.0.5)

gc.sample.stats: Normalize depth ratio values for GC-content bias

Description

Detects bias in the depth ratio values driven by varying GC-content.

Usage

gc.sample.stats(file, gz = TRUE)
gc.norm(x, gc)

Arguments

file
name of a file in the ABfreq format.
x
vector of values to be normalized by GC-content, typically depth ratio values.
gc
vector of relative GC-content values for x.
gz
logical. If TRUE (the default) the function expects a gzipped file.

Value

A list with the following elements:
  • raw: quartiles of x for each value of gc
  • adj: median-normalized values of raw
  • gc.values: vector of the different GC-content values observed
  • raw.mean: mean of x for each value of gc
  • raw.median: median of x for each value of gc
  • file.metrics: only from gc.sample.stats.

Details

gc.norm detects bias in x driven by gc. Specifically, for each value of gc, summary statistics are calculated for the corresponding values of x. These statistics can then be used to normalize x for gc.
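
The per-GC summary step described above can be illustrated with base R on toy data. This is only a minimal sketch, not sequenza's actual implementation: the input values are made up, and the variable names raw, raw.mean and raw.median are borrowed from the Value section purely for readability.

```r
# Toy data (hypothetical): depth ratios observed at three GC-content values
x  <- c(0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6)
gc <- c(30, 30, 30, 40, 40, 40, 50, 50, 50)

# Summary statistics of x for each distinct value of gc
raw.mean   <- tapply(x, gc, mean)
raw.median <- tapply(x, gc, median)
# Quartiles per GC value: one row per GC value, one column per quartile
raw <- t(sapply(split(x, gc), quantile, probs = c(0.25, 0.5, 0.75)))

# Normalize x by dividing each value by the mean of its GC group;
# raw.mean is named by GC value, so it can be indexed by character
adjusted <- x / as.numeric(raw.mean[as.character(gc)])
```

After this step the mean of adjusted within each GC group is 1, which is the sense in which the GC-driven trend has been removed from x.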

gc.sample.stats extracts depth ratio and GC-content from an ABfreq file, and then uses gc.norm on the results.

Examples

data.file <-  system.file("data", "abf.data.abfreq.txt.gz", package = "sequenza")
# read all the chromosomes:
abf.data  <- read.abfreq(data.file)
# Normalize coverage by GC-content
gc.stats <- gc.norm(x = abf.data$depth.ratio,
                    gc = abf.data$GC.percent)
gc.vect  <- setNames(gc.stats$raw.mean, gc.stats$gc.values)
abf.data$adjusted.ratio <- abf.data$depth.ratio /
                           gc.vect[as.character(abf.data$GC.percent)]

# Alternatively, gather genome-wide GC stats from the raw file:
gc.stats <- gc.sample.stats(data.file)
gc.vect  <- setNames(gc.stats$raw.mean, gc.stats$gc.values)
# Read only one chromosome:
abf.data  <- read.abfreq(data.file, chr.name = 12)
# Correct the coverage of the loaded chromosome:
abf.data$adjusted.ratio <- abf.data$depth.ratio /
                           gc.vect[as.character(abf.data$GC.percent)]
