sequenza (version 2.1.2)

gc.sample.stats: Normalize depth ratio values for GC-content bias

Description

Detects bias in the depth ratio values driven by varying GC-content, and provides summary statistics that can be used to normalize for it.

Usage

gc.sample.stats(file, gz = TRUE)
gc.norm(x, gc)

Arguments

file

name of a file in the seqz format.

x

vector of values to be normalized by GC-content, typically depth ratio values.

gc

vector of relative GC-content values for x.

gz

logical. If TRUE (the default) the function expects a gzipped file.

Value

A list with the following elements:

raw

quartiles of x for each value of gc

adj

median-normalized values of raw

gc.values

vector of the distinct GC-content values observed

raw.mean

mean of x for each value of gc

raw.median

median of x for each value of gc

file.metrics

returned only by gc.sample.stats.

Details

gc.norm detects bias in x driven by gc. Specifically, for each value of gc, summary statistics are calculated for the corresponding values of x. These statistics can then be used to normalize x for gc.

gc.sample.stats extracts depth ratio and GC-content from a seqz file, and then uses gc.norm on the results.
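The per-GC summary idea described above can be illustrated with base R alone. This is a minimal sketch on made-up toy values, not the internal implementation of gc.norm; the variable names (x, gc, mean.by.gc, median.by.gc, x.adj) are chosen here for illustration only.

```r
# Toy data (hypothetical, not taken from a real seqz file):
# depth ratio values and the GC-content of the corresponding positions.
x  <- c(0.8, 0.9, 1.0, 1.1, 1.2, 1.3)
gc <- c(30, 30, 40, 40, 50, 50)

# Summary statistics of x for each distinct GC-content value.
mean.by.gc   <- tapply(x, gc, mean)
median.by.gc <- tapply(x, gc, median)

# Normalize: divide each value of x by the mean of its GC-content group.
# The lookup is by name, so GC values are converted to character first.
x.adj <- as.vector(x / mean.by.gc[as.character(gc)])
```

After this step the mean of x.adj within each GC-content group is 1, which is the same lookup-and-divide pattern used with gc.stats$raw.mean in the Examples section.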

Examples

# NOT RUN {
data.file <-  system.file("data", "example.seqz.txt.gz", package = "sequenza")
# read all the chromosomes:
seqz.data  <- read.seqz(data.file)
# Normalize coverage by GC-content
gc.stats <- gc.norm(x = seqz.data$depth.ratio,
                    gc = seqz.data$GC.percent)
gc.vect  <- setNames(gc.stats$raw.mean, gc.stats$gc.values)
seqz.data$adjusted.ratio <- seqz.data$depth.ratio /
                           gc.vect[as.character(seqz.data$GC.percent)]

# Alternatively, gather genome-wide GC stats from the raw file:
gc.stats <- gc.sample.stats(data.file)
gc.vect  <- setNames(gc.stats$raw.mean, gc.stats$gc.values)
# Read only one chromosome:
seqz.data  <- read.seqz(data.file, chr.name = 12)
# Correct the coverage of the loaded chromosome:
seqz.data$adjusted.ratio <- seqz.data$depth.ratio /
                           gc.vect[as.character(seqz.data$GC.percent)]

   
# }