qExportWig: QuasR wig file export

Description

Create a fixed-step wig file from the alignments in the genomic bam files of the ‘QuasR’ project.

Usage

qExportWig(proj, file=NULL, collapseBySample=TRUE, binsize=100L, shift=0L, strand=c("*","+","-"), scaling=TRUE, tracknames=NULL, log2p1=FALSE, colors=c("#1B9E77", "#D95F02", "#7570B3", "#E7298A", "#66A61E", "#E6AB02", "#A6761D", "#666666"), includeSecondary=TRUE, mapqMin=0L, mapqMax=255L, absIsizeMin=NULL, absIsizeMax=NULL, createBigWig=FALSE)

Arguments

proj

A qProject object as returned by qAlign.

file

A character vector with the name(s) for the wig or bigWig file(s) to be generated. Either NULL or a vector of the same length as the number of bam files (for collapseBySample=FALSE) or the number of unique sample names (for collapseBySample=TRUE) in proj. If NULL, the wig or bigWig file names are generated from the names of the genomic bam files or unique sample names with an added “.wig.gz” or “.bw” extension.

collapseBySample

If TRUE, genomic bam files with identical sample name will be combined (summed) into a single track.

binsize

A numerical value defining the bin and step size for the wig or bigWig file(s). binsize will be coerced to integer().

shift

Either a vector or a scalar value defining the read shift (e.g. half of fragment length, see ‘Details’). If length(shift)>1, the length must match the number of bam files in ‘proj’, and the i-th sample will be converted to wig or bigWig using the value in shift[i]. shift will be coerced to integer(). For paired-end alignments, shift will be ignored, and a warning will be issued if it is set to a non-zero value (see ‘Details’).

strand

Only count alignments on the given strand. The default (“*”) will count alignments on both strands.

scaling

If TRUE or a numerical value, the output values in the wig or bigWig file(s) will be linearly scaled by the total number of aligned reads per sample to improve comparability (see ‘Details’).

tracknames

A character vector with the names of the tracks to appear in the track header. If NULL, the sample names in proj will be used.

log2p1

If TRUE, the number of alignments x per bin will be transformed using the formula log2(x+1).

colors

A character vector with R color names to be used for the tracks.

includeSecondary

If TRUE (the default), include alignments with the secondary bit (0x0100) set in the FLAG.

mapqMin

Minimal mapping quality of alignments to be included (mapping quality must be greater than or equal to mapqMin). Valid values are between 0 and 255. The default (0) will include all alignments.

mapqMax

Maximal mapping quality of alignments to be included (mapping quality must be less than or equal to mapqMax). Valid values are between 0 and 255. The default (255) will include all alignments.

absIsizeMin

For paired-end experiments, minimal absolute insert size (TLEN field in SAM Spec v1.4) of alignments to be included. Valid values are greater than 0 or NULL (default), which will not apply any minimum insert size filtering.

absIsizeMax

For paired-end experiments, maximal absolute insert size (TLEN field in SAM Spec v1.4) of alignments to be included. Valid values are greater than 0 or NULL (default), which will not apply any maximum insert size filtering.

createBigWig

If TRUE, first a temporary wig file will be created and then converted to BigWig format (file extension “.bw”) using the wigToBigWig function from package rtracklayer.

Value

(invisible) The file name of the generated wig or bigWig file(s).

Details

qExportWig() uses the genomic bam files in proj as input to create wig or bigWig files with the number of alignments (pairs) per window of binsize nucleotides. By default (collapseBySample=TRUE), one file per unique sample will be created. If collapseBySample=FALSE, one file per genomic bam file will be created. See http://genome.ucsc.edu/goldenPath/help/wiggle.html for the definition of the wig format, and http://genome.ucsc.edu/goldenPath/help/bigWig.html for the definition of the bigWig format.

The genome is tiled with sequential windows of length binsize, and alignments in the bam file are assigned to these windows. Single read alignments are assigned according to their 5'-end coordinate shifted by shift towards the 3'-end (assuming that the 5'-end is the leftmost coordinate for plus-strand alignments, and the rightmost coordinate for minus-strand alignments). Paired-end alignments are assigned according to the base in the middle between the leftmost and rightmost coordinates of the aligned pair of reads. Each pair of reads is counted only once, and alignments that are not properly paired are ignored.

Secondary alignments can be excluded by setting includeSecondary=FALSE. In paired-end experiments, absIsizeMin and absIsizeMax can be used to select alignments based on their insert size (TLEN field in SAM Spec v1.4).
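The window assignment described above can be sketched in base R. This is only an illustration, not the QuasR implementation: the alignment coordinates are made up, and the 1-based window numbering and the rounding of the pair midpoint are assumptions that the text does not specify.

```r
binsize <- 100L
shift   <- 50L

# Hypothetical single-read alignments (1-based leftmost/rightmost coordinates)
aln <- data.frame(
  start  = c(101L, 260L),
  end    = c(136L, 295L),
  strand = c("+", "-")
)

# 5'-end: leftmost coordinate on the plus strand, rightmost on the minus strand
five_prime <- ifelse(aln$strand == "+", aln$start, aln$end)

# Shift towards the 3'-end: rightwards on plus, leftwards on minus
pos <- ifelse(aln$strand == "+", five_prime + shift, five_prime - shift)

# Assumed numbering: window 1 covers positions 1..binsize, window 2 the next binsize, ...
bin <- (pos - 1L) %/% binsize + 1L

# Hypothetical read pair: assigned by the base midway between its outer coordinates
# (rounding down is an assumption made for this sketch)
pair_left  <- 401L
pair_right <- 600L
pair_mid   <- (pair_left + pair_right) %/% 2L
pair_bin   <- (pair_mid - 1L) %/% binsize + 1L
```

Note that shift enters only the single-read calculation; the pair midpoint does not use it, which matches the statement that shift is ignored for paired-end alignments.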

For scaling=TRUE, the number of alignments per bin $n$ for sample $i$ is linearly scaled to the mean total number of alignments over all samples in proj according to: $n_s = n / N[i] \cdot \mathrm{mean}(N)$, where $n_s$ is the scaled number of alignments in the bin and $N$ is a vector with the total number of alignments for each sample. Alternatively, if scaling is set to a positive numerical value $s$, this value is used instead of $\mathrm{mean}(N)$, and values are scaled according to: $n_s = n / N[i] \cdot s$.
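As a worked illustration of the scaling formula, the following base R sketch applies it to made-up numbers. The helper scale_counts() and all values are hypothetical and not part of QuasR.

```r
# Hypothetical total number of alignments per sample (illustrative values only)
N <- c(sampleA = 1000, sampleB = 3000)

# Hypothetical raw alignment counts in three bins of sampleA
n <- c(10, 0, 25)

# n_s = n / N[i] * s, where s defaults to mean(N) (the scaling=TRUE case)
scale_counts <- function(n, N, i, s = mean(N)) {
  n / N[[i]] * s
}

# Scale to the mean library size over all samples (mean(N) is 2000 here)
scaled_default <- scale_counts(n, N, "sampleA")

# Scale to a fixed value instead, as when scaling is set to a number
scaled_fixed <- scale_counts(n, N, "sampleA", s = 1e6)
```

With these numbers, the counts of sampleA are doubled under the default scaling because its library size is half the mean; with s = 1e6 the values correspond to alignments per million.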

mapqMin and mapqMax allow selecting alignments based on their mapping qualities. Both can take integer values between 0 and 255. A mapping quality is equal to $-10 \log_{10} \Pr(\text{mapping position is wrong})$, rounded to the nearest integer. A value of 255 indicates that the mapping quality is not available.
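To make the mapping quality scale concrete, this base R sketch converts between error probabilities and mapping qualities using the definition above. The function names mapq_from_prob() and prob_from_mapq() are hypothetical helpers, not QuasR functions.

```r
# Mapping quality from the probability that the mapping position is wrong,
# rounded to the nearest integer
mapq_from_prob <- function(p) {
  as.integer(round(-10 * log10(p)))
}

# Inverse: error probability implied by a mapping quality
prob_from_mapq <- function(q) {
  10^(-q / 10)
}

q <- mapq_from_prob(c(0.1, 0.01, 0.001))
p <- prob_from_mapq(30)
```

Under this definition, a threshold such as mapqMin=20L would correspond to a reported error probability of about 0.01 or lower. Alignments with the value 255 (quality not available) also pass any mapqMin threshold unless mapqMax is lowered below 255.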

If createBigWig=FALSE and file ends with ‘.gz’, the resulting wig file will be compressed using gzip and is suitable for uploading as a custom track to your favorite genome browser (e.g. UCSC or Ensembl).

Examples

# load the package
library(QuasR)

# copy example data to current working directory
file.copy(system.file(package="QuasR", "extdata"), ".", recursive=TRUE)

# create alignments
sampleFile <- "extdata/samples_chip_single.txt"
genomeFile <- "extdata/hg19sub.fa"
proj <- qAlign(sampleFile, genomeFile)

# export wiggle file
qExportWig(proj, binsize=100L, shift=0L, scaling=TRUE)