parseReads: User configurable efficient assembly of read density maps

Description

Generates density maps for further downstream processing. Constructs a DensityContainer.

Usage

parseReads( filename, spliced=F, read_stranded=0, paired_only=F, readthrough_pairs=F, set_filter=NA, min_quality=0,
            description="NA", extendreads=0, unique_only=F, max_dups=0, hwindow=1, compression=1, verbose=1 )

Arguments

filename

Character string with the filename of the BAM file. The BAM file must be sorted by genomic position.

spliced

Marks the object as a data set with spliced reads. For special purposes it can also be set to FALSE for spliced experiments. If TRUE, extendreads and readthrough_pairs are switched off.

read_stranded

0 reads tags from both strands, 1 skips all tags from the ‘-’ strand, and -1 uses only tags from the ‘-’ strand.

paired_only

If TRUE, any read that is not a member of a proper pair according to the 0x0002 FLAG will be discarded. If FALSE, all reads will be used individually.

set_filter

Optional GRanges object or data.frame with similar structure: data.frame(chromosomes,start,end). Providing this filter will limit density maps to these regions.
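As a minimal sketch of the data.frame form described above, the following builds a three-column filter and passes it to parseReads. The region coordinates and the path in bam_path are hypothetical placeholders, and the call is only attempted if TransView is installed and the file exists.

```r
# Hypothetical regions of interest; the chromosome names are assumed to
# match those used in the BAM file.
region_filter <- data.frame(
  chromosomes = c("chr1", "chr1", "chr2"),
  start       = c(10000L, 50000L, 20000L),
  end         = c(12000L, 51000L, 25000L),
  stringsAsFactors = FALSE
)

# Hypothetical path to a position-sorted BAM file.
bam_path <- "my_experiment.bam"

# Only attempt parsing when the package and the file are available.
if (requireNamespace("TransView", quietly = TRUE) && file.exists(bam_path)) {
  dens_filtered <- TransView::parseReads(bam_path, set_filter = region_filter, verbose = 0)
}
```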

min_quality

Phred-scaled mapping quality threshold. If 0, all reads will pass this filter.
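For orientation, a Phred-scaled quality Q corresponds to an error probability of 10^(-Q/10). The short sketch below converts a few example thresholds; the chosen values are illustrative only and say nothing about which threshold suits a given experiment.

```r
# Example mapping quality thresholds (illustrative values).
mapq <- c(0, 10, 20, 30)

# Phred scale: Q = -10 * log10(p), hence p = 10^(-Q / 10),
# where p is the probability that the mapping position is wrong.
p_wrong <- 10^(-mapq / 10)

print(data.frame(min_quality = mapq, p_wrong = p_wrong))
```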

extendreads

If greater than 0, each read is extended by this number of base pairs in its strand direction during density map generation.
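The idea of strand-directed extension can be sketched in base R as follows. This is not the package's implementation: it assumes that ‘+’ reads grow at their right end and ‘-’ reads at their left end, and the read coordinates are made up.

```r
# Made-up reads with 1-based inclusive coordinates.
reads <- data.frame(
  start  = c(100L, 500L),
  end    = c(135L, 535L),
  strand = c("+", "-"),
  stringsAsFactors = FALSE
)
extendreads <- 50L

# '+' reads are extended to the right, '-' reads to the left (assumption).
ext_start <- ifelse(reads$strand == "-", reads$start - extendreads, reads$start)
ext_end   <- ifelse(reads$strand == "+", reads$end + extendreads, reads$end)

# Keep extended coordinates from running off the chromosome start.
ext_start <- pmax(ext_start, 1L)

print(data.frame(strand = reads$strand, ext_start, ext_end))
```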

unique_only

If TRUE, only unique reads with no multiple alignments will be used. This filter relies on the aligner to use the corresponding flag (0x100).
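Both paired_only and unique_only depend on bits of the SAM FLAG field. The sketch below shows how those bits can be tested in base R; the FLAG values are made-up examples and has_flag is a hypothetical helper, not part of TransView.

```r
# Hypothetical helper: TRUE where the given bit is set in the FLAG value.
has_flag <- function(flag, bit) bitwAnd(flag, bit) != 0L

# Made-up FLAG values as they might appear in a SAM record.
flags <- c(99L, 147L, 0L, 355L, 256L)

proper_pair <- has_flag(flags, 0x2L)    # 0x0002: read mapped in a proper pair
secondary   <- has_flag(flags, 0x100L)  # 0x100: not the primary alignment

# Reads that would survive if both filters were applied in this sketch.
kept <- proper_pair & !secondary
print(data.frame(flags, proper_pair, secondary, kept))
```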

max_dups

If greater than 0, at most this number of reads is allowed per start position and read direction.
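A duplicate cap of this kind can be illustrated in base R. The sketch assumes that the first max_dups reads encountered per start position and strand are the ones retained; which duplicates parseReads actually keeps is not stated here, and the reads are made up.

```r
# Made-up reads: four '+' reads start at position 100.
reads <- data.frame(
  start  = c(100L, 100L, 100L, 100L, 250L, 100L),
  strand = c("+", "+", "+", "-", "+", "+"),
  stringsAsFactors = FALSE
)
max_dups <- 2L

# Running index of each read within its (start, strand) group.
dup_rank <- ave(seq_len(nrow(reads)), reads$start, reads$strand, FUN = seq_along)

# Keep at most max_dups reads per group.
kept <- reads[dup_rank <= max_dups, ]
print(kept)
```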

description

An optional character string describing the experiment for labeling purposes.

hwindow

A numeric value defining the window size used to compute the histogram. This value cannot be larger than compression.

compression

Should be left at the default value. Defines the minimal threshold in base pairs that triggers indexing and collapsing of read-free regions. A smaller value leads to faster slicing at the cost of a higher memory footprint.

readthrough_pairs

Currently *experimental*. If TRUE, parseReads will attempt to use the region from the left to the right read of the pair for density map assembly. Requires ISIZE to be set within the BAM/SAM file.

verbose

Verbosity level

Value

S4 DensityContainer

Details

parseReads reads a single BAM file and scans it read by read. Every read contributes to the density track in a user-configurable manner. The resulting track is stored as indexed integer vectors within a list. Because each score is stored as an unsigned 16-bit integer, the scores cannot be accessed directly, only through one of the slice methods slice1 or sliceN. As a consequence of the storage format, read pile-ups greater than 2^16 are capped and a warning is issued.
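The effect of 16-bit storage can be sketched in base R. The largest value an unsigned 16-bit integer can hold is 2^16 - 1 = 65535; that parseReads caps at exactly this value is an assumption of the sketch, and the pile-up counts are made up.

```r
# Largest value representable as an unsigned 16-bit integer.
uint16_max <- 2^16 - 1

# Made-up per-position read pile-ups.
pileup <- c(12, 4000, 70000, 65535)

# Values beyond the representable range are clamped to the maximum.
stored <- pmin(pileup, uint16_max)
capped <- pileup > uint16_max

if (any(capped)) {
  message(sum(capped), " position(s) exceeded the 16-bit range and were capped")
}
print(stored)
```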

If memory is limiting, a filter can be supplied to restrict the density track to the given regions. A filtered DensityContainer should only be sliced with the same regions used for parsing, since all other positions are set to 0 and would yield artificially low read counts.

Examples

exbam<-dir(system.file("extdata", package="TransView"),full.names=TRUE,pattern="bam$")

#store density maps of the whole sam/bam file in test_data
exden.chip<-parseReads(exbam[2],verbose=0)

#display basic information about the parsed file
exden.chip

#all data are easily accessible
test_stat<-tvStats(exden.chip)
test_stat$origin

# histogram of hwindow sized windows
## Not run: histogram(exden.chip)
