multiread_CSEMDispatch: Multiread scoring CSEM dispatch

Description

This script dispatches the scoring of multi-read aligned reads according to the CSEM algorithm developed by Chung et al. (see "Discovering Transcription Factor Binding Sites of Genomes with Multi-Read Analysis of ChIP-seq Data" (2011) PLoS Computational Biology).

Usage

multiread_CSEMDispatch(alignedFile,
                       outputFolder,
                       referenceFile,
                       window_size=101,
                       iteration_number=200,
                       incrArtefactThrEvery=NA,
                       verbosity=0)

Arguments

alignedFile

An atomic character string. The full path to the file containing the reads aligned by bowtie with the --concise option.

outputFolder

An atomic character string. The path to the folder in which the script stores its output file.

referenceFile

An atomic character string. Either a full path to a reference file (see details for format specification), or the ID of one reference included in the package (see details for available ones).

window_size

A positive integer. The size of the window used by the algorithm (see algorithm details). Default value is 101.

iteration_number

A positive integer. The number of iterations executed by the algorithm (see algorithm details). Default value is 200.

incrArtefactThrEvery

A complex parameter (see details). A numeric value or NA. A strictly positive numeric value activates the option that removes 'artifacts', defining the threshold for treating piles as 'artifacts' as 'number of reads in the experiment de

verbosity

An integer. The verbosity level: 0 = no messages, 1 = trace level.

Value

A tab-separated value text file formatted as below:
- Column 1 : Chromosome name
- Column 2 : Strand
- Column 3 : Position
- Column 4 : Score
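
As a minimal sketch of how such a four-column file could be loaded back into R: the name of the file written by the script is not stated here, so the example below builds a small mock file instead. The mock values, the "+"/"-" strand encoding, the absence of a header line and the column names passed to read.table are all assumptions made for illustration.

```r
# Mock stand-in for the script output (hypothetical values; the real file
# name and strand encoding are not documented on this page).
mock_lines <- c("chr1\t+\t3000150\t0.75",
                "chr1\t-\t3000420\t0.25",
                "chr2\t+\t5120033\t1.00")
scores_file <- tempfile(fileext = ".txt")
writeLines(mock_lines, scores_file)

# Read the four tab-separated columns; header = FALSE assumes the file
# has no header line.
scores <- read.table(scores_file,
                     sep = "\t",
                     header = FALSE,
                     col.names = c("chromosome", "strand", "position", "score"),
                     colClasses = c("character", "character", "integer", "numeric"))

# Total score per chromosome
score_by_chr <- tapply(scores$score, scores$chromosome, sum)
print(score_by_chr)
```

Setting colClasses explicitly keeps the strand column as text and the position column as integer, which avoids surprises if a column happens to look numeric.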

Details

The script considers the reads that bowtie aligned to several locations (multi-reads). To each read, it assigns a score determined by the CSEM algorithm (Chung et al. "Discovering Transcription Factor Binding Sites of Genomes with Multi-Read Analysis of ChIP-seq Data" (2011) PLoS Computational Biology). The script outputs a tab-separated value text file formatted as below:

Column 1 : Chromosome name

Column 2 : Strand

Column 3 : Position

Column 4 : Score
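
To give a rough intuition for how a window size and an iteration count can interact in this kind of scoring, here is a toy sketch of iterative fractional allocation. It is not the Pasha or CSEM implementation: the function allocate_multireads, its arguments and its update rule are hypothetical simplifications (single chromosome, no strand, no artifact handling) in which each candidate location of a read is reweighted by the weighted number of alignments falling inside the window around it.

```r
# Toy illustration only: a simplified allocation scheme, NOT the code used
# by multiread_CSEMDispatch. All names below are hypothetical.
allocate_multireads <- function(read_id, position,
                                window_size = 101, iteration_number = 200) {
  half <- (window_size - 1) / 2
  # Start with uniform weights over the candidate locations of each read
  weight <- 1 / ave(rep(1, length(read_id)), read_id, FUN = sum)
  # 1 if two alignments lie within the same window, 0 otherwise
  near <- (abs(outer(position, position, "-")) <= half) * 1
  for (i in seq_len(iteration_number)) {
    # Weighted number of alignments in the window around each location
    coverage <- as.vector(near %*% weight)
    # Renormalise so that the weights of each read sum to 1
    weight <- coverage / ave(coverage, read_id, FUN = sum)
  }
  weight
}

# Two unique reads near position 1000, and one multi-read (r3) that aligns
# both next to them and at a distant, isolated location.
read_id  <- c("r1", "r2", "r3", "r3")
position <- c(1000, 1010, 1020, 5000)
weights  <- allocate_multireads(read_id, position)
print(round(weights, 3))
```

In this toy setting the multi-read ends up almost entirely allocated to the location supported by neighbouring reads, while uniquely aligned reads keep a weight of 1.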

Examples

# Define input aligned file
my_aligned_file <- system.file("extdata",
                               "embededDataTest_MultiSignal.bow",
                               package="Pasha")

# Define the output folder
my_output_folder <- tempdir()

# Define the genome reference file
genome_reference_file <- system.file("resources",
                                     "mm9.ref",
                                     package="Pasha")

# Launch the script
multiread_CSEMDispatch(my_aligned_file, 
                       my_output_folder, 
                       genome_reference_file,
                       incrArtefactThrEvery=7000000, 
                       verbosity=1)