Learn R Programming

mosaics (version 2.10.0)

extractReads: Load read-level data and extract reads corresponding to each peak region

Description

Load read-level data and extract reads corresponding to each peak region in the MosaicsPeak class object, which is a peak calling result.

Usage

extractReads( object, ... ) "extractReads"( object, chipFile=NULL, chipFileFormat=NULL, chipPET=FALSE, chipFragLen=200, controlFile=NULL, controlFileFormat=NULL, controlPET=FALSE, controlFragLen=200, keepReads=FALSE, parallel=FALSE, nCore=8, tempDir=NULL, perl="perl" )

Arguments

object
Object of class MosaicsPeak, a peak list object obtained using either functions mosaicsPeak or mosaicsPeakHMM.
chipFile
Name of the aligned read file for ChIP sample.
chipFileFormat
Format of the aligned read file for ChIP sample. Currently, constructBins permits the following aligned read file formats for SET data (chipPET = FALSE): "eland_result" (Eland result), "eland_extended" (Eland extended), "eland_export" (Eland export), "bowtie" (default Bowtie), "sam" (SAM), "bam" (BAM), "bed" (BED), and "csem" (CSEM). For PET data (chipPET = TRUE), the following aligned read file formats are allowed: "eland_result" (Eland result), "sam" (SAM), and "bam" (BAM).
chipPET
Is the file of ChIP sample paired-end tag (PET) data? If chipPET=FALSE, it is assumed that the file is SET data. If chipPET=TRUE, it is assumed that the file is PET data. Default is FALSE (SET data).
chipFragLen
Average fragment length of ChIP sample. Default is 200. This argument is ignored if chipPET=TRUE.
controlFile
Name of the aligned read file for matched control sample.
controlFileFormat
Format of the aligned read file for matched control sample. Currently, constructBins permits the following aligned read file formats for SET data (controlPET = FALSE): "eland_result" (Eland result), "eland_extended" (Eland extended), "eland_export" (Eland export), "bowtie" (default Bowtie), "sam" (SAM), "bam" (BAM), "bed" (BED), and "csem" (CSEM). For PET data (controlPET = TRUE), the following aligned read file formats are allowed: "eland_result" (Eland result), "sam" (SAM), and "bam" (BAM).
controlPET
Is the file of matched control sample paired-end tag (PET) data? If controlPET=FALSE, it is assumed that the file is SET data. If controlPET=TRUE, it is assumed that the file is PET data. Default is FALSE (SET data).
controlFragLen
Average fragment length of matched control sample. Default is 200. This argument is ignored if controlPET=TRUE.
keepReads
Keep read-level data? Default is FALSE (do not keep read-level data).
parallel
Utilize multiple CPUs for parallel computing using "parallel" package? Possible values are TRUE (utilize multiple CPUs) or FALSE (do not utilize multiple CPUs). Default is FALSE (do not utilize multiple CPUs).
nCore
Number of CPUs when parallel computing is utilized.
tempDir
Directory to store temporary files. If tempDir=NULL, extractReads() will use the temporary directory used by R.
perl
Name of the perl executable to be called. Default is "perl".
...
Other parameters to be passed through to generic extractReads.

Value

Construct MosaicsPeak class object.

Details

Read-level data is first loaded from aligned read files, and then the reads corresponding to each peak region are extracted and incorporated into the MosaicsPeak class object. extractReads currently supports the following aligned read file formats for SET data (chipPET = FALSE and controlPET = FALSE): Eland result ("eland_result"), Eland extended ("eland_extended"), Eland export ("eland_export"), default Bowtie ("bowtie"), SAM ("sam"), BAM ("bam"), BED ("bed"), and CSEM ("csem"). For PET data (chipPET = FALSE and controlPET = FALSE), the following aligned read file formats are allowed: "eland_result" (Eland result), "sam" (SAM), and "bam" (BAM).

If input file format is neither BED nor CSEM BED, this method retains only reads mapping uniquely to the reference genome.

References

Kuan, PF, D Chung, G Pan, JA Thomson, R Stewart, and S Keles (2011), "A Statistical Framework for the Analysis of ChIP-Seq Data", Journal of the American Statistical Association, Vol. 106, pp. 891-903.

Chung, D, Zhang Q, and Keles S (2014), "MOSAiCS-HMM: A model-based approach for detecting regions of histone modifications from ChIP-seq data", Datta S and Nettleton D (eds.), Statistical Analysis of Next Generation Sequencing Data, Springer.

See Also

mosaicsPeak, mosaicsPeakHMM, export, findSummit, adjustBoundary, filterPeak, MosaicsPeak.

Examples

Run this code
## Not run: 
# library(mosaicsExample)
# 
# constructBins( infile=system.file( file.path("extdata","wgEncodeBroadHistoneGm12878H3k4me3StdAlnRep1_chr22_sorted.bam"), package="mosaicsExample"), 
#     fileFormat="bam", outfileLoc="~/", 
#     byChr=FALSE, useChrfile=FALSE, chrfile=NULL, excludeChr=NULL, 
#     PET=FALSE, fragLen=200, binSize=200, capping=0 )
# constructBins( infile=system.file( file.path("extdata","wgEncodeBroadHistoneGm12878ControlStdAlnRep1_chr22_sorted.bam"), package="mosaicsExample"), 
#     fileFormat="bam", outfileLoc="~/", 
#     byChr=FALSE, useChrfile=FALSE, chrfile=NULL, excludeChr=NULL, 
#     PET=FALSE, fragLen=200, binSize=200, capping=0 )
# 
# binHM <- readBins( type=c("chip","input"),
#     fileName=c( "~/wgEncodeBroadHistoneGm12878H3k4me3StdAlnRep1_chr22_sorted.bam_fragL200_bin200.txt",
#     "~/wgEncodeBroadHistoneGm12878ControlStdAlnRep1_chr22_sorted.bam_fragL200_bin200.txt" ) )
# fitHM <- mosaicsFit( binHM, analysisType="IO", bgEst="rMOM" )
# hmmHM <- mosaicsFitHMM( fitHM, signalModel = "2S", 
#   init="mosaics", init.FDR = 0.05, parallel=TRUE, nCore=8 )
# peakHM <- mosaicsPeakHMM( hmmHM, FDR = 0.05, decoding="posterior",
#   thres=10, parallel=TRUE, nCore=8 )
# 
# peakHM <- extractReads( peakHM,
#   chipFile=system.file( file.path("extdata","wgEncodeBroadHistoneGm12878H3k4me3StdAlnRep1_chr22_sorted.bam"), package="mosaicsExample"),
#   chipFileFormat="bam", chipPET=FALSE, chipFragLen=200,
#   controlFile=system.file( file.path("extdata","wgEncodeBroadHistoneGm12878ControlStdAlnRep1_chr22_sorted.bam"), package="mosaicsExample"), 
#   controlFileFormat="bam", controlPET=FALSE, controlFragLen=200, parallel=TRUE, nCore=8 )
# peakHM <- findSummit( peakHM, parallel=TRUE, nCore=8 )
# peakHM <- adjustBoundary( peakHM, parallel=TRUE, nCore=8 )
# peakHM <- filterPeak( peakHM, parallel=TRUE, nCore=8 )
# ## End(Not run)

Run the code above in your browser using DataLab