Learn R Programming

mosaics (version 2.10.0)

mosaicsFitHMM: Fit MOSAiCS-HMM model

Description

Fit MOSAiCS-HMM model.

Usage

mosaicsFitHMM( object, ... ) "mosaicsFitHMM"( object, signalModel="2S", binsize=NA, init="mosaics", init.FDR=0.05, init.maxgap=200, init.minsize=50, init.thres=10, init.piMat=as.matrix(NA), max.iter=100, eps=1e-20, parallel=FALSE, nCore=8 )

Arguments

object
Object of class MosaicsFit, a fitted MOSAiCS model obtained using function mosaicsFit.
signalModel
Signal model. Possible values are "1S" (one-signal-component model) and "2S" (two-signal-component model). Default is "2S".
binsize
Size of each bin. Value should be positive integer. If binsize=NA, mosaicsFitHMM function calcuates the value from data. Default is NA.
init
Approach to initialize MOSAiCS-HMM. Possible values are "mosaics" (use MOSAiCS peak calling results for initialization) or "specify" (explicitly specify transition matrix). Default is "mosaics".
init.FDR
Parameter for the MOSAiCS-HMM initialization. False discovery rate. Default is 0.05. Related only if init="mosaics".
init.maxgap
Parameter for the MOSAiCS-HMM initialization. Initial nearby peaks are merged if the distance (in bp) between them is less than init.maxgap. Default is 200. Related only if init="mosaics".
init.minsize
Parameter for the MOSAiCS-HMM initialization. An initial peak is removed if its width is narrower than init.minsize. Default is 50. Related only if init="mosaics".
init.thres
Parameter for the MOSAiCS-HMM initialization. A bin within initial peak is removed if its ChIP tag counts are less than init.thres. Default is 10. Related only if init="mosaics".
init.piMat
Initial value for transition matrix. The first rows/columns correspond to the non-binding state while the second rows/columns correspond to the binding state. Related only if init="specify". If init="specify" but init.piMat is not specified, mosaicsFitHMM() uses its default for the MOSAiCS-HMM initialization.
max.iter
Number of iterations for fitting MOSAiCS-HMM. Default is 100.
eps
Criterion to stop iterations for fitting MOSAiCS-HMM. Default is 1e-20.
parallel
Utilize multiple CPUs for parallel computing using "parallel" package? Possible values are TRUE (utilize multiple CPUs) or FALSE (do not utilize multiple CPUs). Default is FALSE (do not utilize multiple CPUs).
nCore
Number of CPUs when parallel computing is utilized.
...
Other parameters to be passed through to generic mosaicsFitHMM.

Value

Construct MosaicsHMM class object.

Details

mosaicsFitHMM and mosaicsPeakHMM are developed to identify broad peaks such as histone modifications, using Hidden Markov Model (HMM) approach, as proposed in Chung et al. (2014). If you are interested in identifying narrow peaks such as transcription factor binding sites, please use mosaicsPeak instead of mosaicsFitHMM and mosaicsPeakHMM.

When peaks are called, proper signal model needs to be specified. The optimal choice for the number of signal components depends on the characteristics of ChIP-seq data. In order to support users in the choice of optimal signal model, Bayesian Information Criterion (BIC) values and Goodness of Fit (GOF) plot are provided for the fitted MOSAiCS model. BIC values and GOF plot can be obtained by applying show and plot methods, respectively, to the MosaicsFit class object, which is a fitted MOSAiCS model.

init.FDR, init.maxgap, init.minsize, and init.thres are the parameters for MOSAiCS-HMM initialization when MOSAiCS peak calling results are used for initialization (init="mosaics"). If user specifies transition matrix (init="specify"), only init.piMat is used for initialization. If you use a bin size shorter than the average fragment length of the experiment, we recommend to set init.maxgap to the average fragment length and init.minsize to the bin size. If you set the bin size to the average fragment length or if bin size is larger than the average fragment length, set init.maxgap to the average fragment length and init.minsize to a value smaller than the average fragment length. See the vignette for further details.

Parallel computing can be utilized for faster computing if parallel=TRUE and parallel package is loaded. nCore determines number of CPUs used for parallel computing.

References

Kuan, PF, D Chung, G Pan, JA Thomson, R Stewart, and S Keles (2011), "A Statistical Framework for the Analysis of ChIP-Seq Data", Journal of the American Statistical Association, Vol. 106, pp. 891-903.

Chung, D, Zhang Q, and Keles S (2014), "MOSAiCS-HMM: A model-based approach for detecting regions of histone modifications from ChIP-seq data", Datta S and Nettleton D (eds.), Statistical Analysis of Next Generation Sequencing Data, Springer.

See Also

mosaicsFit, mosaicsPeakHMM, MosaicsFit, MosaicsHMM.

Examples

Run this code
## Not run: 
# library(mosaicsExample)
# 
# constructBins( infile=system.file( file.path("extdata","wgEncodeBroadHistoneGm12878H3k4me3StdAlnRep1_chr22_sorted.bam"), package="mosaicsExample"), 
#     fileFormat="bam", outfileLoc="~/", 
#     byChr=FALSE, useChrfile=FALSE, chrfile=NULL, excludeChr=NULL, 
#     PET=FALSE, fragLen=200, binSize=200, capping=0 )
# constructBins( infile=system.file( file.path("extdata","wgEncodeBroadHistoneGm12878ControlStdAlnRep1_chr22_sorted.bam"), package="mosaicsExample"), 
#     fileFormat="bam", outfileLoc="~/", 
#     byChr=FALSE, useChrfile=FALSE, chrfile=NULL, excludeChr=NULL, 
#     PET=FALSE, fragLen=200, binSize=200, capping=0 )
# 
# binHM <- readBins( type=c("chip","input"),
#     fileName=c( "~/wgEncodeBroadHistoneGm12878H3k4me3StdAlnRep1_chr22_sorted.bam_fragL200_bin200.txt",
#     "~/wgEncodeBroadHistoneGm12878ControlStdAlnRep1_chr22_sorted.bam_fragL200_bin200.txt" ) )
# binHM
# plot(binHM)
# plot( binHM, plotType="input" )
# 
# fitHM <- mosaicsFit( binHM, analysisType="IO", bgEst="rMOM" )
# fitHM
# plot(fitHM)
# 
# hmmHM <- mosaicsFitHMM( fitHM, signalModel = "2S", 
#   init="mosaics", init.FDR = 0.05, parallel=TRUE, nCore=8 )
# hmmHM
# plot(hmmHM)
# 
# peakHM <- mosaicsPeakHMM( hmmHM, FDR = 0.05, decoding="posterior",
#   thres=10, parallel=TRUE, nCore=8 )
# 
# peakHM <- extractReads( peakHM,
#   chipFile=system.file( file.path("extdata","wgEncodeBroadHistoneGm12878H3k4me3StdAlnRep1_chr22_sorted.bam"), package="mosaicsExample"),
#   chipFileFormat="bam", chipPET=FALSE, chipFragLen=200,
#   controlFile=system.file( file.path("extdata","wgEncodeBroadHistoneGm12878ControlStdAlnRep1_chr22_sorted.bam"), package="mosaicsExample"), 
#   controlFileFormat="bam", controlPET=FALSE, controlFragLen=200, parallel=TRUE, nCore=8 )
# peakHM
# 
# peakHM <- findSummit( peakHM, parallel=TRUE, nCore=8 )
# head(print(peakHM))
# plot( peakHM, filename="~/peakplot_HM.pdf" )
# 
# peakHM <- adjustBoundary( peakHM, parallel=TRUE, nCore=8 )
# peakHM
# head(print(peakHM))
# 
# peakHM <- filterPeak( peakHM, parallel=TRUE, nCore=8 )
# peakHM
# head(print(peakHM))
# 
# export( peakHM, type = "txt", filename = "./peakHM.txt" )
# export( peakHM, type = "bed", filename = "./peakHM.bed" )
# export( peakHM, type = "gff", filename = "./peakHM.gff" )
# export( peakHM, type = "narrowPeak", filename = "./peakHM.narrowPeak" )
# export( peakHM, type = "broadPeak", filename = "./peakHM.broadPeak" )
# ## End(Not run)

Run the code above in your browser using DataLab