Learn R Programming

RIPSeeker (version 1.12.0)

mainSeekSingleChrom: Automatic bin size selection, bin count, and HMM parameters optimization on read count vector from a single chromosome (Internal function)

Description

This an internal function used by mainSeek to accomplish three major tasks on a single chromosome: automatically select bin size, compute read counts within the bins, and obtain optimal HMM paramters.

Usage

mainSeekSingleChrom(alignGR, K = 2, binSize = NULL, minReadCount = 10, 
	backupNumBins = 10, minBinSize = 200, maxBinSize = 1200, 
	increment = 5, pathToSavePlotsOfBinSizesVersusCosts, 
	verbose = TRUE, allowSecondAttempt = TRUE, ...)

Arguments

alignGR
GRanges containing the alignments on a single chromosome .
K
Number of hidden states (Default: 2). By default, state 1 specifies the background and state 2 the RIP regions. The two states are recognized by the means for the two distributions (See nbh_em).
binSize
Size to use for binning the read counts across each chromosome. If NULL, optimal bin size within a range (default: minBinSize=200, maxBinSize=1200) will be automatically selected (See selectBinSize).
minReadCount
Minimum aligned read counts needed for HMM to converge (Default: 10). Note that HMM may not converge some times when majority of the read counts are zero even if some read count > 10. When that happens, a back-up function addDummyProb comes in to create a placeholder for the corresponding chromosome in GRangeList to maintain the data structure to preserve all information (successfully) obtained from other chromosomes.
backupNumBins
If read count is less than minReadCount, then use backupNumBins (Default: 10) to bin the chromosome.
minBinSize
Minimum bin size to start with the bin selection (See selectBinSize). Default to 200, common minimum band size selected in RIP or RNA-seq library construction.
maxBinSize
Maximum bin size to stop with the bin selection (See selectBinSize). Default: 1200.
increment
Step-wise increment in bin size selection (See selectBinSize). Default: 5.
pathToSavePlotsOfBinSizesVersusCosts
Directory used to save the diagnostic plots for bin size selection.
verbose
Binary indicator for disable (FALSE) or enable (TRUE) HMM training message from function nbh to output to the console.
allowSecondAttempt
In case HMM fails to converge due to malformed paramters in EM iteraction, re-iterating the HMM process each time with a different suboptimal bin size in attempt to succeed in some trial. If all yeild nothing, fall back up to addDummyProb to return the place holder for the chromosome.
...
Argumnets passed to nbh.

Value

  • nbhGRGRanges object containing the optimized HMM parameters (and the Viterbi hidden state sequence) accompanied with the read count vector following the (automatic) binning scheme.

See Also

ripSeek, mainSeek, nbh_em

Examples

Run this code
# Retrieve system files
extdata.dir <- system.file("extdata", package="RIPSeeker") 

bamFiles <- list.files(extdata.dir, ".bam$", recursive=TRUE, full.names=TRUE)

bamFiles <- grep("PRC2", bamFiles, value=TRUE)

# Parameters setting
binSize <- 1e5							  # use a large fixed bin size for demo only
minBinSize <- NULL						# min bin size in automatic bin size selection
maxBinSize <- NULL						# max bin size in automatic bin size selection
multicore <- FALSE						# use multicore
strandType <- "-"							# set strand type to minus strand

# Retrieve system files
extdata.dir <- system.file("extdata", package="RIPSeeker") 

bamFiles <- list.files(extdata.dir, ".bam$", recursive=TRUE, full.names=TRUE)

bamFiles <- grep("PRC2", bamFiles, value=TRUE)

alignGal <- getAlignGal(bamFiles[1], reverseComplement=TRUE, genomeBuild="mm9")

alignGR <- as(alignGal, "GRanges")

alignGRList <- GRangesList(as.list(split(alignGR, seqnames(alignGR))))

################ run main function for HMM inference on a single chromosome ################
nbhGR <- mainSeekSingleChrom(alignGR=alignGRList$chrX, K = 2, binSize=binSize, 
			
			minBinSize = minBinSize, maxBinSize = maxBinSize)
			
nbhGR

Run the code above in your browser using DataLab