RIPSeeker (version 1.12.0)

ripSeek: HMM-based de novo RIP predictions using alignment data

Description

This function is the main interface to the most essential functions of the RIPSeeker package.

Usage

ripSeek(bamPath, cNAME, binSize = NULL, strandType = NULL, 
	paired=FALSE, biomaRt_dataset, goAnno, exportFormat = "gff3", 
	annotateFormat = "txt", annotateType = "TSS", outDir, 
	padjMethod = "BH", logOddCutoff = 0, pvalCutoff = 1, 
	pvalAdjCutoff = 1, eFDRCutoff = 1, ...)

Arguments

bamPath
Either a single path to the location of all of the BAM files or a list of paths to individual BAM files. BED and SAM files are also accepted.
cNAME
An identifier pattern found in the names of the control alignment files. Once specified, the matching files are used as control and the remaining files as RIP for discriminative analysis (see seekRIP).
binSize
Size to use for binning the read counts across each chromosome. If NULL, optimal bin size within a range (default: minBinSize=200, maxBinSize=1200) will be automatically selected (see selectBinSize).
strandType
Strand type, which can be +, -, or * as in GAlignments, GAlignmentPairs, or GRanges (see GenomicRanges).
paired
Logical indicating whether the library is paired-end (TRUE) or single-end (FALSE, the default) (see getAlignGal).
biomaRt_dataset
The dataset name used in biomaRt for retrieving genomic information for a given species name (see annotateRIP).
goAnno
GO dataset name used for GO enrichment analysis (see annotateRIP).
exportFormat
Format in which to export the RIP predictions. The commonly used ones are GFF and BED, which can be imported directly as a track into a genome viewer such as the Integrative Genomics Viewer (IGV), Savant, or the UCSC Genome Browser.
annotateFormat
Format in which to export the annotated RIP predictions. The default "txt" is a tab-delimited format, recommended for viewing in Excel.
annotateType
Type of genomic information associated with the RIP predictions that can be retrieved from the Ensembl database (default: TSS; see annotateRIP).
outDir
Output directory to save the results. The output data include ...
padjMethod
Method used to adjust for multiple testing (Benjamini-Hochberg method by default).
logOddCutoff
Threshold for the log odds ratio of the posterior for the RIP state over the background state (see seekRIP). Only peaks with a logOdd score greater than the logOddCutoff will be reported. Default: 0.
pvalCutoff
Threshold for the p-value for the logOdd score. Only peaks with p-value less than the pvalCutoff will be reported. Default: 1 (i.e. no cutoff).
pvalAdjCutoff
Threshold for the adjusted p-value for the logOdd score. Only peaks with adjusted p-value less than the pvalAdjCutoff will be reported. Default: 1 (i.e. no cutoff).
eFDRCutoff
Threshold for the empirical false discovery rate (eFDR). Only peaks with eFDR less than the eFDRCutoff will be reported. Default: 1 (i.e. no cutoff).
...
Arguments passed to mainSeek.
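
The four cutoff arguments act as filters on per-peak scores. The following is a minimal base-R sketch of that filtering logic, not the package's actual implementation. The peaks data frame and its values are made up for illustration, and it assumes that padjMethod is a method name of the kind accepted by base R's p.adjust.

```r
# Hypothetical peak scores for illustration; not output from ripSeek.
peaks <- data.frame(
  logOddScore = c(5.2, 0.4, -1.0, 3.1),
  pval        = c(0.001, 0.20, 0.80, 0.01)
)

# Assumption: padjMethod is interpreted like the 'method' argument of p.adjust.
padjMethod <- "BH"
peaks$pvalAdj <- p.adjust(peaks$pval, method = padjMethod)

# Example thresholds (stricter than the "no cutoff" defaults of 1).
logOddCutoff  <- 0
pvalAdjCutoff <- 0.05

# Keep peaks scoring above the logOdd cutoff and below the adjusted p-value cutoff.
keep <- peaks$logOddScore > logOddCutoff & peaks$pvalAdj < pvalAdjCutoff
reported <- peaks[keep, ]
print(reported)
```

With the default cutoffs of 1 for pvalCutoff, pvalAdjCutoff and eFDRCutoff, no peak is excluded on those scores, so tightening them (for example to 0.05) is what restricts the reported set.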

Value

A list is returned with the following items:

  • mainSeekOutputRIP: A (inner) list comprising three items:

    nbhGRList: GRangesList of the HMM trained parameters for each chromosome on RIP.
    alignGal, alignGalFiltered: GAlignments objects of the RIP alignment outputs from combineAlignGals and disambiguateMultihits, respectively. The former may contain multiple alignments for the same read, whereas the latter contains a one-to-one mapping from read to alignment after disambiguating the multihits.

  • mainSeekOutputCTL: Same as mainSeekOutputRIP but for the control library (if available).
  • RIPGRList: The results, as a GRangesList, generated from the RIP peak detection. Each list item represents the RIP peaks on a chromosome, accompanied by statistical scores including (read) count, logOddScore, pval, pvalAdj, and eFDR for the RIP and control (if available). Please refer to seekRIP for more details.
  • annotatedRIPGR: If annotatedRIPGR is TRUE, additional genomic information will be retrieved according to the genomic coordinates of the peaks in RIPGRList. The results are saved in this separate GRanges object, which is the final result that users will likely find most useful.

Details

This is the main front-end function of RIPSeeker and in many cases the only function that users need to get RIP predictions and all relevant information.

References

Zhao, J., Ohsumi, T. K., Kung, J. T., Ogawa, Y., Grau, D. J., Sarma, K., Song, J. J., et al. (2010). Genome-wide Identification of Polycomb-Associated RNAs by RIP-seq. Molecular Cell, 40(6), 939–953. doi:10.1016/j.molcel.2010.12.011

The RIPSeeker manuscript has been submitted to NAR for review.

See Also

rulebaseRIPSeek

Examples

if(interactive()) { # need internet connection

# Retrieve system files
extdata.dir <- system.file("extdata", package="RIPSeeker") 

# escape the dot so that only names ending in ".bam" are matched
bamFiles <- list.files(extdata.dir, "\\.bam$", recursive=TRUE, full.names=TRUE)

bamFiles <- grep("PRC2", bamFiles, value=TRUE)

cNAME <- "SRR039214" 						# specify control name

# output file directory
outDir <- paste(getwd(), "ripSeek_example", sep="/")

# Parameters setting
binSize <- NULL							# automatically determine bin size
minBinSize <- 10000						# min bin size in automatic bin size selection
maxBinSize <- 12000						# max bin size in automatic bin size selection
multicore <- TRUE						# use multicore
strandType <- "-"						# set strand type to minus strand

biomart <- "ENSEMBL_MART_ENSEMBL"		# use archive to get ensembl 65
dataset <- "mmusculus_gene_ensembl"		# mouse dataset id name	
host <- "dec2011.archive.ensembl.org" 	# use ensembl 65 for annotation

goAnno <- "org.Mm.eg.db"


################ run main function ripSeek to predict RIP ################
seekOut <- ripSeek(bamPath=bamFiles, cNAME=cNAME, 
		binSize=binSize, minBinSize = minBinSize, 
		maxBinSize = maxBinSize, strandType=strandType, 
		outDir=outDir, silentMain=FALSE,
		verbose=TRUE, reverseComplement=TRUE, genomeBuild="mm9",
		biomart=biomart, host=host,
		biomaRt_dataset = dataset,
		goAnno = goAnno,
		uniqueHit = TRUE, assignMultihits = TRUE, 
		rerunWithDisambiguatedMultihits = TRUE, multicore=multicore)


################ visualization ################

viewRIP(seekOut$RIPGRList$chrX, seekOut$mainSeekOutputRIP$alignGalFiltered, 
	seekOut$mainSeekOutputCTL$alignGalFiltered, scoreType="eFDR")

}
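
The cNAME argument separates control files from RIP files by a pattern in the file names. The sketch below shows one way such a split could work in base R. The file names are hypothetical, and the use of grepl is an assumption made for illustration; ripSeek's internal matching may differ.

```r
# Hypothetical alignment file names; only the control contains the cNAME pattern.
bamFiles <- c("rip_rep1.bam", "rip_rep2.bam", "control_SRR039214.bam")
cNAME <- "SRR039214"

# Assumption: files whose names match cNAME are treated as control,
# and all remaining files are treated as RIP.
isControl <- grepl(cNAME, bamFiles, fixed = TRUE)
ctlFiles <- bamFiles[isControl]
ripFiles <- bamFiles[!isControl]

print(ctlFiles)
print(ripFiles)
```

Choosing a cNAME that appears only in the control file names avoids accidentally assigning a RIP library to the control group.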