toGRanges: Convert dataset to GRanges

Description

Convert UCSC BED format and its variants, such as GFF, or any user-defined dataset, such as RangedData or a MACS output file, to GRanges.

Usage

## S3 method for class 'character':
toGRanges(data, format=c("BED", "GFF",
                         "MACS", "MACS2",
                         "narrowPeak", "broadPeak",
                         "others"),
          header=FALSE, comment.char="#", colNames=NULL, ...)
## S3 method for class 'connection':
toGRanges(data, format=c("BED", "GFF",
                         "MACS", "MACS2",
                         "narrowPeak", "broadPeak",
                         "others"),
          header=FALSE, comment.char="#", colNames=NULL, ...)
## S3 method for class 'data.frame':
toGRanges(data, colNames=NULL, ...)
## S3 method for class 'TxDb':
toGRanges(data, feature=c("gene", "transcript", "exon",
                          "CDS", "fiveUTR", "threeUTR",
                          "microRNA", "tRNAs", "geneModel"),
          OrganismDb, ...)
## S3 method for class 'EnsDb':
toGRanges(data,
          feature=c("gene", "transcript", "exon", "disjointExons"),
          ...)

Arguments

data

A data.frame, TxDb or EnsDb object, or the name of the file to be imported. Alternatively, data can be a readable text-mode connection (see ?read.table).

format

Data format. If the format is set to BED, GFF, narrowPeak or broadPeak, please refer to http://genome.ucsc.edu/FAQ/FAQformat#format1 for the column order. "MACS" converts the Excel output file from MACS1. "MACS2" converts the output file from MACS2.

feature

Annotation type to extract. The accepted values are those listed under Usage for the TxDb and EnsDb methods.

header

A logical value indicating whether the file contains the names of the variables as its first line. If missing, the value is determined from the file format: header is set to TRUE if and only if the first row contains one fewer field than the number of columns.
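The inference rule described here is the one used by read.table itself when header is not supplied. Note that the Usage section shows toGRanges defaulting header to FALSE. The following is a minimal base-R sketch of that rule only; the sample text and the variable names (short_first, same_width) are made up for illustration.

```r
# First line has two fields, the data lines have three, so read.table()
# infers header = TRUE and uses the first column as row names.
short_first <- c("start end",
                 "peak1 100 200",
                 "peak2 300 450")
inferred <- read.table(text = short_first)

# Every line has three fields, so no header is inferred and the
# columns get the default names V1, V2, V3.
same_width <- c("chr start end",
                "chr1 100 200")
no_header <- read.table(text = same_width)

print(colnames(inferred))
print(rownames(inferred))
print(colnames(no_header))
```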

comment.char

character: a character vector of length one containing a single character or an empty string. Use "" to turn off the interpretation of comments altogether.
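As a rough illustration of why turning comments off can matter, the base-R sketch below reads the same two lines with and without comment interpretation. It calls read.table directly rather than toGRanges, and the sample peak names containing "#" are invented for the example.

```r
# Hypothetical rows whose last field contains a "#" character.
rows <- c("chr1 100 200 peak#1",
          "chr2 300 450 peak#2")

# With comment.char = "#", everything from "#" to the end of the line
# is treated as a comment and dropped.
with_comments <- read.table(text = rows, comment.char = "#")

# With comment.char = "", the "#" is kept as ordinary data.
without_comments <- read.table(text = rows, comment.char = "")

print(as.character(with_comments$V4))
print(as.character(without_comments$V4))
```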

colNames

If the data format is set to "others", colNames must be defined, and it must contain space, start and end. The column holding the chromosome name should be named space.
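A sketch of how a custom table might be imported with format="others" follows. The file contents, the score column and the variable names (peak_file, peak_cols) are assumptions made for illustration; only the required names space, start and end come from the description above. The conversion is attempted only when ChIPpeakAnno is installed.

```r
# Hypothetical tab-delimited peak table with no header row.
peak_lines <- c("chr1\t100\t200\t5.2",
                "chr2\t300\t450\t8.1")
peak_file <- tempfile(fileext = ".txt")
writeLines(peak_lines, peak_file)

# One name per column, in file order; "space" marks the chromosome column.
peak_cols <- c("space", "start", "end", "score")

# Run the conversion only if the package is available.
if (requireNamespace("ChIPpeakAnno", quietly = TRUE)) {
  gr <- ChIPpeakAnno::toGRanges(peak_file, format = "others",
                                colNames = peak_cols)
  print(gr)
}
```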

...

parameters passed to read.table

OrganismDb

An OrganismDb object. It is used to extract gene symbols for the geneModel group when data is a TxDb.

Value

A GRanges object.

Examples

macs <- system.file("extdata", "MACS_peaks.xls", package="ChIPpeakAnno")
macsOutput <- toGRanges(macs, format="MACS")
if(interactive()){
  ## MACS connection
  macs <- readLines(macs)
  macs <- textConnection(macs)
  macsOutput <- toGRanges(macs, format="MACS")
  ## bed
  toGRanges(system.file("extdata", "MACS_output.bed", package="ChIPpeakAnno"),
            format="BED")
  ## narrowPeak
  toGRanges(system.file("extdata", "peaks.narrowPeak", package="ChIPpeakAnno"),
            format="narrowPeak")
  ## broadPeak
  toGRanges(system.file("extdata", "TAF.broadPeak", package="ChIPpeakAnno"),
            format="broadPeak")
  ## MACS2
  toGRanges(system.file("extdata", "MACS2_peaks.xls", package="ChIPpeakAnno"),
            format="MACS2")
  ## GFF
  toGRanges(system.file("extdata", "GFF_peaks.gff", package="ChIPpeakAnno"),
            format="GFF")
  ## EnsDb
  library(EnsDb.Hsapiens.v75)
  toGRanges(EnsDb.Hsapiens.v75, feature="gene")
  ## TxDb
  library(TxDb.Hsapiens.UCSC.hg19.knownGene)
  toGRanges(TxDb.Hsapiens.UCSC.hg19.knownGene, feature="gene")
  ## data.frame
  macs <- system.file("extdata", "MACS_peaks.xls", package="ChIPpeakAnno")
  macs <- read.delim(macs, comment.char="#")
  toGRanges(macs)
}