detailRanges(incoming, txdb, orgdb, dist=5000, promoter=c(3000, 1000), max.intron=1e6, key.field="ENTREZID", name.field="SYMBOL", ignore.strand=TRUE)
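incoming: A GRanges object containing the ranges to be annotated.
txdb: A TranscriptDb object for the genome of interest.
orgdb: An OrgDb object for the genome of interest.
dist: The maximum distance from a range at which flanking features are reported.
promoter: The distances upstream and downstream of the TSS that define the promoter.
max.intron: The maximum distance between exons on the same chromosome before a gene is split into internal groupings.
key.field: The key type in orgdb that corresponds to the gene IDs in txdb.
name.field: The field(s) in orgdb from which gene names are extracted.
ignore.strand: A logical scalar indicating whether the strandedness of incoming should be ignored.

If incoming is not provided, a GRanges object will be returned containing ranges for the exons, promoters and gene bodies. Gene keys (e.g., Entrez IDs) are stored as row names.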
Gene symbols, exon numbers and internal groupings (for exons of genes with multiple genomic locations) are also stored as metadata.

If incoming is a GRanges object, a list will be returned with overlap, left and right elements. Each element is a character vector of length equal to the number of ranges in incoming. Each non-empty string records the gene symbol, the overlapped exons and the strand. For left and right, the gap between the range and the annotated feature is also included.
Each non-empty string in the overlap output vector will be of the form GENE|EXONS|STRAND. GENE is the gene symbol by default, but reverts to <XXX> if no symbol is defined for a gene with the Entrez ID XXX. EXONS indicates the exon or range of exons that are overlapped. STRAND is the strand on which the gene is coded.
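To make the string format concrete, here is a minimal base-R sketch that builds and splits GENE|EXONS|STRAND strings. The helpers format_overlap() and parse_overlap() are hypothetical and not part of csaw; the gene names and keys are made up, and the parser assumes each non-empty string describes a single gene.

```r
# Hypothetical helpers (not part of csaw) for the GENE|EXONS|STRAND format.
# A missing symbol (NA) is replaced by the gene key wrapped in <...>.
format_overlap <- function(symbol, key, exons, strand) {
  gene <- ifelse(is.na(symbol), paste0("<", key, ">"), symbol)
  paste(gene, exons, strand, sep = "|")
}

# Split non-empty strings back into their three fields.
# Assumes one gene per string and no "|" inside gene names.
parse_overlap <- function(x) {
  parts <- strsplit(x[nzchar(x)], "|", fixed = TRUE)
  data.frame(gene   = vapply(parts, `[`, character(1), 1),
             exons  = vapply(parts, `[`, character(1), 2),
             strand = vapply(parts, `[`, character(1), 3),
             stringsAsFactors = FALSE)
}

# Made-up example: one named gene, one gene without a symbol, one empty entry.
x <- c(format_overlap("Abc1", "12345", "1-3,5", "+"),
       format_overlap(NA, "67890", "I", "-"),
       "")
x
parse_overlap(x)
```

Empty strings (ranges with no overlapping annotation) are dropped by the parser, so its output can have fewer rows than there are ranges.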
For annotated features flanking the region within a distance of dist, the string in the left or right output vector will have an additional [DIST] value.
This represents the gap between the edge of the region and the closest exon for that gene.

Exons are numbered in order of increasing start or end position for genes on the forward or reverse strands, respectively. Promoters are defined as the region extending promoter[1] bp upstream and promoter[2] bp downstream of the gene TSS, itself defined as the start of the first exon (for genes on the forward strand) or the end of the last exon (otherwise). All promoters are marked as exon 0 for simplicity.

Exon ranges in EXONS are reported as a comma-separated list, where stretches of consecutive exons are summarized into a range. If the region overlaps an intron, it is labelled with I in EXONS. No intronic overlaps are reported if there is an exonic overlap.

Note that promoter and intronic annotations are only reported in the overlap vector, to reduce redundancy in the output. For example, it makes little sense to report that the region is both flanking and overlapping an intron. Similarly, the value of DIST is more relevant when it is reported to the nearest exon rather than to an intron (in which case the distance would be zero if the intron overlaps the region). Where the distance is reported to the first exon, it can be used to refine the choice of promoter.

The max.intron value is needed to deal with genes that have ambiguous locations on the genome. If a gene has exons on different chromosomes, its location is uncertain and the gene is partitioned into two sets of exons for separate processing. This is less obvious when the ambiguous locations lie on the same chromosome. The max.intron value protects against the excessively large genes that would result from treating those locations as a single transcriptional unit.
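The flanking gap, promoter, exon-range and max.intron rules above can be sketched in base R as follows. All four helpers are hypothetical illustrations, not csaw's implementation, and several conventions are assumptions: coordinates are 1-based and closed, "left" means lower genomic coordinates, the gap counts the bases strictly between two intervals, promoter is read as c(upstream, downstream) with the same convention as GenomicRanges::promoters(), and a new exon grouping starts whenever the gap to the preceding exons exceeds max.intron.

```r
# Hypothetical base-R helpers; none of these are csaw functions.

# Gap from a region to the closest exon of one gene on each side.
# Assumes "left" = lower coordinates; gap = bases strictly between intervals.
flank_gaps <- function(reg_start, reg_end, ex_start, ex_end, dist = 5000) {
  left  <- reg_start - ex_end - 1
  right <- ex_start - reg_end - 1
  left  <- left[left >= 0 & left <= dist]
  right <- right[right >= 0 & right <= dist]
  c(left  = if (length(left))  min(left)  else NA,
    right = if (length(right)) min(right) else NA)
}

# Promoter interval around the TSS, assuming promoter = c(upstream, downstream).
promoter_range <- function(ex_start, ex_end, strand, promoter = c(3000, 1000)) {
  if (strand == "+") {
    tss <- min(ex_start)                  # start of the first exon
    c(start = tss - promoter[1], end = tss + promoter[2] - 1)
  } else {
    tss <- max(ex_end)                    # end of the last exon
    c(start = tss - promoter[2] + 1, end = tss + promoter[1])
  }
}

# Collapse exon numbers into a "1-3,5,7-8" style string.
summarize_exons <- function(ex) {
  ex <- sort(unique(ex))
  run <- cumsum(c(1, diff(ex) != 1))      # a new run starts at each break
  parts <- vapply(split(ex, run), function(r) {
    if (length(r) == 1) as.character(r) else paste0(r[1], "-", r[length(r)])
  }, character(1))
  paste(parts, collapse = ",")
}

# Assign one gene's exons (same chromosome) to groups, starting a new group
# whenever the gap to the furthest preceding exon end exceeds max.intron.
split_by_gap <- function(ex_start, ex_end, max.intron = 1e6) {
  o <- order(ex_start)
  reach <- cummax(ex_end[o])              # furthest end seen so far
  gap <- ex_start[o][-1] - reach[-length(reach)] - 1
  group <- cumsum(c(1, gap > max.intron))
  group[order(o)]                         # return in the input order
}

flank_gaps(10001, 10500, c(1, 9501, 11001, 20001), c(9000, 9990, 12000, 21000))
promoter_range(c(5001, 12001), c(6000, 15000), "+")    # start 2001, end 6000
summarize_exons(c(8, 1, 2, 3, 5, 7))                   # "1-3,5,7-8"
split_by_gap(c(3000001, 1, 5001), c(3001000, 1000, 6000))  # 2 1 1
```

In the flank_gaps() call, the exon ending at 9990 is 10 bp to the left and the exon starting at 11001 is 500 bp to the right, while the exon at 20001 lies beyond dist and is ignored.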
In such cases, the exons are partitioned into two (or more) internal groupings for further processing.

The default settings for key.field and name.field will work for human and mouse genomes, but may not work for other organisms. The key.field should refer to the key type in the OrgDb object, and should also correspond to the GENEID of the TxDb object. For example, in S. cerevisiae, key.field is set to "ORF" while name.field is set to "GENENAME".
If multiple entries are supplied in name.field, the value of GENE is defined as a semicolon-separated list of each of those entries.

If incoming is stranded and ignore.strand=FALSE, annotated features will only be reported if they lie on the same strand as that region.

If incoming is missing, the annotation is returned directly to the user in the form of a GRanges object.
This may be more useful when further work on the annotation is required.
Exon numbers are provided in the metadata with promoters and gene bodies labelled as 0 and -1, respectively.
Overlaps to introns can be identified by finding those regions that overlap with gene bodies but not with any of the corresponding exons.
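The intron rule in the previous sentence can be sketched in base R. This is a hypothetical illustration: a made-up data frame stands in for the returned GRanges (the real metadata column names are not given here), and the helpers overlaps() and intronic_genes() are not csaw functions. Only the exon-number convention (0 for promoters, -1 for gene bodies) is taken from the text above.

```r
# Toy annotation for one made-up forward-strand gene, mimicking the
# exon-number convention above: -1 = gene body, 0 = promoter, >0 = exon.
anno <- data.frame(
  gene  = "Abc1",
  exon  = c(-1, 0, 1, 2),
  start = c(5001, 2001, 5001, 12001),
  end   = c(15000, 6000, 6000, 15000),
  stringsAsFactors = FALSE
)

# Closed 1-based intervals overlap if neither ends before the other starts.
overlaps <- function(s1, e1, s2, e2) s1 <= e2 & s2 <= e1

# Genes whose body overlaps the region but none of whose exons do.
intronic_genes <- function(reg_start, reg_end, anno) {
  hit <- overlaps(reg_start, reg_end, anno$start, anno$end)
  body_hit <- unique(anno$gene[hit & anno$exon == -1])
  exon_hit <- unique(anno$gene[hit & anno$exon > 0])
  setdiff(body_hit, exon_hit)
}

intronic_genes(8001, 9000, anno)   # "Abc1": inside the body, between exons
intronic_genes(5500, 5600, anno)   # character(0): overlaps exon 1
```

With the real output, the same logic would typically be expressed with GenomicRanges overlap functions rather than this vectorized base-R comparison.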
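library(csaw)
library(org.Mm.eg.db)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)

# Example ranges shipped with csaw.
current <- readRDS(system.file("exdata", "exrange.rds", package="csaw"))

# Annotate each range with overlapping and flanking genes.
output <- detailRanges(current, orgdb=org.Mm.eg.db,
    txdb=TxDb.Mmusculus.UCSC.mm10.knownGene)
head(output$overlap)
head(output$right)
head(output$left)

# Without incoming, the annotation itself is returned as a GRanges object.
anno <- detailRanges(txdb=TxDb.Mmusculus.UCSC.mm10.knownGene, orgdb=org.Mm.eg.db)
head(anno)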
## Not run:
# output <- detailRanges(current, txdb=TxDb.Mmusculus.UCSC.mm10.knownGene,
# orgdb=org.Mm.eg.db, name.field=c("ENTREZID"))
# head(output$overlap)
#
# output <- detailRanges(current, txdb=TxDb.Mmusculus.UCSC.mm10.knownGene,
# orgdb=org.Mm.eg.db, name.field=c("SYMBOL", "ENTREZID"))
# head(output$overlap)
# ## End(Not run)