transcripts: Extract genomic features from an object

Description

Generic functions to extract genomic features from an object. This page documents the methods for TxDb objects only.

Usage

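transcripts(x, ...)
## S4 method for signature 'TxDb'
transcripts(x, vals=NULL, columns=c("tx_id", "tx_name"))

exons(x, ...)
## S4 method for signature 'TxDb'
exons(x, vals=NULL, columns="exon_id")

cds(x, ...)
## S4 method for signature 'TxDb'
cds(x, vals=NULL, columns="cds_id")

genes(x, ...)
## S4 method for signature 'TxDb'
genes(x, vals=NULL, columns="gene_id", single.strand.genes.only=TRUE)

#promoters(x, upstream=2000, downstream=200, ...)
## S4 method for signature 'TxDb'
promoters(x, upstream=2000, downstream=200, ...)

disjointExons(x, ...)
## S4 method for signature 'TxDb'
disjointExons(x, aggregateGenes=FALSE, includeTranscripts=TRUE, ...)

microRNAs(x)
## S4 method for signature 'TxDb'
microRNAs(x)

tRNAs(x)
## S4 method for signature 'TxDb'
tRNAs(x)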

Arguments

x

A TxDb object.

...

Arguments to be passed to or from methods.

vals

Either NULL or a named list of vectors to be used to restrict the output. Valid names for this list are: "gene_id", "tx_id", "tx_name", "tx_chrom", "tx_strand", "exon_id", "exon_name", "exon_chrom", "exon_strand", "cds_id", "cds_name", "cds_chrom", "cds_strand" and "exon_rank".

columns

Columns to include in the output. Must be NULL or a character vector as given by the columns method, with the following restrictions:

"TXCHROM" and "TXSTRAND" are not allowed for transcripts.
"EXONCHROM" and "EXONSTRAND" are not allowed for exons.
"CDSCHROM" and "CDSSTRAND" are not allowed for cds.

If the vector is named, those names are used for the corresponding column in the element metadata of the returned object.

single.strand.genes.only

TRUE or FALSE. If TRUE (the default), then genes that have exons located on both strands of the same chromosome or on two different chromosomes are dropped. In that case, the genes are returned in a GRanges object. Otherwise, all genes are returned in a GRangesList object, with the columns specified through the columns argument set as top-level metadata columns. (Please keep in mind that the top-level metadata columns of a GRangesList object are not displayed by the show method.)

upstream

For promoters: An integer(1) value indicating the number of bases upstream from the transcription start site. For additional details see ?'promoters,GRanges-method'.

downstream

For promoters: An integer(1) value indicating the number of bases downstream from the transcription start site. For additional details see ?'promoters,GRanges-method'.

aggregateGenes

For disjointExons: A logical. When FALSE (the default), exon fragments that overlap multiple genes are dropped. When TRUE, all fragments are kept and the gene_id metadata column includes all gene ids that overlap the exon fragment.

includeTranscripts

For disjointExons: A logical. When TRUE (the default), a tx_name metadata column is included that lists all transcript names that overlap the exon fragment.

Value

A GRanges object. The only exception is when genes is used with single.strand.genes.only=FALSE, in which case a GRangesList object is returned.

Details

These are the main functions for extracting transcript information from a TxDb object. With the exception of microRNAs, these methods can restrict the output based on categorical information. To restrict the output based on interval information, use the transcriptsByOverlaps, exonsByOverlaps, and cdsByOverlaps functions.

The promoters function computes user-defined promoter regions for the transcripts in a TxDb object. The returned object is a GRanges of promoter regions around the transcription start site, the span of which is defined by upstream and downstream. For additional details on how the promoter range is computed and on the handling of + and - strands, see ?'promoters,GRanges-method'.
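
The strand handling can be illustrated without a TxDb object. The following is a minimal base R sketch, not the package implementation: promoter_coords is a hypothetical helper, and it assumes the convention that the transcription start site (TSS) is the transcript start on the + strand and the transcript end on the - strand, with the promoter covering upstream + downstream bases in total. Consult ?'promoters,GRanges-method' for the authoritative definition.

```r
# Hypothetical helper (not part of GenomicFeatures): computes promoter
# coordinates from transcript coordinates and strand using base R only.
promoter_coords <- function(tx_start, tx_end, strand,
                            upstream = 2000L, downstream = 200L) {
  stopifnot(length(tx_start) == length(tx_end),
            length(tx_start) == length(strand),
            upstream >= 0, downstream >= 0)
  on_minus <- strand == "-"
  # Assumption: the TSS is the transcript start on "+" and the end on "-".
  tss <- ifelse(on_minus, tx_end, tx_start)
  # "+": [TSS - upstream, TSS + downstream - 1]
  # "-": [TSS - downstream + 1, TSS + upstream]
  prom_start <- ifelse(on_minus, tss - downstream + 1L, tss - upstream)
  prom_end   <- ifelse(on_minus, tss + upstream, tss + downstream - 1L)
  data.frame(start = prom_start, end = prom_end, strand = strand,
             stringsAsFactors = FALSE)
}

# Two made-up transcripts, one per strand.
toy <- promoter_coords(tx_start = c(1000L, 5000L),
                       tx_end   = c(3000L, 9000L),
                       strand   = c("+", "-"),
                       upstream = 100L, downstream = 50L)
toy
```

In this sketch both toy promoters are 150 bases wide (upstream + downstream), and the - strand region is the mirror image of the + strand one around its TSS.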

disjointExons creates a GRanges of non-overlapping exon parts with metadata columns of gene_id and exonic_part. Exon parts that overlap more than one gene are dropped with aggregateGenes=FALSE. When includeTranscripts=TRUE, a tx_name metadata column is included that lists all transcript names that overlap the exon fragment. This function replaces prepareAnnotationForDEXSeq in the DEXSeq package.
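
The notion of "non-overlapping exon parts" can be sketched in base R. This is only an illustration of the idea on plain integer intervals, not how disjointExons is implemented: disjoint_parts is a hypothetical helper, and it ignores chromosomes, strands, gene membership, and the exonic_part numbering that the real function reports.

```r
# Hypothetical helper: split possibly overlapping closed integer intervals
# into the disjoint parts they jointly cover.
disjoint_parts <- function(start, end) {
  stopifnot(length(start) == length(end), all(start <= end))
  # Every interval start, and every position just past an interval end,
  # is a potential boundary between parts.
  breaks <- sort(unique(c(start, end + 1L)))
  part_start <- head(breaks, -1L)
  part_end   <- tail(breaks, -1L) - 1L
  # Keep only the segments that lie inside at least one input interval;
  # this drops the gaps between non-adjacent intervals.
  covered <- vapply(seq_along(part_start), function(i) {
    any(start <= part_start[i] & end >= part_end[i])
  }, logical(1))
  data.frame(start = part_start[covered], end = part_end[covered])
}

# Two overlapping "exons" (1-10 and 5-15) and one separate exon (20-30).
parts <- disjoint_parts(start = c(1L, 5L, 20L), end = c(10L, 15L, 30L))
parts
```

Here the two overlapping intervals are split into three parts (1-4, 5-10, 11-15), while the isolated interval 20-30 is returned unchanged and the gap 16-19 is dropped.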

Examples

## transcripts(), exons(), genes():
library(GenomicFeatures)  # provides loadDb() and the extractor methods
txdb <- loadDb(system.file("extdata", "hg19_knownGene_sample.sqlite",
                           package="GenomicFeatures"))
vals <- list(tx_chrom = c("chr3", "chr5"), tx_strand = "+")

transcripts(txdb, vals)

exons(txdb, vals=list(exon_id=1), columns=c("EXONID", "TXNAME"))
exons(txdb, vals=list(tx_name="uc009vip.1"), columns=c("EXONID",
      "TXNAME"))

genes(txdb)  # a GRanges object
cols <- c("tx_id", "tx_chrom", "tx_strand",
          "exon_id", "exon_chrom", "exon_strand")
single_strand_genes <- genes(txdb, columns=cols)

## Because we've returned single strand genes only, the "tx_chrom"
## and "exon_chrom" metadata columns are guaranteed to match
## 'seqnames(single_strand_genes)':
stopifnot(identical(as.character(seqnames(single_strand_genes)),
                    as.character(mcols(single_strand_genes)$tx_chrom)))
stopifnot(identical(as.character(seqnames(single_strand_genes)),
                    as.character(mcols(single_strand_genes)$exon_chrom)))
## and also the "tx_strand" and "exon_strand" metadata columns are
## guaranteed to match 'strand(single_strand_genes)':
stopifnot(identical(as.character(strand(single_strand_genes)),
                    as.character(mcols(single_strand_genes)$tx_strand)))
stopifnot(identical(as.character(strand(single_strand_genes)),
                    as.character(mcols(single_strand_genes)$exon_strand)))

all_genes <- genes(txdb, columns=cols, single.strand.genes.only=FALSE)
all_genes  # a GRangesList object
multiple_strand_genes <- all_genes[elementLengths(all_genes) >= 2]
multiple_strand_genes
mcols(multiple_strand_genes)

## microRNAs():
## Not run:
# library(TxDb.Hsapiens.UCSC.hg19.knownGene)
# library(mirbase.db)
# microRNAs(TxDb.Hsapiens.UCSC.hg19.knownGene)
## End(Not run)

## promoters():
head(promoters(txdb, 100, 50))
