SequenceTrack-class: SequenceTrack class and methods

Description

A track class to represent genomic sequences. The two child classes SequenceDNAStringSetTrack and SequenceBSgenomeTrack do most of the work, however in practise they are of no particular relevance to the user.

Usage

SequenceTrack(sequence, chromosome, genome, name="SequenceTrack",
importFunction, stream=FALSE, ...)

Arguments

sequence

A meta argument to handle the different input types, making the construction of a SequenceTrack as flexible as possible. The different input options for sequence are: [object Object],[object Object],[object Object]

chromosome

The currently active chromosome of the track. A valid UCSC chromosome identifier if options(ucscChromosomeNames=TRUE). Please note that in this case only syntactic checking takes place, i.e., the argument value needs to be an integer, numeric character or a character of the form chrx, where x may be any possible string. The user has to make sure that sequences for the respective chromosomes are indeed part of the object. If not provided here, the constructor will set it to the first available sequence. Please note that by definition all objects in the Gviz package can only have a single active chromosome at a time (although internally the information for more than one chromosome may be present), and the user has to call the chromosome<- replacement method in order to change to a different active chromosome.

genome

The genome on which the track's ranges are defined. Usually this is a valid UCSC genome identifier, however this is not being formally checked at this point. For a SequenceBSgenomeTrack object, the genome information is extracted from the input BSgenome package. For a DNAStringSet it has too be provided or the constructor will fall back to the default value of NA.

name

Character scalar of the track's name used in the title panel when plotting.

importFunction

A user-defined function to be used to import the sequence data from a file. This only applies when the sequence argument is a character string with the path to the input data file. The function needs to accept an argument file containing the file path and has to return a proper DNAStringSet object with the sequence information per chromosome. A set of default import functions is already implemented in the package for a number of different file types, and one of these defaults will be picked automatically based on the extension of the input file name. If the extension can not be mapped to any of the existing import function, an error is raised asking for a user-defined import function. Currently the following file types can be imported with the default functions: fa/fasta and 2bit.

Both file types support indexing by genomic coordinates, and it makes sense to only load the part of the file that is needed for plotting. To this end, the Gviz package defines the derived ReferenceSequenceTrack class, which supports streaming data from the file system. The user typically does not have to deal with this distinction but may rely on the constructor function to make the right choice as long as the default import functions are used. However, once a user-defined import function has been provided and if this function adds support for indexed files, you will have to make the constructor aware of this fact by setting the stream argument to TRUE. Please note that in this case the import function needs to accept a second mandatory argument selection which is a GRanges object containing the dimensions of the plotted genomic range. As before, the function has to return an appropriate DNAStringSet object.

stream

A logical flag indicating that the user-provided import function can deal with indexed files and knows how to process the additional selection argument when accessing the data on disk. This causes the constructor to return a ReferenceSequenceTrack object which will grab the necessary data on the fly during each plotting operation.

...

Additional items which will all be interpreted as further display parameters. See settings and the "Display Parameters" section below for details.

Value

The return value of the constructor function is a new object of class SequenceDNAStringSetTrack, SequenceBSgenomeTrack ore ReferenceSequenceTrack, depending on the constructor arguments. Typically the user will not have to be troubled with this distinction and can rely on the constructor to make the right choice.

Objects from the class

Objects can be created using the constructor function SequenceTrack.

Extends

Class "GdObject", directly.

Details

Depending on the available space the class will use different options to plot a sequence. If single letters can be accomodated without overplotting those will be show. Otherwise, colored boxes will be used to indicate letters, and if there is not enough horizontal room to show those, a simple line will indicate presence of a sequence. The min.width and fontsize display parameters directly control this behaviour. Each of the five possible nucleotides (G, A, T, C, and N) will be endoded in a separate color. As default we use the colors suggested in the biovizBase package, but a user is free to set their own color scheme by providing a named character vector with color as display parameter fontcolor, with names equal to the five possible bases.

Examples

Run this code

## An empty object
SequenceTrack()

## Construct from DNAStringSet
library(Biostrings)
letters <- c("A", "C", "T", "G", "N")
set.seed(999)
seqs <- DNAStringSet(c(chr1=paste(sample(letters, 100000, TRUE),
collapse=""), chr2=paste(sample(letters, 200000, TRUE), collapse="")))
sTrack <- SequenceTrack(seqs, genome="hg19")
sTrack

## Construct from BSGenome object
if(require(BSgenome.Hsapiens.UCSC.hg19)){
sTrack <- SequenceTrack(Hsapiens)
sTrack
}


## Set active chromosome
chromosome(sTrack)
chromosome(sTrack) <- "chr2"
head(seqnames(sTrack))



## For some annoying reason the postscript device does not know about
## the sans font
if(!interactive())
{
font <- ps.options()$family
displayPars(sTrack) <- list(fontfamily=font, fontfamily.title=font)
}

## Plotting
## Sequences
plotTracks(sTrack, from=199970, to=200000)
## Boxes
plotTracks(sTrack, from=199800, to=200000)
## Line
plotTracks(sTrack, from=1, to=200000)
## Force boxes
plotTracks(sTrack, from=199970, to=200000, noLetters=TRUE)
## Direction indicator
plotTracks(sTrack, from=199970, to=200000, add53=TRUE)
## Sequence complement
plotTracks(sTrack, from=199970, to=200000, add53=TRUE, complement=TRUE)
## Colors
plotTracks(sTrack, from=199970, to=200000, add53=TRUE, fontcolor=c(A=1,
C=1, G=1, T=1, N=1))

## Track names
names(sTrack)
names(sTrack) <- "foo"

## Accessors
genome(sTrack)
genome(sTrack) <- "mm9"
length(sTrack)

## Sequence extraction
subseq(sTrack, start=100000, width=20)
## beyond the stored sequence range
subseq(sTrack, start=length(sTrack), width=20)

Run the code above in your browser using DataLab