Sequences-class: Class "Sequences"

Description

This class is a small extension of a matrix representation of sequences. Each sequence is stored as integers, where each integer is the index of a character in the alphabet. For example, with the alphabet ['A', 'B', 'C', 'D'], the sequence ADC is stored as [1, 4, 3]. Each row of the matrix holds one sequence, and the alphabet slot holds the key that maps the integers back to the characters of the sequence.
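The integer encoding described above can be sketched in base R. This is only an illustration of the mapping, not the package's own constructor; the variable names (`alphabet`, `residues`, `encoded`, `decoded`) are assumptions made for the example.

```r
# Alphabet and sequence taken from the example in the description.
alphabet <- c("A", "B", "C", "D")
sequence <- "ADC"

# Split the string into single characters and look up each character's
# position in the alphabet.
residues <- strsplit(sequence, "")[[1]]
encoded <- match(residues, alphabet)
encoded

# Decoding reverses the lookup: index the alphabet with the integer codes.
decoded <- paste(alphabet[encoded], collapse = "")
decoded
```

Here `encoded` holds 1, 4, 3, matching the example in the description, and `decoded` recovers the original string.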

Objects from the Class

Objects can be created by calls of the form

new("Sequences", data,
  nrow, ncol, byrow, dimnames, ...)

Extends

Class "matrix", from data part. Class "array", by class "matrix", distance 2. Class "structure", by class "matrix", distance 3. Class "vector", by class "matrix", distance 4, with explicit coerce.
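To illustrate what extending "matrix" from the data part means, here is a minimal sketch of an S4 class with a matrix data part and an alphabet slot. The class name `SeqSketch` is hypothetical and stands in for the real class; the package's actual definition may differ.

```r
library(methods)

# "SeqSketch" is a hypothetical stand-in, not the package's class definition.
setClass("SeqSketch",
         contains = "matrix",
         slots = c(alphabet = "character"))

# Two sequences, one per row, stored as integer codes.
codes <- matrix(c(1L, 4L, 3L,
                  2L, 2L, 1L), nrow = 2, byrow = TRUE)

# The unnamed argument supplies the matrix data part.
obj <- new("SeqSketch", codes, alphabet = c("A", "B", "C", "D"))

is(obj, "matrix")  # the object is also a matrix
dim(obj)           # matrix accessors see the data part
obj@alphabet       # the extra slot carries the key
```

Because the object is a matrix, ordinary matrix functions accept it, which is also why some of them may return a plain result without the extra slot.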

Warning

The gap character is always assumed to be the last character in the alphabet slot. Do not change this convention, since the distance method relies on it. Not all data.frame manipulation methods have been overridden, so you may get a data.frame back instead of a Sequences object. Only rbind and array access have been overridden.
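The gap convention can be sketched in base R, assuming it means that the gap symbol is the last entry of the alphabet, so its integer code equals the alphabet length. The alphabet, the "-" symbol, and the data below are made up for illustration.

```r
# Hypothetical alphabet with the gap symbol "-" placed last.
alphabet <- c("A", "B", "C", "D", "-")
gap_code <- length(alphabet)

# Two aligned sequences stored as rows of integer codes: "AD-C" and "B--A".
codes <- matrix(c(1L, 4L, 5L, 3L,
                  2L, 5L, 5L, 1L), nrow = 2, byrow = TRUE)

# Logical mask of gap positions, and the number of gaps in each sequence.
is_gap <- codes == gap_code
gaps_per_sequence <- rowSums(is_gap)
gaps_per_sequence
```

If the gap symbol were placed anywhere else in the alphabet, code that treats the highest integer as the gap would silently misread an ordinary character as a gap.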

Examples

## Load example data and plot it
data(TULASequences)
plot(TULASequences)

## Access all sequences which have a 4 in position 1
print(TULASequences[TULASequences[,1] == 4,])

## Access all sequences which have a tyrosine residue in position 1 and
## plot them

TULASequences.subset <- TULASequences[TULASequences[,1] == which(TULASequences@alphabet == 'Y'),]
plot(TULASequences.subset)

## Calculate a distance matrix on this subset and use agglomerative
## clustering to plot it
TULA.dmatrix <- dist(TULASequences.subset)
TULA.hclusters <- hclust(TULA.dmatrix)
plot(TULA.hclusters)