BioVector: DNAVector, RNAVector, AAVector Objects and BioVector Class

Description

Create an object containing a set of DNA-, RNA- or amino acid sequences

Usage

## Constructors:
RNAVector(x = character())
AAVector(x = character())
## Accessor-like methods: see below
## S3 method for class 'BioVector,index,missing,ANY':
[(x, i)
## S3 method for class 'BioVector':
as.character(x, use.names = TRUE)

Arguments

character vector containing a set of sequences as uppercase characters or in mixed uppercase/lowercase form.

numeric vector with indicies or character with element names

use.names

when set to TRUE the names are preserved

Value

constructors DNAVector, RNAVector, AAVector return a sequence set of identical class name

Details

The class DNAVector is used for storing DNA sequences, RNAVector for RNA sequences and AAVector for amino acid sequences. The class BioVector is derived from the R base type character representing a vector of character strings. It is an abstract class which can not be instantiated. BioVector is the parent class for DNAVector, RNAVector and AAVector. For the three derived classes identically named functions exist which are constructors. It should be noted that the constructors only wrap the sequence data into a class without copying or recoding the data. The functions provided for DNAVector, RNAVector and AAVector classes are only a very small subset compared to those of XStringSet but are designed along their counterparts from the Biostrings package. Assignment of metadata and element metadata via mcols is supported for the DNAVector, RNAVector and AAVector objects similar to objects of XStringSet derived classes (for details on metadata assignment see annotationMetadata and positionMetadata). In contrast to XStringSet the BioVector derived classes also support the storage of lowercase characters. This can be relevant for repeat regions which are often coded in lowercase characters. During the creation of XStringSet derived classes the lowercase characters are converted to uppercase automatically and the information about repeat regions is lost. For BioVector derived classes the user can specify during creation of a sequence kernel object whether lowercase characters should be included as uppercase characters or whether repeat regions should be ignored during sequence analysis. In this way it is possible to perform both types of analysis on the same set of sequences through defining one kernel object which accepts lowercase characters and another one which ignores them.

References

http://www.bioinf.jku.at/software/kebabs J. Palme, S. Hochreiter, and U. Bodenhofer (2015) KeBABS: an R package for kernel-based analysis of biological sequences. Bioinformatics, 31(15):2574-2576, 2015. DOI: http://dx.doi.org/10.1093/bioinformatics/btv176{10.1093/bioinformatics/btv176}.

Examples

Run this code

## in general DNAStringSet should be prefered as described above
## create DNAStringSet object for a set of sequences
x <- DNAStringSet(c("AACCGCGATTATCGatatatatatatatatTGGAAGCTAGGACTA",
                    "GACTTACCCgagagagagagagaCATGAGAGGGAAGCTAGTA"))
## assign names to the sequences
names(x) <- c("Sample1", "Sample2")

## to show the different handling of lowercase characters
## create DNAVector object for the same set of sequences and assign names
xv <- DNAVector(c("AACCGCGATTATCGatatatatatatatatTGGAAGCTAGGACTA",
                  "GACTTACCCgagagagagagagaCATGAGAGGGAAGCTAGTA"))
names(xv) <- c("Sample1", "Sample2")

## show DNAStringSet object - lowercase characters were translated
x
## in the DNAVector object lowercase characters are unmodified
## their handling can be defined at the level of the sequence kernel
xv

## show number of the sequences in the set and their number of characters
length(xv)
width(xv)
nchar(xv)

Run the code above in your browser using DataLab