EncDNA (version 1.0.2)

Trint.Dist.Feature: Tri-nucleotide distribution-based encoding of nucleotide sequences.

Description

This encoding scheme was first adopted by Wei et al. (2013), together with MM1 features, for the prediction of splice sites. In this encoding technique, the distribution of trinucleotides is considered independently for the exon and intron regions of splice site motifs.

Usage

Trint.Dist.Feature(test_seq)

Arguments

test_seq

Sequence dataset to be transformed into numeric feature vectors. It must be an object of class DNAStringSet and contain at least two sequences.

Value

A numeric matrix of order \(m \times 64\), where \(m\) is the number of sequences in test_seq.

Details

This encoding scheme does not depend on positive and negative datasets; in other words, each sequence can be encoded independently. Further, a nucleotide sequence of any length is transformed into a numeric vector of 64 values, one for each of the 64 possible trinucleotides.
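To illustrate how a sequence of any length can map to 64 values, here is a minimal base-R sketch. It is not the package's implementation. The helper name `trint_freq_sketch`, the use of overlapping windows, and the normalisation to relative frequencies are all assumptions made for illustration; Trint.Dist.Feature may compute its features differently, for example by treating exon and intron regions separately.

```r
# Hypothetical helper (not part of EncDNA): relative frequencies of the
# 64 trinucleotides in a single sequence given as a character string.
# Assumes overlapping windows of width 3 that slide one base at a time.
trint_freq_sketch <- function(seq_chr) {
  bases <- c("A", "C", "G", "T")
  # Enumerate all 64 trinucleotides in a fixed order: AAA, AAC, ..., TTT
  grid <- expand.grid(third = bases, second = bases, first = bases,
                      stringsAsFactors = FALSE)
  trints <- paste0(grid$first, grid$second, grid$third)

  seq_chr <- toupper(seq_chr)
  n <- nchar(seq_chr)
  stopifnot(n >= 3)

  # Extract every overlapping window of three bases
  starts <- seq_len(n - 2)
  windows <- substring(seq_chr, starts, starts + 2)

  # Count windows per trinucleotide; windows with other symbols (e.g. N)
  # become NA and are left out of the counts
  counts <- table(factor(windows, levels = trints))
  setNames(as.numeric(counts) / length(windows), trints)
}

# Two toy sequences of different lengths, each encoded on its own
seqs <- c(s1 = "ATGATGCCC", s2 = "GGGTTTAAACCC")
enc_sketch <- t(vapply(seqs, trint_freq_sketch, numeric(64)))
dim(enc_sketch)  # 2 64
```

Each row is computed from its own sequence only, which mirrors the point that the encoding needs no positive or negative reference set.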

References

Wei, D., Zhang, H., Wei, Y. and Jiang, Q. (2013). A novel splice site prediction method using support vector machine. J Comput Inform Syst., 9(20): 8053-8060.

Examples

# NOT RUN {
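library(EncDNA)
# Load the example dataset droso shipped with the package
data(droso)
# Sequences to be encoded
tst <- droso$test
# Encode each sequence as a numeric vector of 64 trinucleotide features
enc <- Trint.Dist.Feature(test_seq = tst)
enc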
# }