
EncDNA (version 1.0.2)

PN.Fdtf.Feature: Conversion of nucleotide sequences into numeric feature vectors based on the difference of dinucleotide frequency.

Description

A dinucleotide frequency matrix is first computed for each of the positive and negative classes. The frequency matrix of the positive class is then subtracted from that of the negative class. Each sequence is passed through the resulting difference matrix to encode it as a numeric feature vector. As with the MN.Fdtf feature, datasets of both the positive and negative classes are required for encoding. This encoding was also conceptualized by Huang et al. (2006), and it was used by Pashaei et al. (2016) as one of several features for the prediction of splice sites.
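
The description above can be illustrated with a minimal base R sketch. It is not the EncDNA implementation: the helper names dinuc_freq and encode_diff are hypothetical, and the sketch assumes position-wise relative frequencies over aligned, equal-length sequences held as plain character vectors. The sign convention (negative minus positive) follows the wording of the description.

```r
# Illustrative sketch only; helper names are hypothetical, not EncDNA functions.
# All 16 dinucleotides over the alphabet A, C, G, T.
dinucs <- as.vector(outer(c("A", "C", "G", "T"), c("A", "C", "G", "T"), paste0))

# Position-wise relative dinucleotide frequencies: a 16 x (n - 1) matrix,
# assuming all sequences are aligned and have the same length n.
dinuc_freq <- function(seqs) {
  n <- nchar(seqs[1])
  stopifnot(n >= 2, all(nchar(seqs) == n))
  freq <- sapply(seq_len(n - 1), function(j) {
    pairs <- substr(seqs, j, j + 1)
    as.vector(table(factor(pairs, levels = dinucs))) / length(seqs)
  })
  rownames(freq) <- dinucs
  freq
}

# Encode each test sequence by looking up its dinucleotide at every position
# in the difference matrix (negative minus positive, as assumed above).
encode_diff <- function(positive, negative, test) {
  diff_mat <- dinuc_freq(negative) - dinuc_freq(positive)
  n <- nchar(test[1])
  rows <- lapply(test, function(s) {
    pairs <- substring(s, 1:(n - 1), 2:n)
    diff_mat[cbind(match(pairs, dinucs), seq_len(n - 1))]
  })
  do.call(rbind, rows)
}

# Toy data, made up for illustration.
positive <- c("ACGT", "ACGA", "ACGT")
negative <- c("TTGA", "TCGA", "ATGA")
test <- c("ACGT", "TTGA")

enc <- encode_diff(positive, negative, test)
dim(enc)  # one row per test sequence, n - 1 columns
```

With two test sequences of length 4, the result has 2 rows and 3 columns, matching the m by (n-1) shape given under Value.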

Usage

PN.Fdtf.Feature(positive_class, negative_class, test_seq)

Arguments

positive_class

Sequence dataset of the positive class, must be an object of class DNAStringSet.

negative_class

Sequence dataset of the negative class, must be an object of class DNAStringSet.

test_seq

Sequences to be encoded into numeric vectors, must be an object of class DNAStringSet.

Value

A numeric matrix of order \(m \times (n-1)\), where \(m\) is the number of sequences in test_seq and \(n\) is the sequence length.

Details

To obtain an object of class DNAStringSet, the sequence dataset must be read in FASTA format with the function readDNAStringSet, available in the Biostrings package of Bioconductor (https://bioconductor.org/packages/release/bioc/html/Biostrings.html).
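
As a rough sketch of that step, the code below writes a small FASTA file and reads it back with readDNAStringSet. The sequences and the temporary file path are made up for illustration, and the read is guarded so that the code still runs when Biostrings is not installed.

```r
# Write a tiny FASTA file to a temporary path (toy sequences, illustration only).
fasta_path <- tempfile(fileext = ".fasta")
writeLines(c(">seq1", "ACGTACGT", ">seq2", "TTGACCGA"), fasta_path)

# Read the file into a DNAStringSet if Biostrings is available.
if (requireNamespace("Biostrings", quietly = TRUE)) {
  seqs <- Biostrings::readDNAStringSet(fasta_path, format = "fasta")
  print(class(seqs))   # class of the returned object
  print(length(seqs))  # number of sequences read
}
```

An object read this way can then be supplied as positive_class, negative_class or test_seq.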

References

  1. Huang, J., Li, T., Chen, K. and Wu, J. (2006). An approach of encoding for prediction of splice sites using SVM. Biochimie, 88(7): 923-929.

  2. Pashaei, E., Yilmaz, A., Ozen, M. and Aydin, N. (2016). Prediction of splice site using AdaBoost with a new sequence encoding approach. In Systems, Man, and Cybernetics (SMC), IEEE International Conference, pp 3853-3858.

See Also

MN.Fdtf.Feature, WAM.Feature, MM1.Feature

Examples

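# NOT RUN {
library(EncDNA)
# Load the example dataset shipped with the package
data(droso)
positive <- droso$positive
negative <- droso$negative
test <- droso$test
# Use the first 200 sequences of each class to build the difference matrix
pos <- positive[1:200]
neg <- negative[1:200]
tst <- test
# Encode the test sequences into numeric feature vectors
enc <- PN.Fdtf.Feature(positive_class=pos, negative_class=neg, test_seq=tst)
enc
# }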
