Density.Feature: Nucleotide sequence encoding with the distribution of trinucleotides.
Description
Each nucleotide sequence is encoded into a numeric vector of the same length, based on the distribution of nucleotides over the sequence. Two classes of dataset are not required for this encoding; each sequence is encoded independently. This encoding scheme was introduced by Wei et al. (2013) for the prediction of human donor and acceptor splice sites, along with the MM1.Feature.
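To make the idea of a position-wise density concrete, the following base R sketch encodes each position as the fraction of the prefix ending there that matches the nucleotide at that position. This definition is an assumption chosen for illustration; this page does not spell out the formula, and `density_encode` is a hypothetical helper, not a function of the package.

```r
# Hypothetical helper (not part of the package): assumes the density at
# position i is the share of positions 1..i holding the same nucleotide
# as position i.
density_encode <- function(seq) {
  nt <- strsplit(toupper(seq), "")[[1]]
  vapply(seq_along(nt),
         function(i) sum(nt[1:i] == nt[i]) / i,
         numeric(1))
}

# Two toy sequences of equal length; each is encoded independently.
seqs <- c("ACGTAC", "GGTACC")
enc <- t(vapply(seqs, density_encode, numeric(6), USE.NAMES = FALSE))
dim(enc)  # one row per sequence, one column per position
```

Under this assumed definition, every sequence yields a vector as long as the sequence itself, which matches the shape described on this page.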
Usage
Density.Feature(test_seq)
Arguments
test_seq
Sequence dataset to be encoded, must be an object of class DNAStringSet.
Value
A numeric matrix of order \(m \times n\), where \(m\) is the number of sequences in test_seq and \(n\) is the length of each sequence.
Details
An object of class DNAStringSet can be obtained by reading FASTA sequences with the function readDNAStringSet, available in the Biostrings package of Bioconductor.
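A minimal sketch of this workflow follows. It assumes that Biostrings and the package providing Density.Feature (assumed here to be EncDNA) are installed; the temporary FASTA file and its two toy sequences are made up for illustration only.

```r
library(Biostrings)
library(EncDNA)   # assumed to be the package that provides Density.Feature

# Write a small illustrative FASTA file with two equal-length sequences.
fasta <- tempfile(fileext = ".fasta")
writeLines(c(">seq1", "ACGTAC", ">seq2", "GGTACC"), fasta)

# Read the file into a DNAStringSet and encode it.
test_seq <- readDNAStringSet(fasta)
enc <- Density.Feature(test_seq)

# Per the Value section: one row per sequence, one column per position.
dim(enc)
```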
References
Bari, A.T.M.G., Reaz, M.R. and Jeong, B.S. (2014). Effective DNA encoding for splice site prediction using SVM. MATCH Commun. Math. Comput. Chem., 71: 241-258.