EncDNA (version 1.0.2)

Sparse.Feature: Nucleotide sequence encoding with 0 and 1.

Description

In this encoding approach, A, T, G and C are encoded as (1,1,1), (1,0,0), (0,1,0) and (0,0,1), respectively, a scheme introduced by Bari et al. (2014). Alternatively, each nucleotide can be encoded with four bits, i.e., A as (1,0,0,0), T as (0,1,0,0), G as (0,0,1,0) and C as (0,0,0,1), as in Meher et al. (2016).
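The four-bit scheme can be illustrated with a minimal base-R sketch. The helper `sparse_encode_sketch` is hypothetical and is not part of EncDNA; it takes a plain character string rather than a DNAStringSet and only shows how the lookup of Meher et al. (2016) maps each nucleotide to four bits.

```r
# Hypothetical helper (not EncDNA code): four-bit sparse encoding of one sequence.
sparse_encode_sketch <- function(seq_string) {
  # Lookup table in the order given in the description: A, T, G, C
  codes <- list(
    A = c(1, 0, 0, 0),
    T = c(0, 1, 0, 0),
    G = c(0, 0, 1, 0),
    C = c(0, 0, 0, 1)
  )
  # Split the string into single nucleotides
  bases <- strsplit(toupper(seq_string), "")[[1]]
  if (!all(bases %in% names(codes))) {
    stop("sequence contains characters other than A, T, G, C")
  }
  # Concatenate the four bits of every nucleotide, in sequence order
  unlist(codes[bases], use.names = FALSE)
}

enc <- sparse_encode_sketch("ATGC")
enc
length(enc)  # 4 bits per nucleotide, so 4 * 4 = 16
```

A sequence of n nucleotides therefore yields 4n values, each nucleotide contributing exactly one 1 and three 0s.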

Usage

Sparse.Feature(test_seq)

Arguments

test_seq

Sequence dataset to be encoded into numeric vectors of 0s and 1s; must be an object of class DNAStringSet.

Value

A vector of length \(4n\) for each sequence of \(n\) nucleotides in test_seq.

Details

Each sequence is encoded independently, without requiring positive- and negative-class datasets.
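As a rough illustration of this independence, the base-R sketch below encodes several equal-length sequences one at a time using a fixed lookup table and no class labels. The names `code_table` and `encode_one` are hypothetical and not part of EncDNA, and stacking the results into a matrix is done here purely for display; it is not a statement about the package's return format.

```r
# Hypothetical illustration (not EncDNA code): each sequence is encoded
# from a fixed lookup table only, with no reference to class labels.
code_table <- rbind(
  A = c(1, 0, 0, 0),
  T = c(0, 1, 0, 0),
  G = c(0, 0, 1, 0),
  C = c(0, 0, 0, 1)
)

encode_one <- function(s) {
  bases <- strsplit(s, "")[[1]]
  # Rows of code_table are selected by nucleotide; after t(), each column
  # holds one nucleotide's 4 bits, so as.vector() reads them base by base
  as.vector(t(code_table[bases, , drop = FALSE]))
}

seqs <- c("ATGC", "GGCA", "TTAC")
# One row per sequence, 4 * 4 = 16 columns
enc_matrix <- t(vapply(seqs, encode_one, numeric(16)))
dim(enc_matrix)
```

Because the encoding of a sequence depends only on its own nucleotides, adding or removing other sequences leaves its row unchanged.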

References

  1. Bari, A.T.M.G., Reaz, M.R. and Jeong, B.S. (2014). Effective DNA encoding for splice site prediction using SVM. MATCH Commun. Math. Comput. Chem., 71: 241-258.

  2. Meher, P.K., Sahu, T.K., Rao, A.R. and Wahi, S.D. (2016). A computational approach for prediction of donor splice sites with improved accuracy. Journal of Theoretical Biology, 404: 285-294.

Examples

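library(EncDNA)

# Load the example dataset shipped with the package
data(droso)

# Test sequences to be encoded
tst <- droso$test

# Encode each sequence as a vector of 0s and 1s
enc <- Sparse.Feature(test_seq = tst)
enc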