WAM.Feature: Nucleic acid sequence encoding based on the weight array model

Description

Unlike the weight matrix method (WMM), which treats sequence positions as independent, the weight array model (WAM) accounts for first-order dependencies between adjacent nucleotides. The WAM was introduced by Zhang and Marr (1993) for locating splicing signals in nucleotide sequences, and was later employed by Meher et al. (2016) to encode splice site motifs.
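In the standard formulation (Zhang and Marr, 1993), for an aligned sequence \(X = x_1 x_2 \ldots x_n\) the WMM multiplies independent position-specific probabilities, whereas the WAM conditions each nucleotide on the one before it:

\[
P_{\mathrm{WMM}}(X) = \prod_{i=1}^{n} p_i(x_i), \qquad
P_{\mathrm{WAM}}(X) = p_1(x_1) \prod_{i=2}^{n} p_i(x_i \mid x_{i-1}),
\]

where \(p_i(\cdot)\) and \(p_i(\cdot \mid \cdot)\) are estimated from the nucleotide and adjacent-dinucleotide frequencies at each position of the training sequences. This is the textbook model; the exact way WAM.Feature converts these probabilities into its output values may differ in detail.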

Usage

WAM.Feature(positive_class, negative_class, test_seq)

Arguments

positive_class

Sequence dataset of the positive class, must be an object of class DNAStringSet.

negative_class

Sequence dataset of the negative class, must be an object of class DNAStringSet.

test_seq

Sequences to be encoded into numeric vectors, must be an object of class DNAStringSet.

Value

A numeric matrix of order \(m \times 2\), where \(m\) is the number of sequences in test_seq.

Details

In this encoding approach, each sequence is represented by a vector of two observations: the first is obtained using only the positive class, and the second using both the positive and negative datasets. The encoding is also invariant to sequence length, in the sense that every sequence is mapped to exactly two values.
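The base-R sketch below illustrates one plausible reading of the two observations. It is an assumption, not the package's implementation: fit_wam(), score_wam() and wam_encode() are hypothetical helpers; sequences are plain equal-length character strings over A, C, G and T (a DNAStringSet could be converted with as.character()); a pseudocount of 1 is assumed; the first column is taken to be the positive-class WAM log-likelihood relative to a uniform background, and the second the log-odds of the positive-class WAM over the negative-class WAM.

```r
# Hypothetical sketch, not the WAM.Feature() source: a base-R weight array
# model (WAM) and a two-column encoding. Assumes equal-length sequences
# containing only A, C, G and T.
nt <- c("A", "C", "G", "T")

# Fit a WAM: P(x_1) at position 1 and, for every later position i,
# a 4 x 4 table of P(x_i | x_{i-1}). A pseudocount avoids log(0).
fit_wam <- function(seqs, pseudo = 1) {
  m <- do.call(rbind, strsplit(seqs, ""))   # one row per sequence
  n <- ncol(m)
  first <- as.numeric(table(factor(m[, 1], levels = nt))) + pseudo
  names(first) <- nt
  first <- first / sum(first)
  cond <- vector("list", n)                 # cond[[1]] stays NULL
  for (i in seq_len(n)[-1]) {
    counts <- table(factor(m[, i - 1], levels = nt),
                    factor(m[, i], levels = nt)) + pseudo
    cond[[i]] <- counts / rowSums(counts)   # each row sums to 1
  }
  list(first = first, cond = cond, len = n)
}

# Log-probability of one sequence under a fitted WAM:
# log P(x_1) + sum over i >= 2 of log P(x_i | x_{i-1}).
score_wam <- function(model, s) {
  x <- strsplit(s, "")[[1]]
  stopifnot(length(x) == model$len)
  lp <- log(model$first[[x[1]]])
  for (i in seq_len(model$len)[-1]) {
    lp <- lp + log(model$cond[[i]][x[i - 1], x[i]])
  }
  lp
}

# Hypothetical two-feature encoding (assumed meaning of the two columns):
#   pos_only   = log P_pos(x) - log P_uniform(x), positive class only
#   pos_vs_neg = log P_pos(x) - log P_neg(x), both classes
wam_encode <- function(positive, negative, test) {
  pos_model <- fit_wam(positive)
  neg_model <- fit_wam(negative)
  scores <- vapply(test, function(s) {
    lp <- score_wam(pos_model, s)
    c(pos_only = lp - nchar(s) * log(0.25),
      pos_vs_neg = lp - score_wam(neg_model, s))
  }, numeric(2), USE.NAMES = FALSE)
  t(scores)                                 # m x 2: one row per test sequence
}

# Toy usage with made-up sequences (not the droso data).
pos <- c("ACGTAC", "ACGTTC", "ACGAAC")
neg <- c("TTGCAA", "TGGCAT", "CTGCAA")
tst <- c("ACGTAC", "TTGCAA")
enc <- wam_encode(pos, neg, tst)
print(dim(enc))
print(enc)
```

Whatever the length of the input sequences, each test sequence yields one row with two columns. A positive pos_vs_neg value means the sequence fits the positive-class WAM better than the negative-class one, and the pseudocount keeps dinucleotide transitions unseen in training from producing log(0).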

References

Zhang, M. and Marr, T. (1993). A weight array method for splicing signal analysis. Computer Applications in the Biosciences, 9(5): 499-509.
Meher, P.K., Sahu, T.K., Rao, A.R. and Wahi, S.D. (2016). Identification of donor splice sites using support vector machine: a computational approach based on positional, compositional and dependency features. Algorithms for Molecular Biology, 11(1): 16.

Examples


# NOT RUN {
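data(droso)                  # example dataset shipped with the package
positive <- droso$positive   # positive-class sequences (DNAStringSet)
negative <- droso$negative   # negative-class sequences (DNAStringSet)
test <- droso$test           # sequences to be encoded (DNAStringSet)
pos <- positive[1:200]       # use the first 200 sequences of each class
neg <- negative[1:200]
tst <- test
enc <- WAM.Feature(positive_class=pos, negative_class=neg, test_seq=tst)
enc                          # numeric matrix: one row per test sequence, two columns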
# }
