ANF_DNA: Accumulated Nucleotide Frequency (ANF_DNA)
Description
This function replaces each nucleotide with a vector of length four.
The first three elements encode the nucleotide, and
the fourth holds the frequency of that nucleotide from the beginning of the sequence up to the position of the nucleotide in the sequence.
'A' is replaced with c(1, 1, 1, freq), 'C' with c(0, 1, 0, freq), 'G' with c(1, 0, 0, freq), and 'T' with c(0, 0, 1, freq).
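The encoding described above can be sketched for a single sequence as follows. This is an illustrative re-implementation, not the package's own code: anf_encode is a hypothetical helper name, and the sketch assumes freq is the number of occurrences of the current nucleotide in positions 1..i divided by i.

```r
# Hypothetical helper for illustration only; not part of ftrCOOL.
anf_encode <- function(seq) {
  # Three-element codes as listed in the description.
  codes <- list(A = c(1, 1, 1), C = c(0, 1, 0), G = c(1, 0, 0), T = c(0, 0, 1))
  nucs <- strsplit(toupper(seq), "")[[1]]
  stopifnot(all(nucs %in% names(codes)))
  out <- numeric(0)
  for (i in seq_along(nucs)) {
    # Assumed definition: count of nucs[i] in positions 1..i, divided by i.
    freq <- sum(nucs[seq_len(i)] == nucs[i]) / i
    out <- c(out, codes[[nucs[i]]], freq)
  }
  out
}

# "ACA" yields 3 positions * 4 values = 12 numbers.
anf_encode("ACA")
```

Under these assumptions, the second 'A' in "ACA" gets freq = 2/3, because two of the first three nucleotides are 'A'.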
Arguments
seqs
is a FASTA file containing nucleotide sequences, in which each sequence record starts with a '>' header line. Alternatively, seqs can be a
string vector in which each element is a nucleotide sequence.
outFormat
(output format) can take one of two values: 'mat' (matrix) or 'txt'. The default value is 'mat'.
outputFileDist
specifies the path and name of the 'txt' output file.
label
is an optional parameter. It is a vector whose length equals the number of sequences. It gives the class of
each entry (i.e., sequence).
Value
The output depends on the outFormat parameter, which can be either 'mat' or 'txt'. If outFormat is 'mat', the function returns a feature
matrix for sequences of equal length, such that the number of columns is (sequence length) * 4
and the number of rows is equal to the number of sequences.
If outFormat is 'txt', the output is written to a tab-delimited file.
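To make the shape of the 'mat' output concrete, here is a minimal self-contained sketch that builds such a matrix for equal-length sequences. anf_matrix is a hypothetical name, not the package function; the column order (position by position, code first, then freq) and the definition of freq as count in the prefix divided by prefix length are assumptions.

```r
# Hypothetical sketch for illustration only; not the ftrCOOL implementation.
anf_matrix <- function(seqs) {
  # The matrix form only makes sense when all sequences share one length.
  stopifnot(length(unique(nchar(seqs))) == 1)
  codes <- rbind(A = c(1, 1, 1), C = c(0, 1, 0), G = c(1, 0, 0), T = c(0, 0, 1))
  rows <- lapply(seqs, function(s) {
    nucs <- strsplit(toupper(s), "")[[1]]
    # Assumed freq: share of positions 1..i that hold the nucleotide at i.
    freq <- vapply(seq_along(nucs),
                   function(i) mean(nucs[seq_len(i)] == nucs[i]),
                   numeric(1))
    # Stack a 3 x L code block on top of freq, then flatten column by column.
    as.vector(rbind(t(codes[nucs, , drop = FALSE]), freq))
  })
  do.call(rbind, rows)
}

m <- anf_matrix(c("ACGT", "TTGA"))
dim(m)  # 2 rows (sequences), 4 * 4 = 16 columns
```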
References
Chen, W., Tran, H., Liang, Z. et al. Identification and analysis of the N6-methyladenosine in the Saccharomyces cerevisiae transcriptome. Sci Rep 5, 13859 (2015).
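Examples
# NOT RUN {
# Locate the example data shipped with the ftrCOOL package.
LNCSeqsADR <- system.file("extdata/", package = "ftrCOOL")
LNC50Nuc <- as.vector(read.csv(paste0(LNCSeqsADR, "/LNC50Nuc.csv"))[, 2])
# Encode the sequences and return the result as a feature matrix.
mat <- ANF_DNA(seqs = LNC50Nuc, outFormat = "mat")
# }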