seqinr (version 1.0-1)

s2n: simple numerical encoding of a DNA sequence.

Description

By default, if no levels argument is provided, this function codes your DNA sequence as integer values following the lexical order (a, c, g, t): 0 for "a", 1 for "c", 2 for "g", 3 for "t", and NA for ambiguous bases.
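The default mapping described above can be sketched in base R. This is an illustration only, not the package's own code: encode_dna is a hypothetical helper, and it assumes the encoding is built on factor(), which the See Also section suggests but does not confirm.

```r
# Hypothetical helper, not part of seqinr: mimics the default encoding
# described above using only base R.
encode_dna <- function(seq, levels = c("a", "c", "g", "t")) {
  # factor() maps each character to its 1-based position in `levels`;
  # characters that are not in `levels` become NA.
  as.integer(factor(seq, levels = levels)) - 1L
}

codes <- encode_dna(c("a", "c", "g", "t", "n"))
print(codes)  # 0 1 2 3 NA
```

The ambiguous base "n" is not among the levels, so it is encoded as NA rather than raising an error.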

Usage

s2n(seq, levels, base4 = TRUE)

Arguments

seq
a vector of chars
levels
allowed char values, by default a, c, g and t
base4
if TRUE the numerical encoding will start at 0, if FALSE at 1
...
further arguments to factor
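
To make the effect of levels and base4 concrete, here is a minimal base R sketch. encode_seq is a hypothetical stand-in written for this illustration; it is assumed to behave like the arguments described above, and it is not the seqinr implementation.

```r
# Hypothetical stand-in for s2n, written in base R for illustration only.
encode_seq <- function(seq, levels = c("a", "c", "g", "t"), base4 = TRUE, ...) {
  # Position of each character in `levels` (1-based); extra arguments
  # are forwarded to factor().
  codes <- as.integer(factor(seq, levels = levels, ...))
  # base4 = TRUE shifts the codes so that they start at 0 instead of 1.
  if (base4) codes - 1L else codes
}

x <- c("g", "a", "t", "c")
zero_based <- encode_seq(x)                 # 2 0 3 1
one_based  <- encode_seq(x, base4 = FALSE)  # 3 1 4 2

# A custom alphabet: RNA uses "u" where DNA uses "t".
rna <- encode_seq(c("a", "u", "g"), levels = c("a", "c", "g", "u"))  # 0 3 2
```

With base4 = TRUE the codes can be read as base-4 digits, while base4 = FALSE gives values that can be used directly as 1-based indices in R.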

Value

  • a vector of integers

References

For an overview of seqinR's functionality, please consult this vignette: Charif, D., Lobry, J.R. (2005) SeqinR: a contributed package to the R project for statistical computing devoted to biological sequences retrieval and analysis. Springer Verlag, Biological and Medical Physics/Biomedical Series, in preparation.

See Also

n2s, factor, unclass

Examples

library(seqinr)
# Example of default behaviour
urndna <- c("a", "c", "g", "t")
seq <- sample(urndna, 100, replace = TRUE) ; seq
s2n(seq)
# How to deal with RNA: pass the RNA alphabet as levels
urnrna <- c("a", "c", "g", "u")
seq <- sample(urnrna, 100, replace = TRUE) ; seq
s2n(seq, levels = urnrna)
# What happens with unknown characters: they are encoded as NA
urnmess <- c(urndna, "n")
seq <- sample(urnmess, 100, replace = TRUE) ; seq
s2n(seq)
# How to change the encoding for unknown characters
tmp <- s2n(seq) ; tmp[is.na(tmp)] <- -1 ; tmp
