words.pos: Positions of possibly degenerate motifs within sequences

Description

words.pos searches for all occurrences of the motif pattern within the sequence text and returns their positions. The function is based on regular expressions, which allows complex motif searches.

Usage

words.pos(pattern, text, extended = TRUE, perl = FALSE)

Arguments

pattern

character string containing a regular expression to be matched in the given character vector.

text

a character vector where matches are sought.

extended

if `TRUE', extended regular expression matching is used, and if `FALSE' basic regular expressions are used.

perl

logical. Should Perl-compatible regexps be used if available? Takes priority over `extended'.

Value

a vector of positions for which the motif pattern was found in the sequence text.
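
As a small illustration of how the returned positions might be used, the sketch below extracts the matched dinucleotides with base R's substring(). The position vector is copied by hand from the YpR result listed in the Examples section rather than computed, so the snippet runs without seqinr; the variable names are only illustrative.

```r
# Positions copied from the documented result of words.pos("[ct][ag]", myseq)
myseq <- "tatagaga"
pos <- c(1L, 3L)

# Each position marks where a match starts; the YpR motif is 2 characters long
matched <- substring(myseq, pos, pos + 1L)
print(matched)
```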

Details

The regular expressions used are those specified by POSIX 1003.2, either extended or basic depending on the value of the `extended' argument, unless `perl = TRUE', in which case they are those of PCRE, ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/. `perl = TRUE' is only available if R was compiled against PCRE: this is detected at configure time. All Unix and Windows systems should have it.
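
To make the regexp-based behaviour concrete, here is a minimal sketch built on base R's gregexpr(). It is not the seqinr implementation: motif_positions is a hypothetical helper written for this illustration, and it exposes only the perl switch. It also shows one practical consequence of regexp matching: matches are found without overlap, so overlapping occurrences of a motif need a PCRE lookahead.

```r
# Hypothetical helper (not part of seqinr): start positions of all matches
# of `pattern` in a single string `text`, found with base R's gregexpr().
motif_positions <- function(pattern, text, perl = FALSE) {
  hits <- gregexpr(pattern, text, perl = perl)[[1]]
  if (hits[1] == -1L) return(integer(0))  # gregexpr() signals "no match" with -1
  as.integer(hits)                        # drop match.length and other attributes
}

myseq <- "tatagaga"

ypr      <- motif_positions("[ct][ag]", myseq)             # YpR motif
none     <- motif_positions("ccc", myseq)                  # motif absent
plain    <- motif_positions("aga", myseq)                  # non-overlapping matches only
overlaps <- motif_positions("(?=aga)", myseq, perl = TRUE) # lookahead keeps overlaps

print(ypr)
print(plain)
print(overlaps)
```

The lookahead pattern matches a zero-length position wherever "aga" follows, which is why it reports both overlapping occurrences while the plain pattern reports only the first.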

References

For an overview of seqinR's functionality, please consult this vignette: Charif, D., Lobry, J.R. (2005) SeqinR: a contributed package to the R project for statistical computing devoted to biological sequences retrieval and analysis. Springer Verlag, Biological and Medical Physics/Biomedical Series, in preparation.

Examples

myseq <- "tatagaga"
words.pos("t", myseq)   # Should be 1 3
words.pos("tag", myseq) # Should be 3
words.pos("ga", myseq)  # Should be 5 7
# How to specify an ambiguous base? Look for YpR (pyrimidine-purine) motifs with:
words.pos("[ct][ag]", myseq) # Should be 1 3
