words.pos: Positions of possibly degenerate motifs within sequences

Description

words.pos searches for all occurrences of the motif pattern within the sequence text and returns their positions. The function is based on regexpr, which allows complex motif searches. The main difference from gregexpr is that overlapping (non-disjoint) matches are also reported.
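
As an illustration of how overlapping matches can be collected with regexpr, here is a minimal base-R sketch. The helper name overlapping_pos is hypothetical and this is not presented as the seqinr source; it only assumes that restarting the search one character after each hit is enough to recover non-disjoint matches.

```r
# Hypothetical helper, not part of seqinr: collect the start positions of
# all matches, overlapping ones included, by restarting the search one
# character after each hit.
overlapping_pos <- function(pattern, text, ...) {
  n <- nchar(text)
  positions <- integer(0)
  offset <- 0L  # number of characters already skipped at the left
  while (offset < n) {
    rest <- substr(text, offset + 1L, n)
    hit <- as.integer(regexpr(pattern, rest, perl = TRUE, ...))
    if (hit == -1L) break          # no further match
    offset <- offset + hit         # convert to a position in the full text
    positions <- c(positions, offset)
  }
  positions
}

overlapping_pos("toto", "totototo")
overlapping_pos("[ct][ag]", "tatagaga")
```

Because each search runs on a suffix of the text, patterns that depend on left context (such as ^ or a lookbehind) may behave differently than they would on the full string.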

Usage

words.pos(pattern, text, ignore.case = FALSE,
                      perl = TRUE, fixed = FALSE, useBytes = TRUE, ...)

Value

A vector of the positions at which the motif pattern was found in the sequence text.

Arguments

pattern: character string containing a regular expression (or character string for fixed = TRUE) to be matched in the given character vector.
text: a character vector where matches are sought.
ignore.case: if FALSE, the pattern matching is case sensitive and if TRUE, case is ignored during matching.
perl: logical. Should perl-compatible regexps be used if available? Has priority over extended.
fixed: logical. If TRUE, pattern is a string to be matched as is. Overrides all conflicting arguments.
useBytes: logical. If TRUE the matching is done byte-by-byte rather than character-by-character.
...: arguments passed to regexpr.
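
Since these arguments are handed on to regexpr, their effect can be previewed with regexpr directly. The following base-R sketch uses made-up strings and variable names chosen only for illustration; it contrasts a regular-expression match with a fixed-string match and shows a case-insensitive search.

```r
# Regular expression: "." matches any character, so "abc" matches first.
pos_regex <- as.integer(regexpr("a.c", "abc a.c"))

# Fixed string: "." is taken literally, so only "a.c" itself matches.
pos_fixed <- as.integer(regexpr("a.c", "abc a.c", fixed = TRUE))

# Case-insensitive search for an upper-case motif in a lower-case sequence.
pos_nocase <- as.integer(regexpr("TAG", "tatagaga", ignore.case = TRUE))

c(regex = pos_regex, fixed = pos_fixed, nocase = pos_nocase)
```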

Author

J.R. Lobry

Details

Default parameter values have been tuned for speed when working with biological sequences.

References

citation("seqinr")

Examples


myseq <- "tatagaga"
words.pos("t", myseq)   # Should be 1 3
words.pos("tag", myseq) # Should be 3
words.pos("ga", myseq)  # Should be 5 7
# How to specify an ambiguous base? Look for YpR motifs with character classes:
words.pos("[ct][ag]", myseq) # Should be 1 3
#
# Show the difference with gregexpr:
#
words.pos("toto", "totototo")           # 1 3 5 (three overlapping matches)
unlist(gregexpr("toto",  "totototo")) # 1 5    (two disjoint matches)
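
For comparison, overlapping positions can also be obtained from gregexpr itself by wrapping the pattern in a zero-width lookahead, which requires perl = TRUE. This is an alternative sketch in base R, not a description of how words.pos works internally, and the helper name lookahead_pos is hypothetical.

```r
# Hypothetical helper: wrap the pattern in a lookahead so that each match
# has zero length and the next search starts at the following character.
lookahead_pos <- function(pattern, text) {
  hits <- gregexpr(paste0("(?=", pattern, ")"), text, perl = TRUE)[[1]]
  hits <- as.integer(hits)                 # drop the match.length attributes
  if (identical(hits, -1L)) integer(0) else hits
}

lookahead_pos("toto", "totototo")      # overlapping matches are kept
lookahead_pos("[ct][ag]", "tatagaga")  # YpR motifs
```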
