microRNA (version 1.30.0)

get_selfhyb_subseq: Get Self-Hybridizing Subsequences

Description

This function finds the longest self-hybridizing subsequences present in RNA or DNA sequences.

Usage

get_selfhyb_subseq(seq, minlen, type = c("RNA", "DNA"))
show_selfhyb_counts(L)
show_selfhyb_lengths(L)

Arguments

seq
character vector of RNA or DNA sequences
minlen
an integer specifying the minimum length in bases of the self-hybridizing subsequences. Subsequences with length less than minlen will be ignored.
type
one of "RNA" or "DNA" depending on the type of sequences provided in seq. Note that you cannot mix RNA and DNA sequences.
L
The output of get_selfhyb_subseq.

Value

A list with one element for each sequence in seq, named using names(seq). Each element is itself a list with one element for each longest self-hybridizing subsequence (there can be more than one). Each such element is in turn a list with components:
starts
integer vector giving the character start positions for the self-hybridizing subsequence in the sequence.
rcstarts
integer vector giving the character start positions for the reverse complement of the self-hybridizing subsequence in the sequence.
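
To make the nesting concrete, here is a hand-built mock of the structure described above. It is not real output from get_selfhyb_subseq. The sequence name "x", the sequence "GGGAAACCC", and the 1-based positions are assumptions chosen for illustration, and the inner elements are left unnamed because this page does not say how they are named.

```r
# Mock of the documented return shape for one RNA sequence, "GGGAAACCC",
# stored under the hypothetical name "x". Values are written by hand.
mock <- list(
  x = list(
    list(starts = 1L, rcstarts = 7L),  # "GGG" at 1; its reverse complement "CCC" at 7
    list(starts = 7L, rcstarts = 1L)   # "CCC" at 7; its reverse complement "GGG" at 1
  )
)

# Number of longest self-hybridizing subsequences found per input sequence.
n_longest <- vapply(mock, length, integer(1))

# Start positions of the first such subsequence in sequence "x".
first_starts <- mock[["x"]][[1]]$starts

# Matching start positions of its reverse complement.
first_rcstarts <- mock[["x"]][[1]]$rcstarts
```

The same indexing pattern (sequence name, then subsequence index, then component) should apply to a real result, provided it follows the structure documented here.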

Details

get_selfhyb_subseq finds, for each sequence in seq, the longest self-hybridizing subsequences that are at least minlen bases long. It does this using suffix trees and the getLongestSubstring function provided by the Rlibstree package.

A longest self-hybridizing subsequence is defined to be a longest string that is found in both an input sequence and its reverse complement.
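
The definition can be illustrated with a small base-R sketch. This is not how the package computes its result: it swaps the suffix-tree search for a brute-force scan, and the helper names revcomp and longest_selfhyb are hypothetical. It handles a single sequence string and assumes upper-case bases only.

```r
# Reverse complement of one RNA or DNA string (upper-case bases assumed).
revcomp <- function(s, type = c("RNA", "DNA")) {
  type <- match.arg(type)
  from <- if (type == "RNA") "ACGU" else "ACGT"
  to   <- if (type == "RNA") "UGCA" else "TGCA"
  comp <- chartr(from, to, s)                       # complement each base
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")  # then reverse
}

# Brute-force search for the longest substrings of s (length >= minlen)
# that also occur in the reverse complement of s.
longest_selfhyb <- function(s, minlen = 1L, type = c("RNA", "DNA")) {
  type <- match.arg(type)
  rc <- revcomp(s, type)
  n <- nchar(s)
  if (n < minlen) return(character(0))
  for (len in seq(n, minlen, by = -1L)) {           # try longest lengths first
    starts <- seq_len(n - len + 1L)
    subs <- unique(substring(s, starts, starts + len - 1L))
    hits <- subs[vapply(subs, grepl, logical(1), x = rc, fixed = TRUE)]
    if (length(hits) > 0L) return(unname(hits))     # stop at the first length with matches
  }
  character(0)
}

res <- longest_selfhyb("GGGAAACCC", minlen = 3, type = "RNA")
```

For "GGGAAACCC" the reverse complement is "GGGUUUCCC", so res holds the two strings "GGG" and "CCC", which shows why a sequence can have more than one longest self-hybridizing subsequence. Raising minlen to 4 yields an empty result, since no shared string that long exists.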

Examples

if (suppressWarnings(require(Rlibstree, quietly=TRUE))) {
    seqs = c(a="UGAGGUAGUAGGUUGUAUAGUU", b="UGAGGUAGUAGGUUGUGUGGUU",
             c="UGAGGUAGUAGGUUGUAUGGUU")

    ans = get_selfhyb_subseq(seqs, minlen=3, type="RNA")
    length(ans)

    ans[["a"]]

    show_selfhyb_counts(ans)
    show_selfhyb_lengths(ans)
}
