Biostrings (version 2.40.2)

maskMotif: Masking by content (or by position)

Description

Functions for masking a sequence by content (or by position).

Usage

maskMotif(x, motif, min.block.width=1, ...)
mask(x, start=NA, end=NA, pattern)

Arguments

x
The sequence to mask.
motif
The motif to mask in the sequence.
min.block.width
The minimum width of the blocks to mask.
...
Additional arguments for matchPattern.
start
An integer vector containing the starting positions of the regions to mask.
end
An integer vector containing the ending positions of the regions to mask.
pattern
The motif to mask in the sequence.
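The effect of motif and min.block.width can be illustrated without Biostrings. The base-R sketch below is hypothetical: motif_blocks() is not a Biostrings function. It assumes that occurrences of the motif may overlap, that overlapping or adjacent occurrences form one block, that only blocks at least min.block.width letters wide get masked, and that the motif contains no regular-expression metacharacters.

```r
## Hypothetical sketch, not Biostrings code. Assumptions: occurrences may
## overlap, overlapping or adjacent occurrences form one block, only blocks
## at least 'min_block_width' letters wide are kept, and 'motif' contains
## no regular-expression metacharacters.
motif_blocks <- function(x, motif, min_block_width = 1L) {
  ## A zero-width lookahead lets gregexpr() report overlapping occurrences.
  starts <- as.integer(gregexpr(paste0("(?=", motif, ")"), x,
                                perl = TRUE)[[1]])
  starts <- starts[starts > 0L]            # -1 means "no occurrence"
  if (length(starts) == 0L)
    return(data.frame(start = integer(0), end = integer(0)))
  ends <- starts + nchar(motif) - 1L
  ## Walk the (sorted) occurrences and merge them into blocks.
  out_start <- integer(0); out_end <- integer(0)
  blk_start <- starts[1L]; blk_end <- ends[1L]
  for (i in seq_along(starts)[-1L]) {
    if (starts[i] <= blk_end + 1L) {
      blk_end <- max(blk_end, ends[i])     # extend the current block
    } else {
      out_start <- c(out_start, blk_start); out_end <- c(out_end, blk_end)
      blk_start <- starts[i]; blk_end <- ends[i]
    }
  }
  out_start <- c(out_start, blk_start); out_end <- c(out_end, blk_end)
  keep <- (out_end - out_start + 1L) >= min_block_width
  data.frame(start = out_start[keep], end = out_end[keep])
}

dna <- "ACACAACTAGATAGNACTNNGAGAGACGC"
motif_blocks(dna, "AC")                       # blocks 1-4, 6-7, 16-17, 26-27
motif_blocks(dna, "AC", min_block_width = 3)  # only block 1-4 is kept
```

Merging adjacent occurrences is why a run such as "ACAC" counts as a single 4-letter block, which is what a min.block.width threshold is then compared against.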

Value

A MaskedXString object for maskMotif and an XStringViews object for mask.

See Also

read.Mask, matchPattern, XString-class, MaskedXString-class, XStringViews-class, MaskCollection-class

Examples

  library(Biostrings)  # provides BString(), DNAString(), maskMotif(), mask()

  ## ---------------------------------------------------------------------
  ## EXAMPLE 1
  ## ---------------------------------------------------------------------

  maskMotif(BString("AbcbbcbEEE"), "bcb")
  maskMotif(BString("AbcbcbEEE"), "bcb")

  ## maskMotif() can be used incrementally to mask more than one
  ## motif. Note that maskMotif() does not try to mask again what's
  ## already masked (i.e. the new mask will never overlap with the
  ## previous masks), so the order in which the motifs are masked
  ## matters: it can change the total set of masked positions.
  x0 <- BString("AbcbEEEEEbcbbEEEcbbcbc")
  x1 <- maskMotif(x0, "E")
  x1
  x2 <- maskMotif(x1, "bcb")
  x2
  x3 <- maskMotif(x2, "b")
  x3
  ## Note that inverting the order in which "b" and "bcb" are masked would
  ## lead to a different final set of masked positions.
  ## Also note that the order doesn't matter if occurrences of the motifs
  ## cannot overlap (we assume that the motifs are distinct), i.e. if no
  ## motif occurs inside another one and no prefix of a motif is a suffix
  ## of another motif. This is the case, for example, when all the motifs
  ## are single letters.
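The order dependence described in EXAMPLE 1 can be reproduced with a small base-R model. mask_incrementally() is hypothetical and not part of Biostrings; it assumes that an occurrence of a motif is masked only when none of its positions is already masked, and that all qualifying occurrences of one motif, overlapping ones included, are masked together before the next motif is processed.

```r
## Hypothetical sketch, not Biostrings code. Assumptions: a motif
## occurrence is masked only if none of its positions is already masked,
## and all qualifying occurrences of one motif (overlapping ones included)
## are masked together before the next motif is processed.
mask_incrementally <- function(x, motifs) {
  chars <- strsplit(x, "")[[1]]
  masked <- logical(length(chars))         # FALSE = still unmasked
  for (motif in motifs) {
    w <- nchar(motif)
    n_starts <- max(0L, length(chars) - w + 1L)
    hits <- integer(0)
    for (s in seq_len(n_starts)) {
      pos <- s:(s + w - 1L)
      if (!any(masked[pos]) && paste(chars[pos], collapse = "") == motif)
        hits <- c(hits, pos)
    }
    masked[hits] <- TRUE                   # mask this motif's hits at once
  }
  masked
}

seq0 <- "AbcbEEEEEbcbbEEEcbbcbc"
sum(mask_incrementally(seq0, c("E", "bcb", "b")))  # 19 (order of EXAMPLE 1)
sum(mask_incrementally(seq0, c("E", "b", "bcb")))  # 16 ("b" before "bcb")
```

The "do not mask again" rule is what makes the order matter: if every occurrence were masked regardless of earlier masks, the final set would be the same union of occurrence positions whatever the order.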

  ## ---------------------------------------------------------------------
  ## EXAMPLE 2
  ## ---------------------------------------------------------------------

  x <- DNAString("ACACAACTAGATAGNACTNNGAGAGACGC")

  ## Mask the N-blocks
  x1 <- maskMotif(x, "N")
  x1
  as(x1, "Views")
  gaps(x1)
  as(gaps(x1), "Views")

  ## Mask the AC-blocks 
  x2 <- maskMotif(x1, "AC")
  x2
  gaps(x2)

  ## Mask the GA-blocks
  x3 <- maskMotif(x2, "GA", min.block.width=5)
  x3  # masks 2 and 3 overlap
  gaps(x3)

  ## ---------------------------------------------------------------------
  ## EXAMPLE 3
  ## ---------------------------------------------------------------------

  library(BSgenome.Dmelanogaster.UCSC.dm3)
  chrU <- Dmelanogaster$chrU
  chrU
  alphabetFrequency(chrU)
  chrU <- maskMotif(chrU, "N")
  chrU
  alphabetFrequency(chrU)
  as(chrU, "Views")
  as(gaps(chrU), "Views")

  mask2 <- Mask(mask.width=length(chrU),
                start=c(50000, 350000, 543900), width=25000)
  names(mask2) <- "some ugly regions"
  masks(chrU) <- append(masks(chrU), mask2)
  chrU
  as(chrU, "Views")
  as(gaps(chrU), "Views")
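In EXAMPLE 3, mask2 is appended to the N-mask already set on chrU. The base-R sketch below models how several masks combine; masked_positions() and the toy data are hypothetical, not Biostrings code, and the sketch assumes that the positions hidden by a set of active masks are the union of the positions covered by each mask, so a position covered twice is counted once.

```r
## Hypothetical sketch, not Biostrings code. Assumption: the positions
## hidden by several active masks are the union of the positions each
## mask covers. Each mask is a list with integer 'start' and 'width'.
masked_positions <- function(n, masks) {
  hidden <- logical(n)                     # FALSE = visible
  for (m in masks) {
    for (i in seq_along(m$start)) {
      if (m$width[i] > 0L)
        hidden[m$start[i]:(m$start[i] + m$width[i] - 1L)] <- TRUE
    }
  }
  hidden
}

## Toy data made up for illustration: positions 5-8 are covered twice.
toy_masks <- list(list(start = c(1L, 5L), width = c(2L, 4L)),
                  list(start = 5L, width = 6L))
h <- masked_positions(20L, toy_masks)
sum(h)   # 8: overlapping positions are counted once
mean(h)  # 0.4: fraction of the sequence that is masked
```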

  ## ---------------------------------------------------------------------
  ## EXAMPLE 4
  ## ---------------------------------------------------------------------
  ## Note that unlike maskMotif(), mask() returns an XStringViews object!

  ## masking "by position"
  mask("AxyxyxBC", 2, 6)

  ## masking "by content"
  mask("AxyxyxBC", "xyx")
  noN_chrU <- mask(chrU, "N")
  noN_chrU
  alphabetFrequency(noN_chrU, collapse=TRUE)
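The two calls to mask() on "AxyxyxBC" hide the same letters: the occurrences of "xyx" at positions 2-4 and 4-6 overlap and together cover positions 2-6. The hypothetical base-R function below (unmasked_pieces() is not part of Biostrings) returns the pieces that stay visible as plain strings rather than as an XStringViews object. It assumes that mask() keeps the regions outside the masked positions, that masking by content hides the union of all, possibly overlapping, occurrences of pattern, and that pattern contains no regular-expression metacharacters.

```r
## Hypothetical sketch, not Biostrings code: returns the visible pieces of
## 'x' as plain strings. Assumptions: masking "by content" hides the union
## of all (possibly overlapping) occurrences of 'pattern', which must not
## contain regular-expression metacharacters.
unmasked_pieces <- function(x, start = integer(0), end = integer(0),
                            pattern = NULL) {
  hidden <- logical(nchar(x))              # FALSE = visible
  if (!is.null(pattern)) {
    ## A zero-width lookahead reports overlapping occurrences too.
    hits <- as.integer(gregexpr(paste0("(?=", pattern, ")"), x,
                                perl = TRUE)[[1]])
    for (s in hits[hits > 0L]) hidden[s:(s + nchar(pattern) - 1L)] <- TRUE
  } else {
    for (i in seq_along(start)) hidden[start[i]:end[i]] <- TRUE
  }
  ## Each run of visible positions becomes one piece.
  runs <- rle(!hidden)
  run_end <- cumsum(runs$lengths)
  run_start <- run_end - runs$lengths + 1L
  substring(x, run_start[runs$values], run_end[runs$values])
}

unmasked_pieces("AxyxyxBC", start = 2, end = 6)  # "A" "BC"
unmasked_pieces("AxyxyxBC", pattern = "xyx")     # "A" "BC"
```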
