
tidysq (version 1.2.3)

substitute_letters: Substitute letters in a sequence

Description

Replaces all occurrences of each specified letter with its assigned replacement.

Usage

substitute_letters(x, encoding, ...)

# S3 method for sq
substitute_letters(x, encoding, ..., NA_letter = getOption("tidysq_NA_letter"))

Value

An sq object of atp type with an updated alphabet.

Arguments

x

[sq]
An object this function is applied to.

encoding

[character || numeric]
A dictionary (named vector), where names are letters to be replaced and elements are their respective replacements.

...

Further arguments to be passed from or to other methods.

NA_letter

[character(1)]
A string that is used to interpret and display NA values in the context of the sq class. The default value is "!".

Details

substitute_letters() replaces unwanted letters in any sequence with user-defined or IUPAC symbols. Letters can also be replaced with NA values so that they can later be removed from the sequence with the remove_na() function.
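The substitution amounts to a dictionary lookup applied letter by letter. The base R sketch below illustrates that idea on a single sequence that has already been split into letters. `substitute_tokens()` is a hypothetical helper written only for this illustration, not a tidysq function, and dropping the `NA` elements at the end merely mimics letter-level removal; it is not how remove_na() is implemented.

```r
# Hypothetical helper (NOT part of tidysq): apply a named-vector
# dictionary to one sequence that is already split into letters.
substitute_tokens <- function(tokens, encoding) {
  # as.character() drops names, so restore them; numeric replacements
  # become strings and NA stays NA
  enc <- stats::setNames(as.character(encoding), names(encoding))
  hit <- tokens %in% names(enc)
  tokens[hit] <- enc[tokens[hit]]
  tokens
}

# Multi-character letters are ordinary elements here
substitute_tokens(c("mA", "L", "P", "mA"), c(mA = "-"))
# returns c("-", "L", "P", "-")

# Numeric replacements are coerced to character
substitute_tokens(c("N", "G", "A"), c(N = 9, G = 7))
# returns c("9", "7", "A")

# Replace a letter with NA, then drop the NA elements
with_na <- substitute_tokens(c("M", "O", "Y", "X", "X", "I"), c(X = NA_character_))
with_na[!is.na(with_na)]
# returns c("M", "O", "Y", "I")
```

Letters without an entry in the dictionary are left untouched, which matches the documented behavior that not every letter needs an encoding.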

Both the replaced and the replacing letter may consist of a single character or of multiple characters. However, the mapping is strictly one letter to one letter: several letters cannot be replaced with a single letter, nor can one letter be replaced with several.

Multiple different letters can, however, be encoded as the same symbol, so c(A = "rep1", H = "rep1", G = "rep1") is allowed, but c(AHG = "rep1") is not (unless the alphabet contains a letter "AHG"). Such a many-to-one encoding discards the distinction between the original letters, so the original sequence cannot be recovered after this operation.
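Why a many-to-one encoding is lossy can be shown with plain R character vectors. This is an illustration of the idea only, assuming a sequence already split into letters; it does not use or reproduce tidysq internals.

```r
# Illustration in base R (not tidysq internals): dictionary keys are
# matched against whole letters of the sequence.
letters_in_seq <- c("A", "H", "G", "C", "A")

many_to_one <- c(A = "rep1", H = "rep1", G = "rep1")
hit <- letters_in_seq %in% names(many_to_one)
encoded <- letters_in_seq
encoded[hit] <- many_to_one[letters_in_seq[hit]]
encoded
# returns c("rep1", "rep1", "rep1", "C", "rep1")

# The reverse lookup is ambiguous: "rep1" may have come from any of these
names(many_to_one)[many_to_one == "rep1"]
# returns c("A", "H", "G")

# A single key "AHG" is one letter name, so it matches none of A, H or G
one_key <- c(AHG = "rep1")
any(letters_in_seq %in% names(one_key))
# returns FALSE
```

In this sketch the "AHG" key simply matches nothing; tidysq itself rejects a key that is not in the alphabet with an error.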

All encoding names must be letters contained within the alphabet, otherwise an error will be thrown.
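The rule that every encoding name must belong to the alphabet can be expressed as a simple set check. `check_encoding()` below is a hypothetical sketch, not the tidysq source, and `dna_alphabet` is a small illustrative alphabet rather than the full "dna_ext" alphabet.

```r
# Hypothetical check (NOT the tidysq implementation): stop if any
# dictionary key is absent from the alphabet.
check_encoding <- function(encoding, alphabet) {
  unknown <- setdiff(names(encoding), alphabet)
  if (length(unknown) > 0) {
    stop("letters not in the alphabet: ", paste(unknown, collapse = ", "))
  }
  invisible(TRUE)
}

# Small illustrative alphabet, not the real "dna_ext"
dna_alphabet <- c("A", "C", "G", "T", "N")

check_encoding(c(N = "9", G = "7"), dna_alphabet)  # passes silently
try(check_encoding(c(U = "t"), dna_alphabet))      # signals an error naming U
```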

See Also

Functions that manipulate type of sequences: find_invalid_letters(), is.sq(), sq_type(), typify()

Examples

# Creating objects to work on:
sq_dna <- sq(c("ATGCAGGA", "GACCGAACGAN", "TGACGAGCTTA", "ACTNNAGCN"),
             alphabet = "dna_ext")
sq_ami <- sq(c("MIOONYTWIL","TIOOLGNIIYROIE", "NYERTGHLI", "MOYXXXIOLN"),
             alphabet = "ami_ext")
sq_atp <- sq(c("mALPVQAmAmA", "mAmAPQ"), alphabet = c("mA", LETTERS))

# Not every letter needs to have an encoding specified:
substitute_letters(sq_dna, c(T = "t", A = "a", C = "c", G = "g"))
substitute_letters(sq_ami, c(M = "X"))

# Multiple character letters are supported in encodings:
substitute_letters(sq_atp, c(mA = "-"))
substitute_letters(sq_ami, c(I = "ough", O = "eau"))

# Numeric substitutions are allowed too; they are coerced to characters:
substitute_letters(sq_dna, c(N = 9, G = 7))

# It's possible to replace a letter with NA value:
substitute_letters(sq_ami, c(X = NA_character_))
