blastar (version 0.1.1)

align_sequences: Align DNA Sequences (Pairwise or Multiple)

Description

This function takes a tibble or data.frame with a "sequence" column (and an optional "accession" column of names) and performs either a pairwise alignment between two of the sequences or a multiple sequence alignment (MSA) across all of them.

Usage

align_sequences(
  df,
  method = c("pairwise", "msa"),
  pairwise_type = "global",
  msa_method = "ClustalOmega",
  seq_indices = c(1, 2)
)

Value

If method="pairwise", a list with:

  • alignment: a PairwiseAlignmentsSingleSubject object

  • pid: percent identity (numeric)

If method="msa", an object of class MsaDNAMultipleAlignment or similar.
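To make the pid value concrete, here is a minimal base R sketch of one common way to compute percent identity from two already-aligned strings. The helper name percent_identity is hypothetical, and the choice of denominator (all alignment columns, gaps included) is an assumption: alignment tools differ on this point, and the value returned by align_sequences may use a different definition.

```r
# Hypothetical helper, not part of blastar: percent identity of two aligned
# strings of equal length, where "-" marks a gap.
percent_identity <- function(aligned_a, aligned_b) {
  a <- strsplit(aligned_a, "")[[1]]
  b <- strsplit(aligned_b, "")[[1]]
  stopifnot(length(a) == length(b), length(a) > 0)

  # A column counts as identical only if both sides hold the same residue.
  identical_cols <- a == b & a != "-" & b != "-"

  # Assumption: the denominator is the full alignment length, gaps included.
  100 * sum(identical_cols) / length(a)
}

# The two sequences from the pairwise example, compared position by position
# (10 of 12 columns match).
round(percent_identity("ACGTACGTACGT", "ACGTACGTTTGT"), 2)  # 83.33
```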

Arguments

df

A tibble or data.frame containing at least:

  • sequence: character vector of DNA sequences

  • accession (optional): names for each sequence; if present, they will be used as identifiers in the alignment object.
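The input requirements above can be checked before calling the function. The following base R sketch shows one way to do that; prepare_sequences is a hypothetical helper written for illustration, and it is not known whether align_sequences performs the same checks internally.

```r
# Hypothetical helper, not part of blastar: validate the input table and
# return the sequences as a character vector, named by accession if present.
prepare_sequences <- function(df) {
  stopifnot(is.data.frame(df))          # tibbles are data.frames too
  if (!"sequence" %in% names(df)) {
    stop("df must contain a 'sequence' column")
  }

  seqs <- as.character(df$sequence)
  if ("accession" %in% names(df)) {
    names(seqs) <- as.character(df$accession)
  }
  seqs
}

input <- data.frame(
  accession = c("seq1", "seq2"),
  sequence  = c("ACGTACGTACGT", "ACGTACGTTTGT"),
  stringsAsFactors = FALSE
)
prepare_sequences(input)
```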

method

One of:

  • "pairwise": perform a pairwise alignment between two sequences

  • "msa": perform a multiple sequence alignment on all sequences
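The Usage signature lists method = c("pairwise", "msa"), which is the usual R idiom for a choice argument resolved with match.arg(). Assuming align_sequences follows that idiom (the page does not confirm it), omitting method would select "pairwise". The sketch below uses a hypothetical stand-in function, resolve_method, to show the behavior of the idiom itself.

```r
# Hypothetical stand-in illustrating the match.arg() idiom; this is not the
# body of align_sequences.
resolve_method <- function(method = c("pairwise", "msa")) {
  match.arg(method)
}

resolve_method()         # "pairwise" (first choice is the default)
resolve_method("msa")    # "msa"
# resolve_method("tree") would raise an error: 'arg' should be one of the choices
```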

pairwise_type

For pairwise only, alignment type: "global" (Needleman–Wunsch), "local" (Smith–Waterman), or "overlap".
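The difference between global and local alignment can be seen in a toy dynamic-programming scorer. This is a teaching sketch in base R, not the algorithm used by the package: the function name toy_align_score, the scoring values, and the linear gap penalty are all assumptions chosen for brevity (real aligners typically use affine gap penalties), and the "overlap" type is not covered.

```r
# Toy scorer (hypothetical, for illustration only): returns the optimal
# alignment score under a linear gap penalty.
toy_align_score <- function(a, b, type = c("global", "local"),
                            match_score = 1, mismatch_score = -1,
                            gap_score = -2) {
  type <- match.arg(type)
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)

  H <- matrix(0, nrow = n + 1, ncol = m + 1)
  if (type == "global") {
    # Needleman-Wunsch: leading gaps are penalized.
    H[, 1] <- gap_score * (0:n)
    H[1, ] <- gap_score * (0:m)
  }

  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      s <- if (x[i] == y[j]) match_score else mismatch_score
      best <- max(H[i, j] + s,              # align x[i] with y[j]
                  H[i, j + 1] + gap_score,  # gap in b
                  H[i + 1, j] + gap_score)  # gap in a
      # Smith-Waterman: scores never drop below zero.
      H[i + 1, j + 1] <- if (type == "local") max(0, best) else best
    }
  }

  # Global: score of aligning everything. Local: best-scoring substring pair.
  if (type == "global") H[n + 1, m + 1] else max(H)
}

toy_align_score("ACGT", "AGT", type = "global")      # 1 (three matches, one gap)
toy_align_score("TTACGTTT", "ACG", type = "local")   # 3 (the shared "ACG")
```

Global alignment pays for every unaligned flanking base, whereas local alignment ignores them, which is why the second call scores the embedded "ACG" without penalty.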

msa_method

For MSA only, method name: "ClustalOmega", "ClustalW", or "Muscle".

seq_indices

Integer vector of length 2; indices of the two sequences to align when method = "pairwise". Defaults to c(1, 2).
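As an illustration of seq_indices, the sketch below picks the first and third sequences from a three-row table. It assumes the indices refer to row positions in df, which the description suggests but does not state outright. The third sequence is made-up example data, and the final call runs only if both blastar and pwalign are installed.

```r
df3 <- data.frame(
  accession = c("seq1", "seq2", "seq3"),
  sequence  = c("ACGTACGTACGT", "ACGTACGTTTGT", "ACGTTCGTACGT"),
  stringsAsFactors = FALSE
)

seq_indices <- c(1, 3)

# Basic sanity checks on the indices (assumed to be row positions in df3).
stopifnot(length(seq_indices) == 2,
          all(seq_indices %in% seq_len(nrow(df3))))

pair <- df3[seq_indices, ]
pair$accession  # "seq1" "seq3"

# Pass the same indices to align_sequences when the packages are available.
if (requireNamespace("blastar", quietly = TRUE) &&
    requireNamespace("pwalign", quietly = TRUE)) {
  res_13 <- blastar::align_sequences(
    df = df3,
    method = "pairwise",
    seq_indices = seq_indices
  )
  res_13$pid
}
```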

Examples

# Pairwise alignment example (requires pwalign package)
if (requireNamespace("pwalign", quietly = TRUE)) {
  data <- data.frame(
    accession = c("seq1", "seq2"),
    sequence  = c("ACGTACGTACGT", "ACGTACGTTTGT"),
    stringsAsFactors = FALSE
  )

  res_pw <- align_sequences(
    df = data,
    method = "pairwise",
    pairwise_type = "global"
  )
  res_pw$pid
}

# Multiple sequence alignment (requires msa package)
if (requireNamespace("msa", quietly = TRUE)) {
  data_msa <- data.frame(
    accession = c("seq1", "seq2", "seq3"),
    sequence  = c("ATGCATGC", "ATGCTAGC", "ATGGATGC"),
    stringsAsFactors = FALSE
  )
  res_msa <- align_sequences(data_msa, method = "msa", msa_method = "ClustalOmega")
  print(res_msa)
}