reverse.align: Reverse alignment - from protein sequence alignment to nucleic sequence alignment

Description

This function produces an alignment of nucleic protein-coding sequences, using as a guide the alignment of the corresponding protein sequences.

Usage

reverse.align(nucl.file, protaln.file, input.format = 'fasta', out.file, output.format = 'fasta', align.prot = FALSE, numcode = 1, clustal.path = NULL, forceDNAtolower = TRUE, forceAAtolower = FALSE)

Arguments

nucl.file

A character string specifying the name of the FASTA format file containing the nucleotide sequences.

protaln.file

A character string specifying the name of the file containing the aligned protein sequences. This argument must be provided if align.prot is set to FALSE.

input.format

A character string specifying the format of the protein alignment file: 'mase', 'clustal', 'phylip', 'fasta' or 'msf'.

out.file

A character string specifying the name of the output file.

output.format

A character string specifying the format of the output file. Currently the only implemented format is 'fasta'.

align.prot

Boolean. If TRUE, the nucleic sequences are translated and the resulting protein sequences are aligned with the ClustalW program. In that case the path of the ClustalW binary must also be given (clustal.path).

numcode

The NCBI genetic code number for the translation of the nucleic sequences. By default the standard genetic code is used.

clustal.path

The path of the ClustalW binary. This argument only needs to be set if align.prot is TRUE.

forceDNAtolower

Logical, passed to read.fasta when reading the nucleic acid file.

forceAAtolower

Logical, passed to read.alignment when reading the aligned protein sequence file.

Value

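NULL. The function is called for its side effect: the reverse-aligned sequences are written to out.file.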

Details

This function produces an alignment of nucleic protein-coding sequences, using as a guide the alignment of the corresponding protein sequences. The file containing the nucleic sequences is given in the compulsory argument 'nucl.file'; this file must be in FASTA format.
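
The idea of using a protein alignment as a guide can be illustrated with a minimal base R sketch. The helper reverse_align_one below is hypothetical and is not part of seqinr; it is not the package's actual implementation. It assumes the nucleic sequence starts in frame and that each residue in the aligned protein corresponds to exactly one codon, and it ignores any trailing codons such as a stop codon.

```r
# Hypothetical helper (not part of seqinr): thread one coding sequence
# through its aligned protein sequence, codon by codon.
# Assumptions: 'nucl' is in frame, and every non-gap character of
# 'prot_aln' corresponds to one codon of 'nucl', in order.
reverse_align_one <- function(nucl, prot_aln, gap = "-") {
  aa <- strsplit(prot_aln, "")[[1]]   # one element per alignment column
  is_residue <- aa != gap
  n_codons <- sum(is_residue)
  stopifnot(nchar(nucl) >= 3 * n_codons)

  # Cut the nucleic sequence into the codons that encode the residues.
  starts <- seq(1, by = 3, length.out = n_codons)
  codons <- substring(nucl, starts, starts + 2)

  # Each protein gap becomes a three-character nucleic gap;
  # each residue is replaced by its codon.
  out <- rep(strrep(gap, 3), length(aa))
  out[is_residue] <- codons
  paste(out, collapse = "")
}

aligned <- reverse_align_one("atggcaaaa", "M-AK")
print(aligned)
```

Applying such a helper to every sequence yields nucleic sequences of equal length, three times the number of columns in the protein alignment.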

The alignment of the protein sequences can either be provided directly, through the 'protaln.file' parameter, or reconstructed with ClustalW, if the parameter 'align.prot' is set to TRUE. In the latter case, the path of the ClustalW binary must be given in the 'clustal.path' argument. The protein and nucleic sequences must have the same names in the files nucl.file and protaln.file.
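
Because the sequence names must match between the two files, it can be worth comparing them before calling reverse.align. The following is a small sketch with a hypothetical helper, compare_names, and made-up names. With seqinr, the two name vectors would typically come from names(read.fasta(nucl.file)) and read.alignment(protaln.file, format = ...)$nam.

```r
# Hypothetical helper (not part of seqinr): report names that are
# present in one set of sequences but absent from the other.
compare_names <- function(nucl_names, prot_names) {
  list(
    missing_in_protein = setdiff(nucl_names, prot_names),
    missing_in_nucleic = setdiff(prot_names, nucl_names)
  )
}

# Illustrative names only; they do not come from any real file.
nucl_names <- c("geneA", "geneB", "geneC")
prot_names <- c("geneA", "geneB", "geneD")

mismatch <- compare_names(nucl_names, prot_names)
print(mismatch)

# Both components are empty when the two name sets agree.
names_ok <- all(lengths(mismatch) == 0)
```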

The reverse-aligned nucleotide sequences are written to the file specified in the compulsory 'out.file' argument. For now, the only output format implemented is FASTA.

Warning: the 'align.prot = TRUE' option has only been tested on Linux operating systems. ClustalW must be installed on your system for this option to work.
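
One way to check beforehand whether a ClustalW binary can be found is base R's Sys.which. This is only a sketch: it assumes the executable is named 'clustalw', as in the example further down, whereas some installations use a different name.

```r
# Look for an executable called 'clustalw' on the search path.
# Sys.which returns "" when nothing is found.
clustal_path <- unname(Sys.which("clustalw"))

if (nzchar(clustal_path)) {
  message("ClustalW found at: ", clustal_path)
  # clustal_path could then be passed as the clustal.path argument.
} else {
  message("ClustalW not found; provide protaln.file and keep align.prot = FALSE.")
}
```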

References

citation('seqinr')

Examples

#
# Read example 'bordetella.fasta': a triplet of orthologous genes from
# three bacterial species (Bordetella pertussis, B. parapertussis and
# B. bronchiseptica):
#

nucl.file <- system.file('sequences/bordetella.fasta', package = 'seqinr')
triplet <- read.fasta(nucl.file)

# 
# For this example, 'bordetella.pep.aln' contains the aligned protein
# sequences, in the Clustal format:
#

protaln.file <- system.file('sequences/bordetella.pep.aln', package = 'seqinr')
triplet.pep <- read.alignment(protaln.file, format = 'clustal')

#
# Call reverse.align for this example:
#

reverse.align(nucl.file = nucl.file, protaln.file = protaln.file,
              input.format = 'clustal', out.file = 'test.revalign')

#
# Simple sanity check against expected result:
#

res.new <- read.alignment("test.revalign", format = "fasta")
data(revaligntest)
stopifnot(identical(res.new, revaligntest))

#
# Alternatively, we can use ClustalW to align the translated nucleic
# sequences. Here the ClustalW program is accessible simply by the
# 'clustalw' name.
#

## Not run: 
# reverse.align(nucl.file = nucl.file, out.file = 'test.revalign.clustal', 
#   align.prot = TRUE, clustal.path = 'clustalw')
## End(Not run)
