seqinr (version 1.0-1)

read.alignment: Read file in mase, clustal, phylip, fasta or msf format

Description

Read a file in mase, clustal, phylip, fasta or msf format. These formats are used to store nucleotide or protein multiple alignments.

Usage

read.alignment(File, format)

Arguments

File
The name of the file from which the aligned sequences are to be read. If it does not contain an absolute path, the file name is relative to the current working directory, getwd().
format
A character string specifying the format of the file: mase, clustal, phylip, fasta or msf.

Value

It returns an object of class alignment, which is a list with the following components:
  • nb: the number of aligned sequences
  • nam: a vector of character strings containing the names of the aligned sequences
  • seq: a vector of character strings containing the aligned sequences
  • com: a vector of character strings containing the comments for each sequence, or NA if there are no comments
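
As a rough illustration of the components listed above, the following sketch reads the test.mase file shipped with the package and prints each component. It assumes seqinr is installed. The file name is passed positionally because the name of the first argument (File on this page) may differ between package versions; that is an assumption, not something stated here.

```r
# Sketch only: inspect the components of an "alignment" object.
# Assumes the seqinr package is installed; skipped silently otherwise.
if (requireNamespace("seqinr", quietly = TRUE)) {
  path <- system.file("sequences/test.mase", package = "seqinr")

  # First argument passed positionally (its name may vary by version)
  aln <- seqinr::read.alignment(path, format = "mase")

  print(class(aln))               # class of the returned object
  print(aln$nb)                   # number of aligned sequences
  print(aln$nam)                  # names of the aligned sequences
  print(nchar(unlist(aln$seq)))   # length of each aligned sequence
  print(aln$com)                  # comments for each sequence, or NA
}
```

In an alignment every sequence normally has the same length, so the lengths printed for seq are a quick sanity check on the file.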

Details

mase
The mase format is used to store nucleotide or protein multiple alignments. The file must begin with a header of at least one line (the content of this header may be empty). Header lines must begin with ;;. The body of the file has the following structure: each entry must begin with one or more commentary lines. Commentary lines begin with the character ; and may also be empty. After the commentaries, the name of the sequence is written on a separate line. Finally, the sequence itself is written on the following lines.

clustal
The CLUSTAL format is the output format of the ClustalW multiple alignment tool. The word CLUSTAL is on the first line of the file. The alignment is displayed in blocks of a fixed length, each line in a block corresponding to one sequence. Each line of each block starts with the sequence name (maximum of 10 characters), followed by at least one space character. The sequence is then displayed in upper or lower case; '-' denotes gaps. The residue number may be displayed at the end of the first line of each block.

msf
MSF is the multiple sequence alignment format of the GCG sequence analysis package. An MSF file contains the following parts, in order:
  • A file type line (optional), all uppercase: !!NA_MULTIPLE_ALIGNMENT 1.0 for nucleic acid sequences or !!AA_MULTIPLE_ALIGNMENT 1.0 for amino acid sequences. Do not edit or delete the file type line if it is present.
  • A description line (optional) containing informative text describing what is in the file. You can add this information to the top of the MSF file using a text editor.
  • A dividing line (required) containing the number of bases or residues in the sequence, when the file was created, and, importantly, two dots (..), which act as a divider between the descriptive information and the following sequence information.
  • The Name/Weight information.
  • A separating line (required), which must include two slashes (//) to divide the name/weight information from the sequence alignment.
  • The multiple sequence alignment itself.

phylip
PHYLIP is a tree construction program. The format is as follows: the number of sequences and their length (in characters) are on the first line of the file. The alignment is displayed in an interleaved or sequential format. The sequence names are limited to 10 characters and may contain blanks.

fasta
A sequence in fasta format begins with a single-line description (distinguished by a greater-than (>) symbol), followed by sequence data on the next line.
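
To make the fasta layout described above concrete, here is a minimal sketch that writes a two-sequence alignment to a temporary file and reads it back. The sequence names seq1 and seq2 and their contents are hypothetical, the use of '-' for gaps in fasta alignments is a common convention rather than something stated here, and the call to read.alignment is only attempted if seqinr is installed.

```r
# Sketch only: a hand-written FASTA alignment (hypothetical data).
# Each record is a ">" description line followed by the sequence data.
fasta_lines <- c(
  ">seq1",
  "ACGT-ACGTA",
  ">seq2",
  "ACGTTACG-A"
)

# Write the alignment to a temporary file
path <- tempfile(fileext = ".fasta")
writeLines(fasta_lines, path)

# Base R check of the layout: count the description lines
is_header <- startsWith(fasta_lines, ">")
n_records <- sum(is_header)

# Read the file back with seqinr, if it is available
if (requireNamespace("seqinr", quietly = TRUE)) {
  aln <- seqinr::read.alignment(path, format = "fasta")
  print(aln$nb)    # number of sequences found in the file
  print(aln$nam)   # their names
}
```

Writing to tempfile() keeps the example independent of the working directory, since a relative file name would be resolved against getwd().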

References

The source code of the C functions that are used to read the different formats has been taken from SeaView: http://pbil.univ-lyon1.fr/software/seaview.html

For an overview of seqinR's functionality, please consult this vignette: Charif, D., Lobry, J.R. (2005) SeqinR: a contributed package to the R project for statistical computing devoted to biological sequences retrieval and analysis. Springer Verlag, Biological and Medical Physics/Biomedical Series, in preparation.

Examples

library(seqinr)
# Read the example alignments shipped with seqinr, one per format
mase <- read.alignment(File = system.file("sequences/test.mase", package = "seqinr"), format = "mase")
phylip <- read.alignment(File = system.file("sequences/test.phylip", package = "seqinr"), format = "phylip")
msf <- read.alignment(File = system.file("sequences/test.msf", package = "seqinr"), format = "msf")
