read.nexus.data
Read Character Data In NEXUS Format
This function reads a file with sequences in the NEXUS format.
- Keywords
- file
Usage
read.nexus.data(file)
Arguments
- file
a file name specified by either a variable of mode character, or a double-quoted string.
Details
This parser tries to read data from a file written in a restricted NEXUS format (see examples below).
Please see the files data.nex and taxacharacters.nex for examples of formats that will work.
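As a rough illustration of the kind of restricted layout the parser is meant to accept, the sketch below writes a small NEXUS file to a temporary path and reads it back. The taxon names and sequences are invented for the example, and it assumes that read.nexus.data is provided by the ape package; the read step is skipped when that package is not installed.

```r
# Hypothetical minimal NEXUS file; taxon names and data are made up.
nexus_lines <- c(
  "#NEXUS",
  "BEGIN DATA;",
  "DIMENSIONS NTAX=3 NCHAR=8;",
  "FORMAT DATATYPE=DNA MISSING=? GAP=-;",
  "MATRIX",
  "Taxon_A ACGTACGT",
  "Taxon_B ACGTACGA",
  "Taxon_C ACG-ACG?;",
  "END;"
)

# Write the lines to a temporary file so nothing on disk is overwritten.
nexus_path <- tempfile(fileext = ".nex")
writeLines(nexus_lines, nexus_path)

# Assumption: read.nexus.data lives in the ape package.
if (requireNamespace("ape", quietly = TRUE)) {
  x <- ape::read.nexus.data(nexus_path)
  str(x)
}
```

The file keeps each sequence free of spaces, puts NTAX and NCHAR on the same line as their values, and places the closing END; on its own line.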
Some noticeable exceptions from the NEXUS standard (non-exhaustive list):
I. Comments must be either on separate lines or at the end of lines. Examples:
  [Comment] --- OK
  Taxon ACGTACG [Comment] --- OK
  [Comment line 1
  Comment line 2] --- NOT OK!
  Tax[Comment]on ACG[Comment]T --- NOT OK!

II. No spaces (or comments) are allowed in the sequences. Examples:
  name ACGT --- OK
  name AC GT --- NOT OK!

III. No spaces are allowed in taxon names, not even if names are in single quotes. That is, single-quoted names are not treated as such by the parser. Examples:
  Genus_species --- OK
  'Genus_species' --- OK
  'Genus species' --- NOT OK!

IV. The trailing end that closes the matrix must be on a separate line. Examples:
  taxon AACCGGT
  end; --- OK
  taxon AACCGGT;
  end; --- OK
  taxon AACCCGT; end; --- NOT OK!

V. Multistate characters are not allowed. That is, NEXUS allows you to specify multiple character states at a character position either as an uncertainty, (XY), or as an actual appearance of multiple states, {XY}. This information is not handled by the parser. Examples:
  taxon 0011?110 --- OK
  taxon 0011{01}110 --- NOT OK!
  taxon 0011(01)110 --- NOT OK!

VI. The number of taxa must be on the same line as ntax. The same applies to nchar. Examples:
  ntax = 12 --- OK
  ntax =
  12 --- NOT OK!

VII. The word “matrix” cannot occur anywhere in the file before the actual matrix command, unless it is in a comment. Examples:
  BEGIN CHARACTERS;
  TITLE 'Data in file "03a-cytochromeB.nex"';
  DIMENSIONS NCHAR=382;
  FORMAT DATATYPE=Protein GAP=- MISSING=?;
  ["This is The Matrix"] --- OK
  MATRIX

  BEGIN CHARACTERS;
  TITLE 'Matrix in file "03a-cytochromeB.nex"'; --- NOT OK!
  DIMENSIONS NCHAR=382;
  FORMAT DATATYPE=Protein GAP=- MISSING=?;
  MATRIX
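Some of the restrictions listed above can be screened for before calling the parser. The following is only a heuristic sketch in base R: check_nexus_lines is a hypothetical helper, not part of any package, and it assumes that the MATRIX command sits alone on its line and that comments do not span lines. It covers items IV to VII only.

```r
# Hypothetical helper: flags lines that break some of the restrictions.
check_nexus_lines <- function(lines) {
  # Drop single-line [comments] and ignore case for keyword matching.
  clean <- toupper(gsub("\\[[^\\]]*\\]", "", lines, perl = TRUE))
  problems <- character(0)

  # VI: ntax/nchar left dangling at the end of a line.
  if (any(grepl("\\b(NTAX|NCHAR)\\s*=?\\s*$", clean, perl = TRUE))) {
    problems <- c(problems, "ntax/nchar value is not on the same line")
  }

  # Locate the MATRIX command (assumed to be alone on its line).
  matrix_at <- grep("^\\s*MATRIX\\s*$", clean, perl = TRUE)
  if (length(matrix_at) == 0) {
    return(c(problems, "no MATRIX command found"))
  }
  matrix_at <- matrix_at[1]

  # VII: the word "matrix" outside a comment before the MATRIX command.
  before <- clean[seq_len(matrix_at - 1)]
  if (any(grepl("MATRIX", before, fixed = TRUE))) {
    problems <- c(problems, "word 'matrix' appears before the MATRIX command")
  }

  after <- clean[-seq_len(matrix_at)]

  # IV: "end;" sharing a line with a data row.
  if (any(grepl("\\S\\s+END\\s*;\\s*$", after, perl = TRUE))) {
    problems <- c(problems, "closing 'end;' is not on a separate line")
  }

  # V: multistate notation, (XY) or {XY}, inside the data rows.
  end_at <- grep("^\\s*END\\s*;", after, perl = TRUE)
  rows <- if (length(end_at) > 0) after[seq_len(end_at[1] - 1)] else after
  if (any(grepl("[({]", rows, perl = TRUE))) {
    problems <- c(problems, "multistate characters in the matrix")
  }

  problems
}

# Usage on in-memory lines (made-up data); returns a character vector
# with one entry per detected problem.
example_lines <- c("BEGIN DATA;", "DIMENSIONS NTAX=2 NCHAR=4;",
                   "MATRIX", "a ACGT", "b AC{GT}A;", "END;")
check_nexus_lines(example_lines)
```

An empty result does not guarantee that the file will parse; it only means none of these particular patterns were found.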
Value
A list of sequences each made of a single vector of mode character where each element is a (phylogenetic) character state.
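To make the shape of the return value concrete, the sketch below builds a small list by hand in the form described above (one character vector per taxon, one element per character state) and shows one way to work with it. The taxon names and states are invented; this is a stand-in, not output from the parser.

```r
# Hand-built stand-in for the returned object; names and states are made up.
x <- list(
  Taxon_A = c("a", "c", "g", "t"),
  Taxon_B = c("a", "c", "g", "a"),
  Taxon_C = c("a", "c", "-", "?")
)

# Stack the vectors: one row per taxon, one column per character position.
char_matrix <- do.call(rbind, x)
dim(char_matrix)

# Tabulate the states observed at the third character across taxa.
table(char_matrix[, 3])
```

Binding the vectors into a matrix works only when all sequences have the same length, as they do in an aligned data set.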
References
Maddison, D. R., Swofford, D. L. and Maddison, W. P. (1997) NEXUS: an extensible file format for systematic information. Systematic Biology, 46, 590-621.
See Also
Examples
# NOT RUN {
library(ape)  # read.nexus.data is provided by the ape package
## Use read.nexus.data to read a file in NEXUS format into object x
x <- read.nexus.data("file.nex")
# }