phyDat: Conversion among Sequence Formats

Description

These functions transform several DNA formats into the phyDat format. allSitePattern generates an alignment of all possible site patterns.

Usage

phyDat(data, type = "DNA", levels = NULL, return.index = TRUE, ...)
dna2codon(x, codonstart = 1, ambiguity = "---", ...)
codon2dna(x)
as.phyDat(x, ...)
# S3 method for factor
as.phyDat(x, ...)
# S3 method for DNAbin
as.phyDat(x, ...)
# S3 method for alignment
as.phyDat(x, type = "DNA", ...)
phyDat2alignment(x)
# S3 method for MultipleAlignment
as.phyDat(x, ...)
# S3 method for phyDat
as.MultipleAlignment(x, ...)
acgt2ry(obj)
# S3 method for phyDat
as.character(x, allLevels = TRUE, ...)
# S3 method for phyDat
as.data.frame(x, ...)
# S3 method for phyDat
as.DNAbin(x, ...)
# S3 method for phyDat
as.AAbin(x, ...)
write.phyDat(x, file, format = "phylip", colsep = "", nbcol = -1,
  ...)
read.phyDat(file, format = "phylip", type = "DNA", ...)
baseFreq(obj, freq = FALSE, all = FALSE, drop.unused.levels = FALSE)
# S3 method for phyDat
subset(x, subset, select, site.pattern = TRUE, ...)
# S3 method for phyDat
[(x, i, j, ..., drop = FALSE)
# S3 method for phyDat
unique(x, incomparables = FALSE, identical = TRUE,
  ...)
removeUndeterminedSites(x, ...)
allSitePattern(n, levels = c("a", "c", "g", "t"), names = NULL)
genlight2phyDat(x, ambiguity = NA)
# S3 method for phyDat
image(x, ...)

Arguments

data

An object containing sequences.

type

Type of sequences ("DNA", "AA", "CODON" or "USER").

levels

Level attributes.

return.index

If TRUE, returns an index of the site patterns.

...

further arguments passed to or from other methods.

x

An object containing sequences.

codonstart

an integer giving where to start the translation. This should be 1, 2, or 3, but larger values are accepted and have the effect of starting the translation further within the sequence.

ambiguity

character used for ambiguous characters when no contrast is provided.

obj

an object of class phyDat

allLevels

return original data.

file

A file name.

format

File format of the sequence alignment (see details). Several popular formats are supported: "phylip", "interleaved", "sequential", "clustal", "fasta" or "nexus", or any unambiguous abbreviation of these.

colsep

a character used to separate the columns (an empty string by default, as shown in Usage).

nbcol

a numeric specifying the number of columns per row (-1 by default); may be negative implying that the nucleotides are printed on a single line.

freq

logical; if TRUE, frequencies (counts) are returned, otherwise proportions.

all

a logical; if all = TRUE, all counts of bases, ambiguous codes, missing data, and alignment gaps are returned as defined in the contrast.

drop.unused.levels

logical, drop unused levels

subset

a subset of taxa.

select

a subset of characters.

site.pattern

select site pattern or sites.

i, j

indices of the rows and/or columns to select or to drop. They may be numeric, logical, or character (in the same way as for standard R objects).

drop

for compatibility with the generic (unused).

incomparables

for compatibility with unique.

identical

if TRUE (default), sequences have to be identical; if FALSE, sequences are considered duplicates if the distance between them is zero (this happens frequently with ambiguous sites).

n

Number of sequences.

names

Names of sequences.

Value

The functions return an object of class phyDat.

Details

If type is "USER", a vector has to be given to levels. For example, c("a", "c", "g", "t", "-") would create a data object that can be used in phylogenetic analysis with gaps as a fifth state. There is a more detailed example for specifying "USER" defined data formats in the vignette "phangorn-specials".
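As a minimal sketch of the "USER" type, the following builds a small alignment in which the gap is treated as a fifth state. The matrix `m`, the taxon names and the object name `gap5` are made up for illustration; rows of the matrix are taken as sequences.

```r
library(phangorn)

# Hypothetical toy alignment: 3 taxa (rows) and 5 sites (columns)
m <- matrix(c("a", "c", "g", "t", "-",
              "a", "c", "-", "t", "-",
              "a", "a", "g", "t", "t"),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("t1", "t2", "t3"), NULL))

# Treat the gap "-" as a regular fifth state
gap5 <- phyDat(m, type = "USER", levels = c("a", "c", "g", "t", "-"))

gap5
attr(gap5, "levels")
```
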

allSitePattern returns all possible site patterns and can be useful in simulation studies. For further details see the vignette phangorn-specials.
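A short sketch of what allSitePattern produces: with the default four nucleotide levels and n sequences there are 4^n distinct site patterns, so three sequences give 64. The object name `pat` is only illustrative.

```r
library(phangorn)

# All site patterns for 3 sequences over the default levels a, c, g, t
pat <- allSitePattern(3)

pat
# Total number of sites, each pattern occurring once
sum(attr(pat, "weight"))
```
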

write.phyDat calls the function write.dna or write.nexus.data, and read.phyDat calls the function read.dna, read.aa or read.nexus.data; see their help pages for more details.

You may import data directly with read.dna or read.nexus.data and convert the data to class phyDat.
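The import route described above might look like the following sketch. It assumes the ape package and its woodmouse example data are available; the temporary FASTA file only stands in for a real alignment file, and the object names are illustrative.

```r
library(ape)
library(phangorn)

# Write an example alignment to a temporary FASTA file (stand-in for real data)
data(woodmouse)
tmp <- tempfile(fileext = ".fas")
write.dna(woodmouse, tmp, format = "fasta")

# Import with ape, then convert the DNAbin object to class phyDat
dna <- read.dna(tmp, format = "fasta")
wm <- as.phyDat(dna)

class(wm)
```
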

The generic function c can be used to combine sequences and unique to get all unique sequences or unique haplotypes.

acgt2ry converts a phyDat object of nucleotides into a binary ry-coded dataset.
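For illustration, the ry-recoding can be tried on the Laurasiatherian data set used in the examples below. This is only a sketch; the object name `ry` is made up here.

```r
library(phangorn)

data(Laurasiatherian)

# Recode nucleotides as purines (r) and pyrimidines (y)
ry <- acgt2ry(Laurasiatherian)

ry
attr(ry, "levels")
```
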

Examples

# NOT RUN {
data(Laurasiatherian)
class(Laurasiatherian)
Laurasiatherian
# base frequencies
baseFreq(Laurasiatherian)
baseFreq(Laurasiatherian, all=TRUE)
baseFreq(Laurasiatherian, freq=TRUE)
# subsetting phyDat objects
# the first 5 sequences
subset(Laurasiatherian, subset=1:5)
# the first 5 characters
subset(Laurasiatherian, select=1:5, site.pattern = FALSE)
# the first 5 site patterns (often more than 5 characters)
subset(Laurasiatherian, select=1:5, site.pattern = TRUE)
# transform into old ape format
LauraChar <- as.character(Laurasiatherian)
# and back
Laura <- phyDat(LauraChar)
all.equal(Laurasiatherian, Laura)
# Compute all possible site patterns
# for nucleotides there are 4^(number of tips) patterns
allSitePattern(5)

# }