DNAbin: Manipulate DNA Sequences in Bit-Level Format

Description

These functions help to manipulate DNA sequences coded in the bit-level coding scheme.

Usage

# S3 method for DNAbin
print(x, printlen = 6, digits = 3, ...)
# S3 method for DNAbin
rbind(...)
# S3 method for DNAbin
cbind(..., check.names = TRUE, fill.with.gaps = FALSE,
             quiet = FALSE)
# S3 method for DNAbin
[(x, i, j, drop = FALSE)
# S3 method for DNAbin
as.matrix(x, ...)
# S3 method for DNAbin
c(..., recursive = FALSE)
# S3 method for DNAbin
as.list(x, ...)
# S3 method for DNAbin
labels(object, ...)

Value

An object of class "DNAbin" in the case of rbind, cbind, and [.

Arguments

x, object: an object of class "DNAbin".
...: either further arguments to be passed to or from other methods in the case of print, as.matrix, and labels, or a series of objects of class "DNAbin" in the case of rbind, cbind, and c.
printlen: the number of labels to print (6 by default).
digits: the number of digits to print (3 by default).
check.names: a logical specifying whether to check the rownames before binding the columns (see details).
fill.with.gaps: a logical indicating whether to keep all possible individuals as indicated by the rownames, filling any missing data with insertion gaps (ignored if check.names = FALSE).
quiet: a logical to switch off warning messages when some rows are dropped.
i, j: indices of the rows and/or columns to select or to drop. They may be numeric, logical, or character (in the same way as for standard R objects).
drop: logical; if TRUE, the returned object is of the lowest possible dimension.
recursive: for compatibility with the generic (unused).
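
As a minimal sketch of how the ... argument carries the objects to bind, the following assumes the ape package is installed and uses its bundled woodmouse data. The object names (top, bottom, both, both_list) are illustrative only.

```r
library(ape)
data(woodmouse)

# rbind stacks matrices that have the same number of columns
top    <- woodmouse[1:2, ]
bottom <- woodmouse[3:4, ]
both   <- rbind(top, bottom)
dim(both)

# c concatenates sequences stored as lists, so convert the matrices first
both_list <- c(as.list(top), as.list(bottom))
labels(both_list)
```

Here rbind is given matrices and c is given lists, which is the safest way to call each method.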

Author

Emmanuel Paradis

Details

These are all methods of generic functions, applied here to DNA sequences stored as objects of class "DNAbin". They are used in the same way as the standard R functions for manipulating vectors, matrices, and lists. Additionally, the operators [[ and $ may be used to extract a vector from a list. Note that the default of drop differs from that of the generic operator: this avoids dropping rownames when selecting a single sequence.
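
The effect of the drop default can be sketched as follows, assuming ape is installed and using the bundled woodmouse data. The object names are illustrative only.

```r
library(ape)
data(woodmouse)

# Selecting a single sequence keeps a one-row matrix (drop = FALSE by default),
# so the rowname of the sequence is preserved
one <- woodmouse[1, ]
dim(one)
rownames(one)

# Asking explicitly for drop = TRUE reduces the result to a vector
one_vec <- woodmouse[1, , drop = TRUE]
is.matrix(one_vec)

# Indices may also be character: select by rowname, then by column range
first_name <- rownames(woodmouse)[1]
by_name <- woodmouse[first_name, 1:10]

# [[ extracts one sequence (a vector) from a list
wl <- as.list(woodmouse)
s1 <- wl[[first_name]]
```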

These functions make it easy to manipulate DNA sequences coded with the bit-level coding scheme. This scheme allows much faster comparisons of sequences and stores them in less memory than the format used before ape 1.10.
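
To get a feel for the storage, the sketch below inspects the underlying type and compares memory use with a character matrix holding the same data. It assumes ape is installed. The character matrix serves only as a point of comparison and is not claimed to be identical to the pre-1.10 format.

```r
library(ape)
data(woodmouse)

# Each base is held in a single byte (R's "raw" type)
typeof(unclass(woodmouse))

# The underlying byte codes of the four unambiguous bases
bases <- as.DNAbin(c("a", "c", "g", "t"))
unclass(bases)

# Memory used by the bit-level object versus a character matrix
size_bin  <- object.size(woodmouse)
size_char <- object.size(as.character(woodmouse))
size_bin
size_char
```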

For cbind, the default behaviour is to keep only individuals (as indicated by the rownames) for which there are no missing data. If fill.with.gaps = TRUE, a 'complete' matrix is returned, possibly with insertion gaps as missing data. If check.names = TRUE (the default), the rownames of each matrix are checked, and the rows are reordered if necessary (if some rownames are duplicated, an error is raised). If check.names = FALSE, the matrices must all have the same number of rows and are simply bound together; the rownames of the first matrix are used. See the examples.
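
The three cbind behaviours described above can be sketched as follows, assuming ape is installed and using the bundled woodmouse data. The object names are illustrative only.

```r
library(ape)
data(woodmouse)

x <- woodmouse[1:3, 1:5]
y <- woodmouse[3:1, 6:10]   # same individuals, rows in reverse order

# check.names = TRUE (default): rows are matched by name before binding,
# so each individual gets its own columns 6 to 10 despite the reversed order
z <- cbind(x, y)
rn <- rownames(x)
all(as.character(z[rn, ]) == as.character(woodmouse[rn, 1:10]))

# Partial overlap: only individuals present in both matrices are kept;
# quiet = TRUE switches off the warning about dropped rows
a <- woodmouse[1:2, 1:5]
b <- woodmouse[2:4, 6:10]
kept <- cbind(a, b, quiet = TRUE)
rownames(kept)

# fill.with.gaps = TRUE keeps every individual and pads with "-"
full <- cbind(a, b, fill.with.gaps = TRUE)
as.character(full)
```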

as.matrix may be used to convert DNA sequences (of the same length) stored in a list into a matrix while keeping the names and the class. as.list does the reverse operation.
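
A minimal round-trip sketch, assuming ape is installed and using the bundled woodmouse data (a matrix of sequences of equal length). The object names wl and wm are illustrative only.

```r
library(ape)
data(woodmouse)

# Matrix to list: one element per sequence, named after the rownames
wl <- as.list(woodmouse)
class(wl)
labels(wl)[1:3]

# List back to matrix: names and class are kept
wm <- as.matrix(wl)
dim(wm)
all(as.character(wm) == as.character(woodmouse))
```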

References

Paradis, E. (2007) A Bit-Level Coding Scheme for Nucleotides. https://emmanuelparadis.github.io/misc/BitLevelCodingScheme_20April2007.pdf

Paradis, E. (2012) Analysis of Phylogenetics and Evolution with R (Second Edition). New York: Springer.

Examples

library(ape)
data(woodmouse)
woodmouse
print(woodmouse, 15, 6)
print(woodmouse[1:5, 1:300], 15, 6)
### Just to show how distances could be influenced by sampling:
dist.dna(woodmouse[1:2, ])
dist.dna(woodmouse[1:3, ])
### cbind and its options:
x <- woodmouse[1:2, 1:5]
y <- woodmouse[2:4, 6:10]
as.character(cbind(x, y)) # gives warning
as.character(cbind(x, y, fill.with.gaps = TRUE))
if (FALSE) {
as.character(cbind(x, y, check.names = FALSE)) # gives an error
}
