Pairwise Distances from DNA Sequences
This function computes a matrix of pairwise distances from DNA sequences using a model of DNA evolution. Eleven substitution models (and the raw distance) are currently available.
dist.dna(x, model = "K80", variance = FALSE, gamma = FALSE, pairwise.deletion = FALSE, base.freq = NULL, as.matrix = FALSE)
- x: a matrix or a list containing the DNA sequences.
- model: a character string specifying the evolutionary model to be used; must be one of the eleven substitution models or the raw distance (the default is "K80").
- variance: a logical indicating whether to compute the variances of the distances; defaults to FALSE, so the variances are not computed.
- gamma: a value for the gamma parameter, which may be used to apply a gamma correction to the distances (by default gamma = FALSE, so no correction is applied).
- pairwise.deletion: a logical indicating whether to delete sites with missing data in a pairwise way. By default, a site with missing data in at least one sequence is deleted for all sequences.
- base.freq: the base frequencies to be used in the computations (if applicable, i.e. if model = "F84"). By default, the base frequencies are computed from the whole sample of sequences.
- as.matrix: a logical indicating whether to return the results as a matrix. The default is to return an object of class dist.
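The arguments above can be illustrated with a minimal sketch. The three-sequence alignment below is hypothetical toy data (not from this page), and the order of the base.freq values (A, C, G, T) is an assumption; as.DNAbin() is used to convert the character matrix to ape's internal format.

```r
library(ape)

# Toy alignment (hypothetical data): 3 sequences, 10 sites, "n" = missing base.
seqs <- rbind(
  s1 = c("a", "c", "g", "t", "a", "c", "g", "t", "a", "c"),
  s2 = c("a", "c", "g", "t", "a", "t", "g", "t", "a", "c"),
  s3 = c("a", "t", "g", "n", "a", "c", "g", "c", "a", "c")
)
x <- as.DNAbin(seqs)

# Raw distance with the default deletion rule: site 4 is dropped for every
# pair because s3 has a missing base there.
d_global <- dist.dna(x, model = "raw")

# Pairwise deletion: site 4 is dropped only for the pairs that involve s3.
d_pairwise <- dist.dna(x, model = "raw", pairwise.deletion = TRUE)

# K80 distances (the default model) together with their variances.
d_k80 <- dist.dna(x, model = "K80", variance = TRUE)

# K80 distances with a gamma correction (the value 2 is an arbitrary choice).
d_k80_gamma <- dist.dna(x, model = "K80", gamma = 2)

# F84 with user-supplied base frequencies (assumed order: A, C, G, T).
d_f84 <- dist.dna(x, model = "F84", base.freq = c(0.25, 0.25, 0.25, 0.25))

# The same raw distances returned as a full square matrix.
m_raw <- dist.dna(x, model = "raw", as.matrix = TRUE)
```

Comparing d_global with d_pairwise shows the effect of the deletion rule: only the s1-s2 distance changes, because that is the only pair for which site 4 is usable.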
The molecular evolutionary models available through the option model have been extensively described in the literature. A brief description is given below; more details can be found in the references.
have no effect, but
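As one worked illustration of how a model relates to the raw distance, the Jukes and Cantor (1969) correction converts the proportion p of differing sites into d = -(3/4) ln(1 - 4p/3). The sketch below applies that formula by hand and compares it with model = "JC69"; it assumes the woodmouse example alignment shipped with ape and no gamma correction.

```r
library(ape)
data(woodmouse)  # example alignment distributed with ape

# Proportion of differing sites for each pair of sequences.
p <- dist.dna(woodmouse, model = "raw")

# Jukes-Cantor distances as computed by dist.dna.
d_jc <- dist.dna(woodmouse, model = "JC69")

# Jukes-Cantor correction applied by hand to the raw proportions.
d_manual <- -0.75 * log(1 - 4 * p / 3)

# The two sets of distances are expected to agree up to floating-point error.
all.equal(as.vector(d_jc), as.vector(d_manual))
```

Both calls use the default deletion rule, so they are computed on the same set of sites, which is what makes the comparison meaningful.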
- an object of class dist (by default), or a numeric matrix if as.matrix = TRUE. If model = "BH87", a numeric matrix is returned because the Barry--Hartigan distance is not symmetric. If variance = TRUE, an attribute called "variance" is given to the returned object.
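The different return types described above can be inspected with a short sketch. It assumes the woodmouse example alignment shipped with ape; the object names are illustrative only.

```r
library(ape)
data(woodmouse)  # example alignment distributed with ape

# Default: an object of class "dist" (lower triangle only).
d <- dist.dna(woodmouse)
class(d)

# as.matrix = TRUE: a full square numeric matrix.
m <- dist.dna(woodmouse, as.matrix = TRUE)
dim(m)

# variance = TRUE: the variances are attached as the "variance" attribute.
v <- dist.dna(woodmouse, variance = TRUE)
head(attr(v, "variance"))

# model = "BH87": a numeric matrix, since this distance is not symmetric.
bh <- dist.dna(woodmouse, model = "BH87")
is.matrix(bh)
```

A dist object can also be converted afterwards with as.matrix(d), which may be convenient when the same distances are needed in both forms.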
Barry, D. and Hartigan, J. A. (1987) Asynchronous distance between homologous DNA sequences. Biometrics, 43, 261--276.
Felsenstein, J. (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution, 17, 368--376.
Felsenstein, J. and Churchill, G. A. (1996) A Hidden Markov model approach to variation among sites in rate of evolution. Molecular Biology and Evolution, 13, 93--104.
Galtier, N. and Gouy, M. (1995) Inferring phylogenies from DNA sequences of unequal base compositions. Proceedings of the National Academy of Sciences USA, 92, 11317--11321.
Jukes, T. H. and Cantor, C. R. (1969) Evolution of protein molecules. In Mammalian Protein Metabolism, ed. Munro, H. N., pp. 21--132, New York: Academic Press.
Kimura, M. (1980) A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution, 16, 111--120.
Kimura, M. (1981) Estimation of evolutionary distances between homologous nucleotide sequences. Proceedings of the National Academy of Sciences USA, 78, 454--458.
Jin, L. and Nei, M. (1990) Limitations of the evolutionary parsimony method of phylogenetic analysis. Molecular Biology and Evolution, 7, 82--102.
Lake, J. A. (1994) Reconstructing evolutionary trees from DNA and protein sequences: paralinear distances. Proceedings of the National Academy of Sciences USA, 91, 1455--1459.
Lockhart, P. J., Steel, M. A., Hendy, M. D. and Penny, D. (1994) Recovering evolutionary trees under a more realistic model of sequence evolution. Molecular Biology and Evolution, 11, 605--612.
McGuire, G., Prentice, M. J. and Wright, F. (1999) Improved error bounds for genetic distances from DNA sequences. Biometrics, 55, 1064--1070.
Tamura, K. (1992) Estimation of the number of nucleotide substitutions when there are strong transition-transversion and G + C-content biases. Molecular Biology and Evolution, 9, 678--687.
Tamura, K. and Nei, M. (1993) Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and Evolution, 10, 512--526.