
XStringSet. Each element of the distance matrix corresponds to the dissimilarity between two sequences in the XStringSet.
DistanceMatrix(myXStringSet, includeTerminalGaps = FALSE, penalizeGapLetterMatches = TRUE, penalizeGapGapMatches = FALSE, correction = "none", processors = NULL, verbose = TRUE)
myXStringSet: XStringSet object of aligned sequences (DNAStringSet, RNAStringSet, or AAStringSet).
penalizeGapLetterMatches: If FALSE, then gap-to-letter matches are not included in the total length used to calculate distance.
penalizeGapGapMatches: If FALSE (the default), then gap-to-gap matches are not included in the total length used to calculate distance.
correction: "none" or "Jukes-Cantor".
processors: NULL (the default) for all available processors.
The dimnames of the matrix correspond to the names of the XStringSet.
myXStringSet. Ambiguity can be represented using the characters of the IUPAC_CODE_MAP for DNAStringSet and RNAStringSet inputs, or using the AMINO_ACID_CODE for an AAStringSet input. For example, the distance between an 'N' and any other nucleotide base is zero. The letters B (N or D), Z (Q or E), and X (any letter) are degenerate in the AMINO_ACID_CODE.
If includeTerminalGaps = FALSE then terminal gaps ("-" or "." characters) are not included in sequence length. This can be faster since only the positions common to each pair of sequences are compared. Sequences with no overlapping region in the alignment are given a value of NA, unless includeTerminalGaps = TRUE, in which case distance is 100%.
The penalizeGapGapMatches and penalizeGapLetterMatches arguments specify whether to penalize these special mismatch types (gap-to-gap and gap-to-letter) and include them in the total length when calculating distance. Both "-" and "." characters are interpreted as gaps. The default behavior is to calculate distance as the fraction of positions that differ across the region of the alignment shared by both sequences (not including gap-to-gap matches).
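The counting rules described above can be sketched in base R for a single pair of aligned nucleotide strings. This is only an illustration of the documented behavior, not DECIPHER's implementation: pair_distance is a hypothetical helper, it treats only "N" as ambiguous rather than the full IUPAC_CODE_MAP, and it assumes that "shared region" means the columns between the first and last letter of both sequences.

```r
# Hypothetical helper (not part of DECIPHER): fraction of differing positions
# between two aligned strings, with "-" and "." both treated as gaps.
pair_distance <- function(a, b,
                          includeTerminalGaps = FALSE,
                          penalizeGapLetterMatches = TRUE,
                          penalizeGapGapMatches = FALSE) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  stopifnot(length(x) == length(y))

  gx <- x %in% c("-", ".")
  gy <- y %in% c("-", ".")
  keep <- rep(TRUE, length(x))  # columns counted in the total length

  if (!includeTerminalGaps) {
    # Restrict to the columns spanned by letters in both sequences.
    if (all(gx) || all(gy)) return(NA_real_)
    first <- max(min(which(!gx)), min(which(!gy)))
    last  <- min(max(which(!gx)), max(which(!gy)))
    if (first > last) return(NA_real_)  # no overlapping region
    keep <- seq_along(x) >= first & seq_along(x) <= last
  }

  gapgap <- gx & gy
  gaplet <- xor(gx, gy)
  # Simplification: only "N" is treated as matching any base.
  mismatch <- !gx & !gy & x != y & x != "N" & y != "N"

  if (penalizeGapLetterMatches) {
    mismatch <- mismatch | gaplet
  } else {
    keep <- keep & !gaplet
  }
  if (penalizeGapGapMatches) {
    mismatch <- mismatch | gapgap
  } else {
    keep <- keep & !gapgap
  }

  sum(mismatch & keep) / sum(keep)
}

pair_distance("ANGCT-", "-ACCT-")
```

Under these assumptions, the sketch gives the same three values as the documented examples for "ANGCT-" and "-ACCT-" (1 in 4, 3 in 6, and 2 in 5).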
The elements of the distance matrix can be referenced by dimnames corresponding to the names of the XStringSet. Additionally, an attribute named "correction" specifying the method of correction used can be accessed using the function attr.
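The "correction" attribute records which correction, if any, was applied. As a rough illustration of what a Jukes-Cantor correction does to an uncorrected nucleotide distance, the sketch below uses the standard JC69 formula, d = -(3/4) * log(1 - (4/3) * p). Whether DECIPHER applies exactly this form is an assumption here, and jukes_cantor is a hypothetical helper, not a DECIPHER function.

```r
# Hypothetical helper: standard JC69 correction for nucleotide p-distances.
# Assumed form; not taken from DECIPHER's source.
jukes_cantor <- function(p) {
  out <- rep(NA_real_, length(p))
  ok <- !is.na(p) & p < 0.75  # the formula is undefined at p >= 0.75
  out[ok] <- -0.75 * log(1 - 4 * p[ok] / 3)
  out
}

# Corrected distances exceed the uncorrected fractions for p > 0.
jukes_cantor(c(0, 0.25, 0.40))
```

The correction grows with p because it estimates the substitutions hidden by multiple changes at the same site.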
IdClusters
library(DECIPHER)

# defaults compare intersection of internal ranges:
dna <- DNAStringSet(c("ANGCT-", "-ACCT-"))
d <- DistanceMatrix(dna)
# d[1,2] is 1 base in 4 = 0.25

# compare the entire sequence ranges:
dna <- DNAStringSet(c("ANGCT-", "-ACCT-"))
d <- DistanceMatrix(dna, includeTerminalGaps = TRUE,
                    penalizeGapGapMatches = TRUE)
# d[1,2] is now 3 bases in 6 = 0.50

# compare union of internal ranges:
dna <- DNAStringSet(c("ANGCT-", "-ACCT-"))
d <- DistanceMatrix(dna, includeTerminalGaps = TRUE,
                    penalizeGapGapMatches = FALSE)
# d[1,2] is now 2 bases in 5 = 0.40

# gap ("-") and unknown (".") characters are interchangeable:
dna <- DNAStringSet(c("ANGCT.", ".ACCT-"))
d <- DistanceMatrix(dna, includeTerminalGaps = TRUE,
                    penalizeGapGapMatches = FALSE)
# d[1,2] is still 2 bases in 5 = 0.40