seqinr (version 1.0-1)

dotPlot: Dot Plot Comparison of two sequences

Description

Dot plots are most likely the oldest visual representation used to compare two sequences (see Maizel and Lenk 1981 and references therein). In its simplest form, a dot is produced at position (i,j) iff character number i in the first sequence is the same as character number j in the second sequence. More eleborated forms use sliding windows and a threshold value for two windows to be considered as matched.

Usage

dotPlot(seq1, seq2, wsize = 1, wstep = 1, nmatch = 1, col = c("white", "black"), 
xlab = deparse(substitute(seq1)), ylab = deparse(substitute(seq2)), ...)

Arguments

seq1
the first sequence (x-axis) as a vector of single chars.
seq2
the second sequence (y-axis) as a vector of single char.
wsize
the size in chars of the moving window.
wstep
the size in chars for the steps of the moving window. Use wstep == wsize for non-overlapping windows.
nmatch
if the number of match per window is greater than or equal to nmatch then a dot is produced.
col
color of points passed to image.
xlab
label of x-axis passed to image.
ylab
label of y-axis passed to image.
...
further arguments passed to image.

Value

  • NULL.

References

Maizel, J.V. and Lenk, R.P. (1981) Enhanced Graphic Matrix Analysis of Nucleic Acid and Protein Sequences. Proceedings of the National Academy of Science USA, 78:7665-7669. citation("seqinr")

See Also

image

Examples

Run this code
#
# Identity is on the main diagonal:
#
dotPlot(letters, letters, main = "Direct repeat")
#
# Internal repeats are off the main diagonal:
#
dotPlot(rep(letters, 2), rep(letters, 2), main = "Internal repeats")
#
# Inversions are orthogonal to the main diagonal:
#
dotPlot(letters, rev(letters), main = "Inversion")
#
# Insertion in the second sequence yields a vertical jump:
#
dotPlot(letters, c(letters[1:10], s2c("insertion"), letters[11:26]), 
  main = "Insertion in the second sequence", asp = 1)
#
# Insertion in the first sequence yields an horizontal jump:
#
dotPlot(c(letters[1:10], s2c("insertion"), letters[11:26]), letters,
  main = "Insertion in the first sequence", asp = 1)
#
# Protein sequences have usually a good signal/noise ratio because there
# are 20 possible amino-acids:
#
aafile <- system.file("sequences/seqAA.fasta", package = "seqinr")
protein <- read.fasta(aafile)[[1]]
dotPlot(protein, protein, main = "Dot plot of a protein
wsize = 1, wstep = 1, nmatch = 1")
#
# Nucleic acid sequences have usually a poor signal/noise ratio because
# there are only 4 different bases:
#
dnafile <- system.file("sequences/malM.fasta", package = "seqinr")
dna <- protein <- read.fasta(dnafile)[[1]]
dotPlot(dna[1:200], dna[1:200], main = "Dot plot of a nucleic acid sequence
wsize = 1, wstep = 1, nmatch = 1")
#
# Play with the wsize, wstep and nmatch arguments to increase the 
# signal/noise ratio:
#
dotPlot(dna[1:200], dna[1:200], wsize = 3, wstep = 3, nmatch = 3,
main = "Dot plot of a nucleic acid sequence
wsize = 3, wstep = 3, nmatch = 3")

Run the code above in your browser using DataCamp Workspace