This function is primarily for internal use, but we export it
to document the underlying logic.
Example:

GGCTAGTT aligned to GGCTAGAACTAGTT with a deletion represented as:

GGCTAGAACTAGTT
GG------CTAGTT GGCTAGTT GG[CTAGAA]CTAGTT
                           ----   ----
Presumed repair mechanism leading to this:

  ....
GGCTAGAACTAGTT
CCGATCTTGATCAA

=>

  ....
GGCTAG      TT
CC      GATCAA
        ....

=>

GGCTAGTT
CCGATCAA
Variant-caller software can represent the same deletion in several
different, but completely equivalent, ways.

GGC------TAGTT GGCTAGTT GGC[TAGAAC]TAGTT
                          * ---  * ---

GGCT------AGTT GGCTAGTT GGCT[AGAACT]AGTT
                          ** --  ** --

GGCTA------GTT GGCTAGTT GGCTA[GAACTA]GTT
                          *** -  *** -

GGCTAG------TT GGCTAGTT GGCTAG[AACTAG]TT
                          ****   ****
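
The equivalent placements listed above can be enumerated mechanically by
sliding the deletion along the reference for as long as the resulting
sequence stays the same. The following Python sketch illustrates the idea.
The name equivalent_deletions and its 0-based start argument are
hypothetical and are not part of this function's interface.

```python
def equivalent_deletions(ref: str, start: int, length: int) -> list:
    """List every placement of a deletion that yields the same product.

    ref is the undeleted sequence, start is the 0-based position of the
    first deleted base, and length is the number of deleted bases.
    (Hypothetical helper, for illustration only.)
    """
    # Slide left while the base entering the deletion on the left
    # equals the base leaving it on the right.
    first = start
    while first > 0 and ref[first - 1] == ref[first + length - 1]:
        first -= 1

    # Slide right while the base leaving the deletion on the left
    # equals the base entering it on the right.
    last = start
    while last + length < len(ref) and ref[last] == ref[last + length]:
        last += 1

    # Render each placement in the bracket notation used above.
    return [
        f"{ref[:s]}[{ref[s:s + length]}]{ref[s + length:]}"
        for s in range(first, last + 1)
    ]


for placement in equivalent_deletions("GGCTAGAACTAGTT", 2, 6):
    print(placement)
# GG[CTAGAA]CTAGTT
# GGC[TAGAAC]TAGTT
# GGCT[AGAACT]AGTT
# GGCTA[GAACTA]GTT
# GGCTAG[AACTAG]TT
```

Starting from any of the five placements gives the same list, which is
why the choice of representation should not affect the result.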
This function finds:

(1) The maximum match of undeleted sequence to the left of the deletion
    that is identical to the right end of the deleted sequence, and
(2) The maximum match of undeleted sequence to the right of the deletion
    that is identical to the left end of the deleted sequence.

The microhomology sequence is the concatenation of items (1) and (2).
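
As a rough illustration of items (1) and (2), the Python sketch below
computes the two matches for a deletion given as a left flank, a deleted
sequence, and a right flank. The function name microhomology and its
three-argument form are assumptions made for this example and do not
describe this function's actual signature.

```python
def microhomology(left_flank: str, deleted: str, right_flank: str) -> str:
    """Return the microhomology sequence for one deletion representation.

    (Hypothetical helper, for illustration only.)
    """
    # Item (1): longest run at the right end of the left flank that is
    # identical to the right end of the deleted sequence.
    left_len = 0
    while (left_len < min(len(left_flank), len(deleted))
           and left_flank[-1 - left_len] == deleted[-1 - left_len]):
        left_len += 1

    # Item (2): longest run at the left end of the right flank that is
    # identical to the left end of the deleted sequence.
    right_len = 0
    while (right_len < min(len(right_flank), len(deleted))
           and right_flank[right_len] == deleted[right_len]):
        right_len += 1

    # Concatenate items (1) and (2).
    left_match = left_flank[len(left_flank) - left_len:]
    right_match = right_flank[:right_len]
    return left_match + right_match


print(microhomology("GG", "CTAGAA", "CTAGTT"))    # CTAG
print(microhomology("GGCT", "AGAACT", "AGTT"))    # CTAG
print(microhomology("GGCTAG", "AACTAG", "TT"))    # CTAG
```

All equivalent representations of the example deletion give the same
4-base microhomology, CTAG, although the split between the left and
right matches differs.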
Warning

A deletion in a repeat can also be represented in several different
ways. A deletion in a repeat is abstractly equivalent to a deletion
with microhomology that spans the entire deleted sequence. For example:

GACTAGCTAGTT
GACTA----GTT GACTAGTT GACTA[GCTA]GTT
                        *** -*** -

is really a repeat

GACTAG----TT GACTAGTT GACTAG[CTAG]TT
                        **** ----

GACT----AGTT GACTAGTT GACT[AGCT]AGTT
                        ** --** --
This function only flags these "cryptic repeats" by returning -1; it
does not determine the extent of the repeat.
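
The flagging behavior can be sketched in Python as follows. The sketch
assumes that a cryptic repeat is recognized when the combined left and
right matches are at least as long as the deleted sequence, which follows
from the equivalence stated above but may not be how this function
performs the test internally. The names common_prefix_len and
del_mh_length are hypothetical.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest shared prefix of a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def del_mh_length(left_flank: str, deleted: str, right_flank: str) -> int:
    """Microhomology length, or -1 if the deletion lies in a repeat.

    (Hypothetical helper; the repeat test is an assumption.)
    """
    # Match the right ends by comparing the reversed strings.
    left_len = common_prefix_len(left_flank[::-1], deleted[::-1])
    right_len = common_prefix_len(right_flank, deleted)

    # Microhomology spanning the whole deleted sequence means the
    # deleted bases are one copy of a tandem repeat.
    if left_len + right_len >= len(deleted):
        return -1
    return left_len + right_len


print(del_mh_length("GG", "CTAGAA", "CTAGTT"))   # 4
print(del_mh_length("GACTA", "GCTA", "GTT"))     # -1
print(del_mh_length("GACTAG", "CTAG", "TT"))     # -1
print(del_mh_length("GACT", "AGCT", "AGTT"))     # -1
```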