collapseByClone identifies the consensus sequence of each clonal
group and appends a column to the input data.frame containing the clonal
consensus for each sequence.
For sequences identified as belonging to the same clone, this function defines an
effective sequence that is representative of all mutations in the clone. Each
position in this consensus (or effective) sequence is drawn by weighted sampling
from the mutated bases observed at that position across all sequences in the clone,
excluding "N", "." and "-" characters. Each base is weighted by how often it occurs.
For example, in a clone where 5 sequences have a C at position 1 and 5 sequences
have a T at the same position, the consensus sequence will contain a C at position 1
in 50% of calls and a T in the other 50%.
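
The weighted sampling described above can be illustrated with a minimal Python
sketch. This is not the package's implementation: the names sample_position and
sample_consensus are hypothetical, the germline argument is an assumed input used
to decide which bases count as mutated, and falling back to the germline base when
no informative mutated base exists is an assumption of this sketch.

```python
import random

# Characters that carry no base information and are never sampled.
IGNORED = {"N", ".", "-"}

def sample_position(bases, germline_base, rng):
    """Sample one consensus base for a single alignment column.

    Drawing uniformly from the list of observed mutated bases weights each
    base by its frequency in the clone.
    """
    candidates = [b for b in bases if b not in IGNORED and b != germline_base]
    if not candidates:
        # Assumption of this sketch: keep the germline base when nothing
        # informative and mutated is observed at this position.
        return germline_base
    return rng.choice(candidates)

def sample_consensus(sequences, germline, rng):
    """Build an effective sequence for one clone of equal-length sequences."""
    columns = zip(*sequences)
    return "".join(
        sample_position(column, germ, rng)
        for column, germ in zip(columns, germline)
    )

# Five sequences with C and five with T at the first position.
rng = random.Random(1)
clone = ["C"] * 5 + ["T"] * 5
draws = [sample_position(clone, "A", rng) for _ in range(10000)]
fraction_c = draws.count("C") / len(draws)  # close to 0.5
```

Because the draw is random, repeated calls on the same clone can return different
effective sequences unless the random number generator is seeded.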
The function returns an updated db in which the sequences are collapsed by clone,
as defined by the cloneColumn argument.
Non-terminal branch mutations are defined as the set of mutations that occur on
branches of the lineage tree that are not connected to a leaf. For computational
efficiency, the set of non-terminal branch mutations is approximated as those that are
shared by more than one sequence in a clone. Under this approximation, mutations
found in only one sequence are treated as terminal branch mutations and filtered out.
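
The approximation above, keeping only mutations shared by more than one sequence,
can be sketched as follows. This is an illustration in Python rather than the
package's code: the function name shared_mutations is hypothetical, a mutation is
assumed to be identified by its 0-based position and observed base, and the
germline argument is an assumed input.

```python
from collections import Counter

# Characters that carry no base information and are never counted as mutations.
IGNORED = {"N", ".", "-"}

def shared_mutations(sequences, germline):
    """Return the (position, base) mutations seen in more than one sequence."""
    counts = Counter()
    for seq in sequences:
        for pos, (base, germ) in enumerate(zip(seq, germline)):
            if base not in IGNORED and base != germ:
                counts[(pos, base)] += 1
    # Mutations observed only once are treated as terminal and dropped.
    return {mutation for mutation, n in counts.items() if n > 1}

germline = "AAAA"
clone = ["CAAA", "CATA", "CAAG"]
non_terminal = shared_mutations(clone, germline)  # {(0, "C")}
```

In this example the C at position 0 occurs in all three sequences and is kept,
while the T and G each occur in a single sequence and are discarded.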
This function can be run in parallel, which is useful when db contains thousands of
sequences. Specify the number of cores to use with the nproc parameter.