The pan-matrix is a central data structure for pan-genomic analysis. It is a matrix with
one row for each genome in the study, and one column for each gene cluster. Cell [i,j]
contains an integer indicating how many members genome i has in cluster j.
The input clustering must be an integer vector with one element for each sequence in the study,
typically produced by either bClust or dClust. The name of each element
is a text string identifying the sequence. The value of each element indicates the cluster, i.e.
sequences with identical values are in the same cluster. IMPORTANT: The name of each sequence must
contain the GID-tag of its genome, i.e. names must be of the form GID111_seq1, GID111_seq2,...
where the GIDxxx part indicates which genome the sequence belongs to. See panPrep
for details.
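As an illustration of the expected input format, here is a minimal sketch in base R. The sequence names and cluster values are made up for the example, and the pattern GID followed by digits is an assumption based on the naming form described above; it is not taken from the package code.

```r
# Toy clustering vector (hypothetical data): names are sequence tags,
# values are integer cluster labels.
clustering <- c(GID1_seq1 = 1L, GID1_seq2 = 2L,
                GID2_seq1 = 1L, GID2_seq2 = 1L,
                GID3_seq1 = 3L)

# Extract the GID-tag from each name. Assumes every name contains
# "GID" followed by one or more digits.
gid <- regmatches(names(clustering), regexpr("GID[0-9]+", names(clustering)))

# Sequences with identical values are in the same cluster.
members <- split(names(clustering), clustering)
```

Here gid tells which genome each sequence belongs to, and members lists the sequence names found in each cluster.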
The rows of the pan-matrix are named by the GID-tag of each genome. The columns are named
Cluster_x, where x is the integer cluster value copied from clustering.
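The structure described above can be sketched with base R as follows. This is only an illustration under stated assumptions, not necessarily how the package builds the matrix: the toy data are hypothetical, and the GID-tag is assumed to match the pattern GID followed by digits.

```r
# Toy clustering vector (hypothetical data).
clustering <- c(GID1_seq1 = 1L, GID1_seq2 = 2L,
                GID2_seq1 = 1L, GID2_seq2 = 1L,
                GID3_seq1 = 3L)

# Genome tag of each sequence (assumed pattern: GID + digits).
gid <- regmatches(names(clustering), regexpr("GID[0-9]+", names(clustering)))

# Cross-tabulate genomes against clusters: cell [i,j] counts the
# members genome i has in cluster j.
counts <- table(gid, clustering)

# Convert to a plain integer matrix with the naming described above.
pan <- matrix(as.integer(counts), nrow = nrow(counts),
              dimnames = list(rownames(counts),
                              paste0("Cluster_", colnames(counts))))
pan
```

In this toy case genome GID2 has two sequences in cluster 1, so pan["GID2", "Cluster_1"] is 2, while genomes with no members in a cluster get 0 in that cell.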