- data
A data frame, or a character vector of paths to .gbk (GenBank)
files. When a character vector is supplied, the listed .gbk files are read
and processed. A data frame must contain columns for unique protein
identifiers, cluster identifiers, protein sequences, and the start and end
positions of each gene.
- query
The name of the query cluster to be used for BLAST comparisons.
- id
The name of the column containing the unique protein (gene)
identifiers. Defaults to "protein_id".
- start
The name of the column specifying the start positions of genes.
Defaults to "start".
- end
The name of the column specifying the end positions of genes.
Defaults to "end".
- cluster
The name of the column specifying the cluster names. Defaults
to "cluster".
- genes
An optional vector of gene identifiers to include in the
analysis. Defaults to NULL.
- identity
Minimum percent identity for a BLAST hit to be considered
significant. Defaults to 30.
- parallel
Logical indicating whether to use parallel processing for
alignments. Defaults to TRUE.
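The arguments above imply a particular input layout. The following base R snippet is a rough sketch, not the package's own code. It builds a data frame that uses the default column names and shows how a percent-identity cutoff such as `identity = 30` might be applied to a table of BLAST hits. Several details are assumptions made for illustration: the sequence column name `translation`, the hit table `hits` and its columns, and the inclusive (`>=`) comparison.

```r
# Hypothetical input: one row per gene. "protein_id", "cluster", "start" and
# "end" match the documented defaults; "translation" (the protein sequence
# column) is an assumed name.
genes_df <- data.frame(
  protein_id  = c("A_1", "A_2", "B_1", "B_2"),
  cluster     = c("A", "A", "B", "B"),
  start       = c(1, 1201, 1, 1301),
  end         = c(1100, 2400, 1150, 2500),
  translation = c("MKTAYIAKQR", "MSLEQKLAAV", "MKTAYIAKQL", "MSIEQKLGAV"),
  stringsAsFactors = FALSE
)

# Check that the required columns are present before any comparison is run.
required <- c("protein_id", "cluster", "start", "end", "translation")
stopifnot(all(required %in% names(genes_df)))

# Hypothetical BLAST hits: query protein, subject protein, percent identity.
hits <- data.frame(
  query    = c("A_1", "A_2", "A_2"),
  subject  = c("B_1", "B_2", "B_1"),
  identity = c(90, 45, 12),
  stringsAsFactors = FALSE
)

# Keep only hits at or above the minimum identity (default 30).
# Treating the cutoff as inclusive is an assumption.
identity_cutoff <- 30
significant <- hits[hits$identity >= identity_cutoff, ]
print(significant)
```

With the default threshold, the hit at 12% identity is discarded and the two hits at 90% and 45% are kept.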