seq_data

A dataset of DNA sequences from test bacteria with annotation metadata.
The first column, SeqName, combines several annotation fields separated by semicolons.

datasets

Performs end-to-end analysis of gene clusters—such as photosynthesis,
carbon/nitrogen/sulfur cycling, carotenoid, antibiotic, or viral marker genes
(e.g., capsid, polymerase, integrase)—from genomes and metagenomes.
It parses Basic Local Alignment Search Tool (BLAST) results in tab-delimited
format produced by tools like NCBI BLAST+ and Diamond BLASTp, filters
Open Reading Frames (ORFs) by length, detects contiguous clusters of reference genes,
optionally extracts genomic coordinates, merges functional annotations, and
generates publication-ready arrow plots. The package works with or without
coding-sequence input and skips plotting when no functional
groups are found. For more details, see Li et al. (2023) <doi:10.1038/s41467-023-42193-7>.
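
The cluster-detection step described above can be illustrated with a small base R sketch. This is not gclink's implementation. The identifiers follow the "contig_gene index" pattern of the documented example ID "1_7", and `find_clusters` and its `max_gap` parameter are hypothetical names introduced only for this illustration.

```r
# Hypothetical BLAST-hit ORF identifiers in "<contig>_<gene index>" form.
hit_ids <- c("1_7", "1_8", "1_9", "1_15", "2_3", "2_4")

# Group hits into clusters: hits stay in one cluster while they share a contig
# and consecutive gene indices differ by no more than `max_gap`.
# (`max_gap` is an illustrative parameter, not a gclink argument.)
find_clusters <- function(ids, max_gap = 1) {
  contig <- sub("_[0-9]+$", "", ids)
  index <- as.integer(sub("^.*_", "", ids))
  ord <- order(contig, index)
  contig <- contig[ord]
  index <- index[ord]
  # A new cluster starts at the first hit, on a contig change, or after a large gap.
  new_cluster <- c(
    TRUE,
    contig[-1] != contig[-length(contig)] | diff(index) > max_gap
  )
  data.frame(id = ids[ord], cluster = cumsum(new_cluster), stringsAsFactors = FALSE)
}

clusters <- find_clusters(hit_ids)
```

Here the hits 1_7 to 1_9 form one cluster, 1_15 stands alone because of the index gap, and the two hits on contig 2 form a third cluster.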

Liuyang Li

gclink

Gene-Cluster Discovery, Annotation and Visualization

seq_data dataset

A data frame with multiple rows and 2 variables:<dl>
<dt>SeqName</dt>
<dd>Character. Combined annotation fields separated by semicolons, containing:<ul>
<li><code>ID</code>: Sequence identifier (e.g., "1_7")</li>
<li><code>partial</code>: Completion status ("00" for complete, "01" for partial)</li>
<li><code>start_type</code>: Translation initiation codon (e.g., "GTG", "ATG")</li>
<li><code>rbs_motif</code>: Ribosome binding site motif (e.g., "GGAG/GAGG")</li>
<li><code>rbs_spacer</code>: RBS spacer length (e.g., "5-10bp")</li>
<li><code>gc_cont</code>: GC content (e.g., "0.673")</li>
</ul>
</dd><dt>Sequence</dt>
<dd>Character. DNA sequence (when available) in FASTA format</dd>
</dl>
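
Because SeqName packs several fields into one string, it usually needs to be split before analysis. The sketch below shows one way to do that in base R. It builds a hypothetical stand-in data frame rather than loading the packaged data, and it assumes each field is written as "key=value". That layout is an assumption, not something this page confirms, and `parse_seq_name` is a helper invented for the example.

```r
# Hypothetical stand-in for seq_data in the documented two-column layout.
# The "key=value" form of each field is an assumption.
example_seqs <- data.frame(
  SeqName = c(
    "ID=1_7;partial=00;start_type=GTG;rbs_motif=GGAG/GAGG;rbs_spacer=5-10bp;gc_cont=0.673",
    "ID=1_8;partial=01;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.655"
  ),
  Sequence = c("GTGACCGAGCTGTAA", NA_character_),
  stringsAsFactors = FALSE
)

# Split one SeqName string into a named character vector of annotation fields.
parse_seq_name <- function(x) {
  fields <- strsplit(x, ";", fixed = TRUE)[[1]]
  keys <- sub("=.*$", "", fields)
  values <- sub("^[^=]*=", "", fields)
  stats::setNames(values, keys)
}

# Parse every row and bind the results into one data frame of annotation columns.
parsed <- do.call(rbind, lapply(example_seqs$SeqName, parse_seq_name))
annotations <- data.frame(parsed, stringsAsFactors = FALSE)

# Convert types: GC content to numeric, completion status to a logical flag.
annotations$gc_cont <- as.numeric(annotations$gc_cont)
annotations$is_complete <- annotations$partial == "00"
```

The resulting `annotations` data frame has one column per field, which can be bound back to the Sequence column with `cbind()` if needed.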

Format

Genomic Sequence Data with Annotations — seq_data

