Expands gene cluster tables to include all ORFs (annotated and hypothetical) within contigs, normalizing cluster representations for downstream analysis and plotting. Ensures consistent ORF spacing/length across clusters by inserting missing rows.
gc_add(
  Data = sbgc,
  Annotation = bin_genes,
  orf_before_first = 0,
  orf_after_last = 0,
  orf_range = "All"
)

A data.frame with one row per ORF (real/hypothetical), sorted by gene_cluster and
orf_position. Added columns:
GC_orf_position: Relative position within cluster (1-indexed).
GC_present_length: Count of annotated ORFs in the cluster.
GC_absent_length: Count of inserted hypothetical ORFs.
GC_length: Total ORFs (GC_present_length + GC_absent_length).
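The relationship between the added columns can be illustrated with a minimal base R sketch. It does not call gc_add and is not the package's implementation: the toy table `expanded` and the gene names are made up, and it assumes an inserted hypothetical ORF is recognizable by gene = NA.

```r
# Toy expanded cluster (hypothetical data): positions 4..8 on one contig.
# Rows with gene = NA stand in for inserted hypothetical ORFs (assumption).
expanded <- data.frame(
  gene_cluster = "GC1",
  orf_position = 4:8,
  gene = c("nifH", NA, "nifD", NA, "nifK"),
  stringsAsFactors = FALSE
)

# Sort the way the output is described: by cluster, then absolute position.
expanded <- expanded[order(expanded$gene_cluster, expanded$orf_position), ]

# 1-indexed position of each ORF within its cluster.
expanded$GC_orf_position <- ave(
  expanded$orf_position, expanded$gene_cluster, FUN = seq_along
)

# Per-cluster counts of annotated and inserted rows, repeated on every row.
expanded$GC_present_length <- ave(
  as.integer(!is.na(expanded$gene)), expanded$gene_cluster, FUN = sum
)
expanded$GC_absent_length <- ave(
  as.integer(is.na(expanded$gene)), expanded$gene_cluster, FUN = sum
)

# Total cluster length is the sum of the two counts.
expanded$GC_length <- expanded$GC_present_length + expanded$GC_absent_length

print(expanded)
```

In this toy cluster every row carries GC_present_length = 3, GC_absent_length = 2 and GC_length = 5, while GC_orf_position runs from 1 to 5.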
Data: A data.frame of annotated ORFs with required columns:
  qaccver: ORF identifier (genome---contig_orf_position format).
  genome, contig: Genome and contig names.
  gene: Gene symbol (NA for hypothetical ORFs).
  orf_position: Absolute ORF position on the contig.
  gene_cluster: Cluster identifier.
Annotation: A data.frame of full ORF annotations (e.g., from orf_extract).
Must include qaccver and orf_position.
orf_before_first: Integer. Number of hypothetical ORFs to insert before the first annotated ORF
in each cluster (bounded by contig start). Default: 0.
orf_after_last: Integer. Number of hypothetical ORFs to append after the last annotated ORF
in each cluster (bounded by contig end). Default: 0.
orf_range: Character. Controls ORF inclusion and annotation merging:
  "All": Include every ORF in the contig range and merge all annotations (default).
  "OnlyAnnotated": Keep only ORFs present in Annotation and merge their annotations.
  "IgnoreAnnotated": Include all ORFs but skip merging with Annotation.
Hypothetical ORFs are inserted as rows with gene = NA.
Output is always sorted by gene_cluster and orf_position.
Progress messages are printed to the console with timestamps.
Contig bounds are respected: insertions never extend beyond the ORF positions actually present in Annotation.
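How padding interacts with contig bounds can be sketched in base R. The helper `pad_window` below is hypothetical and not part of the documented API; it only illustrates the clipping rule described above, under the assumption that the contig's ORF positions are known from Annotation.

```r
# Hypothetical helper: positions to keep for one cluster on one contig.
#   annotated_pos: positions of the cluster's annotated ORFs (from Data)
#   contig_pos:    all ORF positions on the contig (from Annotation)
#   before, after: requested padding, clipped to the contig bounds
pad_window <- function(annotated_pos, contig_pos, before = 0, after = 0) {
  lo <- max(min(annotated_pos) - before, min(contig_pos))
  hi <- min(max(annotated_pos) + after, max(contig_pos))
  contig_pos[contig_pos >= lo & contig_pos <= hi]
}

contig_orfs  <- 1:10      # toy contig with ten ORFs
cluster_orfs <- c(2L, 5L) # annotated cluster members

# Three ORFs are requested before position 2, but only position 1 exists.
kept <- pad_window(cluster_orfs, contig_orfs, before = 3, after = 2)
print(kept)     # 1 2 3 4 5 6 7

# Positions that would be inserted as hypothetical rows (gene = NA).
inserted <- setdiff(kept, cluster_orfs)
print(inserted) # 1 3 4 6 7
```

With both padding arguments left at 0, the same helper returns only the span from the first to the last annotated ORF, so the only inserted rows are the gaps inside the cluster.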