
gclink (version 1.1)

gc_add: Complete Gene Clusters by Adding Missing ORFs

Description

Expands a gene cluster table so that each cluster includes every ORF within its contig range, both annotated and hypothetical. This normalizes cluster representations for downstream analysis and plotting: missing ORFs are inserted as rows so that ORF spacing and cluster length are consistent across clusters.

Usage

gc_add(
  Data = sbgc,
  Annotation = bin_genes,
  orf_before_first = 0,
  orf_after_last = 0,
  orf_range = "All"
)

Value

A data.frame with one row per ORF (annotated or hypothetical), sorted by gene_cluster and orf_position, with the following added columns:

GC_orf_position

Relative position of the ORF within its cluster (1-indexed).

GC_present_length

Count of annotated ORFs in the cluster.

GC_absent_length

Count of hypothetical ORFs inserted into the cluster.

GC_length

Total number of ORFs in the cluster (GC_present_length + GC_absent_length).
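To show how the four added columns relate to one another, the base-R sketch below derives them for a small, made-up table that has already been expanded, treating rows with gene = NA as inserted hypothetical ORFs. The helper add_gc_columns() and the toy data are hypothetical and not part of gclink; gc_add() may compute these columns differently internally.

```r
# Hypothetical helper (not part of gclink): derive the per-cluster summary
# columns from an already-expanded ORF table in which gene = NA marks
# inserted hypothetical ORFs.
add_gc_columns <- function(df) {
  # Sort by cluster, then by absolute ORF position on the contig
  df <- df[order(df$gene_cluster, df$orf_position), ]
  # 1-indexed position of each ORF within its cluster
  df$GC_orf_position <- ave(df$orf_position, df$gene_cluster, FUN = seq_along)
  # Number of annotated (non-NA gene) ORFs per cluster, repeated on each row
  df$GC_present_length <- ave(as.integer(!is.na(df$gene)), df$gene_cluster,
                              FUN = sum)
  # Number of inserted hypothetical (NA gene) ORFs per cluster
  df$GC_absent_length <- ave(as.integer(is.na(df$gene)), df$gene_cluster,
                             FUN = sum)
  # Total cluster length
  df$GC_length <- df$GC_present_length + df$GC_absent_length
  rownames(df) <- NULL
  df
}

# Made-up example: two clusters, each with one inserted hypothetical ORF
toy <- data.frame(
  gene_cluster = c("GC1", "GC1", "GC1", "GC2", "GC2"),
  orf_position = c(12, 10, 11, 5, 6),
  gene         = c("geneA", "geneB", NA, "geneC", NA),
  stringsAsFactors = FALSE
)
res <- add_gc_columns(toy)
print(res)
```

Because ave() returns a vector aligned with its input, each cluster-level count is repeated on every row of that cluster, matching the one-row-per-ORF layout described above.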

Arguments

Data

A data.frame of annotated ORFs with required columns:

  • qaccver: ORF identifier (genome---contig_orf_position format).

  • genome, contig: Genome and contig names.

  • gene: Gene symbol (NA for hypothetical ORFs).

  • orf_position: Absolute ORF position on the contig.

  • gene_cluster: Cluster identifier.
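The qaccver format bundles several of the other required columns into one string. The sketch below splits identifiers of the form genome---contig_orfposition back into their parts; it assumes the ORF position is the integer after the last underscore, so that contig names may themselves contain underscores. The helper parse_qaccver() and the example identifiers are hypothetical, not part of gclink.

```r
# Hypothetical helper (not part of gclink): split "genome---contig_orfposition"
# identifiers into genome, contig and orf_position. Assumes the ORF position is
# the integer after the LAST underscore; contig names may contain underscores.
parse_qaccver <- function(qaccver) {
  pattern <- "^(.+)---(.+)_([0-9]+)$"
  ok <- grepl(pattern, qaccver)
  if (!all(ok)) {
    stop("Unexpected qaccver format: ", paste(qaccver[!ok], collapse = ", "))
  }
  data.frame(
    qaccver      = qaccver,
    genome       = sub(pattern, "\\1", qaccver),
    contig       = sub(pattern, "\\2", qaccver),
    orf_position = as.integer(sub(pattern, "\\3", qaccver)),
    stringsAsFactors = FALSE
  )
}

ids <- parse_qaccver(c("bin_01---k141_2345_7", "bin_02---contig9_12"))
print(ids)
```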

Annotation

A data.frame of full ORF annotations (e.g., from orf_extract). Must include qaccver and orf_position.

orf_before_first

Integer. Number of hypothetical ORFs to insert before the first annotated ORF in each cluster, bounded by the start of the contig. Default: 0.

orf_after_last

Integer. Number of hypothetical ORFs to append after the last annotated ORF in each cluster, bounded by the end of the contig. Default: 0.
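The padding arguments can be read as widening each cluster's ORF window and then clipping it to the ORFs that exist on the contig. The sketch below expresses that reading; it assumes the contig's bounds are the smallest and largest ORF positions recorded for it in Annotation. The helper cluster_window() is hypothetical, not gclink's implementation.

```r
# Hypothetical helper (not part of gclink): compute one cluster's ORF window
# after padding it on either side, clipped to the ORF positions that actually
# exist on the contig (assumed to be the positions listed in Annotation).
cluster_window <- function(cluster_pos, contig_pos,
                           orf_before_first = 0, orf_after_last = 0) {
  start <- max(min(cluster_pos) - orf_before_first, min(contig_pos))
  end   <- min(max(cluster_pos) + orf_after_last, max(contig_pos))
  c(start = start, end = end)
}

# Contig has ORFs 1..20; the cluster's annotated ORFs span positions 3..8.
# Asking for 5 ORFs before is clipped at the contig start (position 1).
w <- cluster_window(c(3, 5, 8), 1:20, orf_before_first = 5, orf_after_last = 2)
print(w)
```

With the defaults (0 and 0) the window is simply the span from the first to the last annotated ORF.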

orf_range

Character. Controls ORF inclusion and annotation merging:

  • "All": Include every ORF in the contig range and merge all annotations (default).

  • "OnlyAnnotated": Keep only ORFs present in Annotation and merge their annotations.

  • "IgnoreAnnotated": Include all ORFs but skip merging with Annotation.
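One way to picture the three orf_range modes is as three kinds of join between the ORFs in a cluster's window and the Annotation table. The sketch below uses base R merge() for this; the helper apply_orf_range() and the product column are hypothetical, and gc_add() may implement the modes differently.

```r
# Hypothetical sketch (not gclink's implementation) of the three orf_range
# modes. `window_orfs` holds every ORF position in a cluster's contig range;
# `annotation` is a table of ORF annotations keyed by orf_position.
apply_orf_range <- function(window_orfs, annotation,
                            orf_range = c("All", "OnlyAnnotated",
                                          "IgnoreAnnotated")) {
  orf_range <- match.arg(orf_range)
  rows <- data.frame(orf_position = window_orfs)
  switch(orf_range,
    # Keep every ORF; attach annotation columns where available (NA otherwise)
    All = merge(rows, annotation, by = "orf_position", all.x = TRUE),
    # Keep only ORFs that appear in the annotation table (inner join)
    OnlyAnnotated = merge(rows, annotation, by = "orf_position"),
    # Keep every ORF but do not attach any annotation columns
    IgnoreAnnotated = rows
  )
}

# Made-up annotation covering only ORFs 2 and 4 of a 5-ORF window
ann <- data.frame(orf_position = c(2, 4), product = c("x", "y"),
                  stringsAsFactors = FALSE)
r_all  <- apply_orf_range(1:5, ann, "All")
r_only <- apply_orf_range(1:5, ann, "OnlyAnnotated")
r_ign  <- apply_orf_range(1:5, ann, "IgnoreAnnotated")
```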

Details

  • Hypothetical ORFs are inserted as rows with gene = NA.

  • Output is always sorted by gene_cluster and orf_position.

  • Progress messages are printed to the console with timestamps.

  • Contig bounds are respected: inserted ORFs never extend beyond the ORF positions actually present in Annotation for that contig.
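The details above can be combined into a single fill step for one cluster: candidate positions in the window are restricted to ORFs that exist on the contig, the missing ones are added as rows with gene = NA, a timestamped progress message is emitted, and the result is sorted. This is a hypothetical sketch (fill_cluster() is not a gclink function), assuming the window bounds have already been computed.

```r
# Hypothetical sketch (not gclink's implementation): insert the missing ORFs
# of one cluster as hypothetical rows (gene = NA), never going beyond the ORF
# positions that exist on the contig, then sort by cluster and position.
fill_cluster <- function(cluster, start, end, contig_pos) {
  # Timestamped progress message (sent to the console via stderr)
  message(format(Sys.time(), "[%Y-%m-%d %H:%M:%S] "),
          "Filling cluster ", cluster$gene_cluster[1])
  # Only positions inside the window that actually exist on the contig
  candidates <- intersect(seq(start, end), contig_pos)
  missing <- setdiff(candidates, cluster$orf_position)
  filler <- data.frame(
    gene_cluster = rep(cluster$gene_cluster[1], length(missing)),
    orf_position = missing,
    gene         = rep(NA_character_, length(missing)),
    stringsAsFactors = FALSE
  )
  out <- rbind(cluster[, c("gene_cluster", "orf_position", "gene")], filler)
  out <- out[order(out$gene_cluster, out$orf_position), ]
  rownames(out) <- NULL
  out
}

# Made-up cluster with annotated ORFs at 4 and 7 on a contig with ORFs 1..7;
# the requested window 2..8 is clipped at the last existing ORF (7).
cl <- data.frame(gene_cluster = "GC1", orf_position = c(4, 7),
                 gene = c("geneA", "geneB"), stringsAsFactors = FALSE)
filled <- fill_cluster(cl, start = 2, end = 8, contig_pos = 1:7)
print(filled)
```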