gclink: Gene-Cluster Discovery, Annotation and Visualization

Overview

gclink performs end-to-end analysis of gene clusters (e.g., photosynthesis, carbon/nitrogen/sulfur cycling, carotenoid, antibiotic, or viral genes) from (meta)genomes. It provides:

  • Parsing of Basic Local Alignment Search Tool (BLAST) results in tab-delimited format produced by tools like NCBI BLAST+ and Diamond BLASTp
  • Contiguous cluster detection
  • Publication-ready visualization

Key Features

Adaptive Workflow

  • Works with or without coding sequences input
  • Skips plotting when functional grouping is absent
  • Supports custom gene lists for universal cluster detection

Cluster Detection

  • Density-based identification via AllGeneNum and MinConSeq parameters
  • Handles incomplete gene annotation coverage
  • Optional insertion of hypothetical ORFs at cluster boundaries
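
As a rough illustration of what density-based detection can look like, the sketch below scans the ORF positions of hits on one contig and reports windows where enough annotated genes fall close together. It is not gclink's implementation: the helper name find_dense_windows and the reading of the two thresholds (a window covering at most max_span ORFs that holds at least min_hits annotated ORFs) are assumptions made for illustration. See ?gclink::gclink for the actual meaning of AllGeneNum and MinConSeq.

```r
# Hypothetical helper, not part of gclink.
# hit_pos : ORF indices on one contig that matched a candidate gene
# max_span: maximum number of ORFs a window may cover (assumed analogue of AllGeneNum)
# min_hits: minimum number of annotated ORFs in the window (assumed analogue of MinConSeq)
find_dense_windows <- function(hit_pos, max_span, min_hits) {
  hit_pos <- sort(unique(hit_pos))
  n <- length(hit_pos)
  out <- data.frame(first_orf = numeric(0), last_orf = numeric(0), n_hits = numeric(0))
  i <- 1
  while (i <= n) {
    # Extend the window while it still fits within max_span ORFs
    j <- i
    while (j < n && hit_pos[j + 1] - hit_pos[i] + 1 <= max_span) j <- j + 1
    if (j - i + 1 >= min_hits) {
      out <- rbind(out, data.frame(first_orf = hit_pos[i],
                                   last_orf  = hit_pos[j],
                                   n_hits    = j - i + 1))
      i <- j + 1   # continue after the reported window
    } else {
      i <- i + 1   # too sparse here; try the next hit as a window start
    }
  }
  out
}

# Made-up ORF indices: one dense stretch (97-105) and two isolated hits
hits <- c(3, 97:102, 104, 105, 400)
dense <- find_dense_windows(hits, max_span = 10, min_hits = 5)
print(dense)
```

In this toy input only the stretch from ORF 97 to ORF 105 is reported; the isolated hits at 3 and 400 are ignored because no window around them reaches min_hits.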

Visualization

  • Publication-ready arrow plots built on gggenes, with customizable:
    • Color themes
    • Functional group levels
    • Genome subsets

Installation

# Install from CRAN
install.packages("gclink")

# Install from GitHub
if (!require("devtools")) install.packages("devtools")
devtools::install_github("LiuyangLee/gclink")

Case 1: Using blastp result

# Case 1: Using blastp result with Full pipeline (Find Cluster + Extract FASTA + Plot Cluster)
library(gclink)
data(blastp_df)
data(seq_data)
data(photosynthesis_gene_list)
data(PGC_group)
gc_list <- gclink(in_blastp_df = blastp_df,
                  in_seq_data = seq_data,
                  in_gene_list = photosynthesis_gene_list,
                  in_GC_group  = PGC_group,
                  AllGeneNum = 50,
                  MinConSeq  = 25,
                  apply_length_filter = TRUE,
                  down_IQR   = 10,
                  up_IQR     = 10,
                  orf_before_first = 0,
                  orf_after_last = 0,
                  levels_gene_group = c('bch','puh','puf','crt','acsF','assembly','regulator',
                                        'hypothetical ORF'),
                  color_theme = c('#3BAA51','#6495ED','#DD2421','#EF9320','#F8EB00',
                                  '#FF0683','#956548','grey'),
                  genome_subset = NULL)
gc_meta = gc_list[["GC_meta"]]
gc_seq = gc_list[["GC_seq"]]
gc_plot = gc_list[["GC_plot"]]
head(gc_meta)   # Cluster metadata
head(gc_seq)    # FASTA sequences
print(gc_plot)  # Visualization
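
In the call above, levels_gene_group and color_theme both have eight entries. A small pre-flight check like the following can catch a mismatch before plotting. It assumes that colors are paired with group levels by position, which the example suggests but this README does not state; group_colors is a hypothetical name used only here.

```r
levels_gene_group <- c('bch','puh','puf','crt','acsF','assembly','regulator',
                       'hypothetical ORF')
color_theme <- c('#3BAA51','#6495ED','#DD2421','#EF9320','#F8EB00',
                 '#FF0683','#956548','grey')

# One color per level, and no level listed twice
stopifnot(length(levels_gene_group) == length(color_theme))
stopifnot(!anyDuplicated(levels_gene_group))

# Named vector makes the assumed level-to-color pairing explicit
group_colors <- setNames(color_theme, levels_gene_group)
print(group_colors)
```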

1 Input Data Preview

1.1 A dataframe of Diamond BLASTp output (e.g., head(blastp_df))

| qaccver | saccver | pident | length | mismatch | gapopen | qstart | qend | sstart | send | evalue | bitscore |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Kuafubacteriaceae--GCA_016703535.1---JADJBV010000002.1_67 | enzymerhodopsin_XP_002954798.1_Volvox_carteri | 26.6 | 576 | 343 | 15 | 157 | 666 | 332 | 893 | 8.18e-41 | 161 |
| Kuafubacteriaceae--GCA_016703535.1---JADJBV010000002.1_113 | petB_Candidatus_Methylomirabilis_oxyfera_DAMO_1671_MOX | 76.6 | 248 | 58 | 0 | 14 | 261 | 9 | 256 | 5.43e-149 | 417 |
| Kuafubacteriaceae--GCA_016703535.1---JADJBV010000002.1_114 | petC_Candidatus_Nitronauta_litoralis_G3M70_16785_NLI | 50.8 | 177 | 73 | 2 | 8 | 184 | 27 | 189 | 3.83e-59 | 184 |
| Kuafubacteriaceae--GCA_016703535.1---JADJBV010000002.1_523 | cruC_Humisphaera_borealis_IPV69_18620_HBS | 31.5 | 365 | 208 | 11 | 42 | 378 | 48 | 398 | 1.45e-41 | 151 |
| Kuafubacteriaceae--GCA_016703535.1---JADJBV010000002.1_616 | rfpB_KL662192_1_938 | 33.0 | 227 | 137 | 3 | 4 | 223 | 3 | 221 | 2.53e-32 | 124 |
| Kuafubacteriaceae--GCA_016703535.1---JADJBV010000002.1_754 | bchI_p_Myxococcota--c_WYAZ01--o_WYAZ01--GCA_016703535.1---JADJBV010000002.1_754 | 100.0 | 343 | 0 | 0 | 1 | 343 | 1 | 343 | 4.73e-249 | 677 |
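
To build a comparable data frame from your own search results, a tab-delimited BLAST+ or DIAMOND output file with the twelve default columns can be read with base R. This is a minimal sketch: the temporary file, the placeholder identifiers (contigA_1, geneX_ref, and so on), and the object name my_blastp_df are made up for illustration, and the column names are simply those shown in the preview.

```r
blast_cols <- c("qaccver", "saccver", "pident", "length", "mismatch", "gapopen",
                "qstart", "qend", "sstart", "send", "evalue", "bitscore")

# Two made-up lines standing in for a real tab-delimited result file
example_file <- tempfile(fileext = ".tsv")
writeLines(c(
  "contigA_1\tgeneX_ref\t76.6\t248\t58\t0\t14\t261\t9\t256\t5.43e-149\t417",
  "contigA_2\tgeneY_ref\t50.8\t177\t73\t2\t8\t184\t27\t189\t3.83e-59\t184"
), example_file)

# The file has no header row, so column names are supplied explicitly
my_blastp_df <- read.delim(example_file, header = FALSE,
                           col.names = blast_cols, stringsAsFactors = FALSE)
str(my_blastp_df)
```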

1.2 (Optional) A data frame with SeqName (ORF identifier in Prodigal format: ORF_id # start # end # strand # ...) and Sequence columns (e.g., head(seq_data))

| SeqName | Sequence |
| --- | --- |
| Houyibacteriaceae--LLY-WYZ-15_3---k141_102864_1 # 3 # 266 # 1 # ID=85_1;partial=10;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.807 | CCGGACGCGCCGCCCGCCCCGAAGGCCCCGCCGGCCGCCCCCACCTATCCGCTCGAAGGCGCGCTCGGTATCAGCCGCGTGCGCCTCGTGCGCGCCACGCCCTGCGGCCTCACCGGCCGCGAGCTCGGCGCCGGCGAGGAGGCCCTCCTCGTCCACTTCGACGACGGACGCCCGCCCCTCGCGGTCGCCCCCGACGCGCTCCCGACGCCCCCCGGCGACGGGACGCCCCCCACCGGCGCTCCGCCGGAAGGAGACCCCGCATGA |
| Houyibacteriaceae--LLY-WYZ-15_3---k141_102864_2 # 263 # 490 # 1 # ID=85_2;partial=00;start_type=ATG;rbs_motif=AGGAG;rbs_spacer=5-10bp;gc_cont=0.737 | ATGACCCGCCCCGAAGACGCCCCGCCCACCCACGAAGCCGCGGACCGCGCCGTGCGCTCCCTCTTCCAGATCGGTCGCCTCTGGGCCTCCCACGGCCTCGAGATGGGTCGCATGACCTTGCGGACCGCCGCCAAGACCCTCGAGAGCACCGCCGAGACCCTCGAGGACCTCTCCCAGCGCGTCGCCCCCGACGACGAGCGCCCCGCGGACGAACGCGCCGCCGACTGA |
| Houyibacteriaceae--LLY-WYZ-15_3---k141_102864_3 # 667 # 2184 # -1 # ID=85_3;partial=00;start_type=ATG;rbs_motif=AGGAGG;rbs_spacer=5-10bp;gc_cont=0.775 | ATGAGCGCGATCGAAGGGACCCGGCCTCGGGACGGCGAGGCCCGCATGCCCGTGGAGGCGACCCCCGTGGAGGCCATCGGGGGCCTCGTCGCCCGGGCGCGTGACGCCGGCTTCGACCACGCGGCCCGGCCCCTCGCCGAGCGCGCGGGGCTGCTGCGCGCGCTCGCGGACGCCATCCTCGCCGACGGGGAGGCCATCGTCGCGCTCCTCGAGGAGGAGACGGGCAAGCCGGCGGCGGAGGCGTGGCTCCACGAGGTCGTGCCGACGGCGGACCTCGGGAGCTGGTGGAGCAGCCAGGGGCCGGCGCACCTCGCGACGGAAGCCGTGCGCCTCGACCCGCTCGCCTACCCTGGCAAGCGCGCGCGCGTCGAGGTGGTCCCGCGTGGCGTCGTGGCGCTGATCACGCCTTGGAACTTCCCGGTGGCGATCCCGCTGCGGACGCTCTTCCCGGCGCTCCTCGCGGGCAACGGCGTCGTCTGGAAGCCGTCCGAGCACACGCCGCGGGTGGCGGCGCGCGTGCACGGGATCGTGCGCGAGGTCTTCGGGCCGGACCTGGTCGAGCTGGTGCAGGGCGCCGGCGCGCAGGGGGCGGCGCTGGTCGAGGCGGACGTGGACGCGGTGGTGTTCACGGGCAGCGTGGCGACCGGGCGGAAGGTCGGCGCGGCGGCGGGGCGGGCGCTCACGCCGGCGTCGCTCGAGCTCGGCGGCAAGGACGCGGCCGTGGTGCTCGACGACGCGGACCTGGAGCGCACGGCCCGGGGCCTGCTCTGGGCGGCGATGGCGAACGCGGGGCAGAACTGCGCCGGGCTCGAGCGCGTCTACGCGGTGGCGGAGGTCGCCGGCCCGCTGAAGGCGCGGCTCGGTGAGCTGGCCGGAGAGCTGGTGCCCGGGCGCGACGTGGGGCCGCTGGTGACCGAGGCGCAGCTCGCGACGGTGGAGCGGCACGTGCGCGAGGCGGTCGACGGGGGCGCGGAGGTGCTGGCCGGCGGCGAGCGGCTCGAGCGGGGCGGGCGCTGGTTCGCGCCGACCGTGCTGGCGGAGGTCGAGCCGTCTTCGGCGGCGCTCCGGGAGGAGACGTTCGGGCCGGTGGTCGTCGTGCAGACGGTGGCGGACGAGGCGGCGGCCGTGGCGGCGGCGAACGACTCGCGCTTCGGGCTGACGGCGAGCGTCTGGACGCGGGACGCGGCGCGCGGGGAGGCGGTCGCACGGCGGCTCCGGGCGGGCGTCGTGACGGTGAACAACCACGCCTTCACCGGGGCCATCCCGGCGCTGCCCTGGGGCGGCGTCGGCGAGACGGGCTTCGGGGTGACGAACTCGCCGCACGCGCTCCACGCATTGGTGCGGCCGCGGGCCGTGGTCGTGGACGGCAACGCGCGGCCGGAGCTCTACTGGCACCCCTACGACGAGGCGCTCGAGCGGCTCGGGAAGGGCATGGCGGCGCTCCGCGGCAAGGGCGGGCCGATCACGAAGGTGCGCGCCGTGGCCAGGCTGCTCGGGGCGCTCCGCCGGCGCTTCTGA |
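
The Prodigal-style SeqName packs the ORF identifier, start, end, and strand into fields separated by " # ". The package lists its own parser for this (orf_locate); the base-R sketch below only illustrates the header layout. The helper name parse_prodigal_header is hypothetical and not part of gclink.

```r
# Hypothetical helper for illustration only; not a gclink function
parse_prodigal_header <- function(seq_name) {
  # Split each header on the literal " # " separator
  parts <- strsplit(seq_name, " # ", fixed = TRUE)
  data.frame(
    orf_id = vapply(parts, `[`, character(1), 1),
    start  = as.integer(vapply(parts, `[`, character(1), 2)),
    end    = as.integer(vapply(parts, `[`, character(1), 3)),
    strand = as.integer(vapply(parts, `[`, character(1), 4)),
    stringsAsFactors = FALSE
  )
}

# Header copied from the third row of the seq_data preview
hdr <- "Houyibacteriaceae--LLY-WYZ-15_3---k141_102864_3 # 667 # 2184 # -1 # ID=85_3;partial=00;start_type=ATG;rbs_motif=AGGAGG;rbs_spacer=5-10bp;gc_cont=0.775"
parsed <- parse_prodigal_header(hdr)
print(parsed)
```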

1.3 (Optional) Gene group (e.g., head(PGC_group))

gene | gene_group | gene_label
bciEbciE
bchBbchB
bchCbchC
bchDbchD

1.4 (Optional) Candidate gene list (e.g., head(photosynthesis_gene_list))

bciE bchB bchC bchD bchE

2 Output Data Preview

2.1 Gene cluster information (GC_meta)

gene | qaccver | saccver | pident | length | mismatch | gapopen | qstart | qend | sstart | send | evalue | bitscore | genome | orf | contig | genome_contig | orf_position | gene_cluster | GC_orf_position | GC_present_length | GC_absent_length | GC_length | SeqName | Sequence | start | end | direction | gene_group | gene_label | Pgenome | Pstart | Pend | Pdirection
pufCHouyibacteriaceae--LLY-WYZ-15_3---k141_102864_97pufC_Rhodospirillum_centenum_RC1_2101_RCE53.13351477332963387.66E-112333Houyibacteriaceae--LLY-WYZ-15_3k141_102864_97k141_102864Houyibacteriaceae--LLY-WYZ-15_3---k141_10286497Houyibacteriaceae--LLY-WYZ-15_3---k141_102864---1134236Houyibacteriaceae--LLY-WYZ-15_3---k141_102864_97 # 117640 # 118917 # -1 # ID=85_97;partial=00;start_type=GTG;rbs_motif=GGAG/GAGG;rbs_spacer=5-10bp;gc_cont=0.710GTGAAGAAGATCGCCATCGCCTTCGTGAGCACCTGGCTCCTCATCGGGGCCGTCTACGCCTACGAGCCGACCGAGACCTCGCAGATCGGCGCCGACGGCGTCGCCATGCAGGTCACGCAGACCGAGGACGAGCTCGCCGCGCGCGTGGAGGCGAACACCGTCCCGCCGGCCATCCCGATGCCCCAGAGCAGCGGCGTGCTGGCGGCCGAGGAGTACGAGAACGTGCAGGTCCTCGGCCACCTCAACACGGCCCAGTTCACCCGGCTGATGACCTCCATCACGCTCTGGGTCGCGCCGGAGCAGGGCTGCGCCTACTGCCACAACACGAACAACCTGGCCTCCGACGAGCTCTACACGAAGCGCGTGGCGCGTCGGATGATCCAGATGACCTGGCACATCAACGAGAACTGGCAGTCGCACGTCCAGGAGACCGGCGTGACCTGCTACACGTGCCACCGCGGCAACAACGTGCCCCAGCACATCTGGTTCGAGACGCCGCCCGACGACCACGGCATGGTGGGCTGGCGTGGCTCGCAGAACGCCCCGAACGACCGGACGGGGATCAGCTCCCTGCCGAACGACGTGTTCGAGGTGTTCCTCGAGGAGGACGCGAGCATCCGGGTCCAGTCGGCCGGGGAGGCCTTCCCGAACGAGAACCGCGCGTCCATCAAGCAGGCCGAGTGGACCTATGGGCTGATGATGCACTTCTCCGAGTCGCTCGGGGTGAACTGCACGGCTTGCCACAACTCGCGCTCCTGGAACGACTGGAGCCAGAGCCCGGCCCGCCGCGGGACGGCCTGGCACGGCATCCGGATGGCGCGAAACCTCAACAACCACTGGCTGACGCCGCTGCGCGATCAGTTCCCGCCGAACCGGCTCGGCGAGCTGGGTGACGCCCCGAAGGCCAACTGCGCGACGTGCCACCAGGGCGCGTACCGCCCCCTGCTCGGGCACCGCATGCTCGAGGACTTCCCGTCCCTCGTACGGGCGATGCCGCAGCCCGAGATCGAGCCGGAGCCGGAGCCGGAGCCCGAGCTGGAAGGCGAGGGCGAGGCCGGCGGGCAGCTCGAGCCGGAGGGGGAGGCGCCCGCCGCCGAAGCCCCCGAGGGCACGAACGCTGCGCCGACGGCGATGGCTGCGCCGGCGGCGATGGCCGCTCCGACGGGGATGGCCGCGCCGGCGGCGATGGCTGCGCCGGCGGCGATGGCTGCTCCGGCGGTGGCCGAGCCGACGCCCATGGCCGCGCCGGCGGCGATGGCGGCCCCGGCACCGAACTGA117640118917-1pufCHouyibacteriaceae--LLY-WYZ-15_3---k141_102864---101277FALSE
pufMHouyibacteriaceae--LLY-WYZ-15_3---k141_102864_98pufM_p_Myxococcota--c_Polyangia--o_Polyangiales--ERR1726576_bin.13---k141_102738_310043700143714374.73E-308834Houyibacteriaceae--LLY-WYZ-15_3k141_102864_98k141_102864Houyibacteriaceae--LLY-WYZ-15_3---k141_10286498Houyibacteriaceae--LLY-WYZ-15_3---k141_102864---1234236Houyibacteriaceae--LLY-WYZ-15_3---k141_102864_98 # 118914 # 120224 # -1 # ID=85_98;partial=00;start_type=ATG;rbs_motif=GGAG/GAGG;rbs_spacer=5-10bp;gc_cont=0.704ATGGCCCGCTACCAGAACATCTTCACGCAGATCCAAGTCGTCGGTCCGCCGGACACGCCGCCGCCGATCGACCCGGACTTCCGTACGAAGAAGACGCGCATGTCGCGGCTCCTCGGGTGGTTCGGCAACCCGCAGATCGGCCCCGTCTACCTGGGCTACACCGGCCTGGCGTCCGCGATCAGCTTCTTCATCGCTTTCGAGATCATCGGGCTCAACATGCTGGCCTCGGTGGACTGGGACGTCGTTCAGTTCATCCGCCAGCTCCCCTGGCTCGCGCTCGAACCGCCCCCGCCCTCTGCCGGGCTCTCCATCCCGACGCTTCAGGAGGGCGGCTGGTGGCTCATGGCCGGCTTCTTCCTCACGGCGTCGGTCATTCTCTGGTGGATTCGCACCTATCGGCGCGCACGCGCCCTGAAGATGGGCACGCACGTCGCGTGGGCCTTCGCCTCGGCGATCTGGCTCTACCTCGTCCTCGGCTTCATTCGCCCCTTGCTGATGGGGAGCTGGGGGGAGGCGGTGCCCTTCGGCATCTTCCCGCACCTCGACTGGACCGCCGCCTTCTCCGTTCGCTACGGCAACCTCTTCTACAACCCCTTCCACTGCCTCTCGATCGTCTTCCTCTACGGGTCGACGCTCCTCTTCGCCATGCACGGCGCGACGGTGCTCGCGCTCGGGCACGTGGGCGGTGAGCGTGAGGTGAGCCAGGTGGTCGACCGCGGCACGGCGGCCGAGCGCGGGGCGCTCTTCTGGCGCTGGACGATGGGCTTCAACGCGACCTTCGAGTCCATCCACCGCTGGGCCTGGTGGTTCGCGGTGCTCACGCCGCTCACCGGAGGCATCGGCATCCTCCTGACCGGCACCGCCGTCGACAACTGGTATCAGTGGGCCGTCGAGCACGACTTCGCGCCGGCCTATGAGGAGTCCTACGAGGTCGTCCCCGACCCGGTCGACGACCCGGCGAACGAGGACCTGCCCGGTATGCGCGGTGAGTCCACCGCGCAGTGGGAGCCGACCCCCTACGTGCCCGCCGAGGAGCCGGAGGCGCCCGAGGATGGTGCGGACGGCGCGGCCGCGGTCGAAGGCGTCGACGCCGAGGGCGGCGAGGATGCCGCCGCGGATCCCGCGAGCGAGGGCACGAGCGGCCAGCCGGAGACCGGCGCCGCGGCCCCGGAGAGCGAGCGCCTTCCGGACGAAGCGGCGGCGGCCGAGCCCGAAGGGGCTGCGCCGGAGCCCGAACCCCCCGCGCCGTCCGAGACGGCTGCCCCGAGCGAACCCGAGGCGCCCAGCGCGATGACCCCGGAGCAACCGTGA118914120224-1pufMHouyibacteriaceae--LLY-WYZ-15_3---k141_102864---112742584FALSE
pufLHouyibacteriaceae--LLY-WYZ-15_3---k141_102864_99pufL_p_Myxococcota--c_Polyangia--o_Polyangiales--ERR1726567_bin.15---k141_184359_210027500127512752.63E-214583Houyibacteriaceae--LLY-WYZ-15_3k141_102864_99k141_102864Houyibacteriaceae--LLY-WYZ-15_3---k141_10286499Houyibacteriaceae--LLY-WYZ-15_3---k141_102864---1334236Houyibacteriaceae--LLY-WYZ-15_3---k141_102864_99 # 120270 # 121094 # -1 # ID=85_99;partial=00;start_type=ATG;rbs_motif=GGAG/GAGG;rbs_spacer=5-10bp;gc_cont=0.648ATGGGCCTACTGAGCTTCGAGCGGCGATATCGAGTCCGAGGAGGCACGCTCCTCGGGGGCGACCTATTCGATTTCTGGGTCGGGCCCTTCTACGTGGGGCTCTTCGGCGTCACGACGATCTTCTTCACGATCGTCGGCACCGCGCTGATCCTCTGGGAGGCCTCCCGGGGTGACACCTGGAACCCCTGGCTGATCAACATCCAGCCGCCTCCAATCGAGTACGGGCTCGCCTTCGCGCCCCTCGATCAGGGGGGCATCTGGCAGCTGGTCACCATCTGCGCCATCGGCGCCTTCGGATCCTGGGCGCTCCGACAGGCGGAGATCAGCCGCAAGCTCGGCATGGGCTACCACGTGCCCATCGCCTACGGCGTCGCGGTCTTCGCCTACGTCACGCTCGTGGTGATTCGCCCGGTGATGCTGGGCGCCTGGGGCCACGGCTTCCCCTACGGCATCTTCAGCCACCTCGATTGGGTGTCGAACGTCGGGTACCAGTACCTGCACTTCCACTACAACCCGGCCCACATGATCGCGGTGAGCTTCTTCTTCACCACGACGCTCGCGCTCTCCCTCCACGGCGGTTTGATCCTCTCCGCCGTGAATCCGCCGAAGGGAGAGAAGGTGAAGACCGCCGAGTACGAGGACGGGTTCTTCCGTGACCACATCGGCTACTCGATCGGCGCCCTGGGCATTCATCGACTCGGCCTCTTCCTGGCGCTGAGCGCCGGGATCTGGAGCGCGATCTGCATTCTCATCAGCGGCCCGATGTGGACCAAGGGGTGGCCCGAGTGGTGGGACTGGTGGCTCAACCTCCCCGTGTGGAGCTGA120270121094-1pufLHouyibacteriaceae--LLY-WYZ-15_3---k141_102864---126303454FALSE
bchOHouyibacteriaceae--LLY-WYZ-15_3---k141_102864_100bchO_Pararhodospirillum_photometricum_RSPPHO_00117_RPM44.9265144133295282926.97E-60194Houyibacteriaceae--LLY-WYZ-15_3k141_102864_100k141_102864Houyibacteriaceae--LLY-WYZ-15_3---k141_102864100Houyibacteriaceae--LLY-WYZ-15_3---k141_102864---1434236Houyibacteriaceae--LLY-WYZ-15_3---k141_102864_100 # 121191 # 122102 # -1 # ID=85_100;partial=00;start_type=ATG;rbs_motif=GGAG/GAGG;rbs_spacer=5-10bp;gc_cont=0.762ATGAGCTCGGCCGTCGAAGAGCAGCGCGTCGAGCACCCGCGGGTCGAGCAGCAGCCCATCGAGCAGCAGCGCGTCGAGCACCAGCGCGTCGAGCGTTCGGGCGTGCGGTGGAACGTCGCCCGCCGCGGCGCCGGACCCACGCTCCTGGCGCTCCACGGGACCGGCAGCTCGAGCCGCTCCTTCTGCGCCCTCGCGGCCACGCTCGGTGCTCGCTTCACCGTCGTGGCGCCCGATCTACCCGGCCACGCCGGGAGCCGGATCGATCGCCGCTTCCGCCTCTCGCTCCCCTCGATCGCCGCCGCCCTCGGCGAGCTCATCGAGGCGCTCGCCGTCCAGCCGGCGCTGGTCCTCGCTCACTCCGCGGGCGCGGCGGTGGCGGCGCGCGCCATGCTCGACGGGGCTCTCCGCCCGGCGCTCTTCGTCGGGCTCGGCGCGGCCCTGACGCCCCTCGAGGGGCTCGCCCGGCTCGGCGCGCGCCCGGCGGCCGCGATGCTCGCCCGCTCGCCCATCACGCGGCGGGTGGCGCGCCGGGCTGGAGGCGCCCTCGTCGGACCGATCCTGCGCAGCGTCGGATCCACCGTCGGCCCCGAGGCCACACAGCGCTATCGGGAGCTCGCCCGCGATCCCGCCCACGTCGGGGCGGTCTTCTCGATGCTCGCCCAGTGGGATCTCGACGGGCTCCACGCGGCGCTACCACGCCTGGACGTACCGACCCTGCTCCTCGGCGGCGCCCGCGACGGCGCCACCCCGATCGCCCAGCAGCGCGCCCTCGCACGTCGCCTCCCGGCCGCGCGCGCGCACGTCGTCCTCGGCGCCGGGCACCTGCTCCACGAGGAGCGACCCGCCGAGATCGCGCGCCTCGTCGAGGCCGAGTGGAACAGATTGGACGGCGGTCGTGTCAAAAATGCTTGA121191122102-1bchOHouyibacteriaceae--LLY-WYZ-15_3---k141_102864---135514462FALSE
bchDHouyibacteriaceae--LLY-WYZ-15_3---k141_102864_101bchD_p_Myxococcota--c_Polyangia--o_Polyangiales--GCA_002699025.1---PABA01000098.1_81100587001587158701064Houyibacteriaceae--LLY-WYZ-15_3k141_102864_101k141_102864Houyibacteriaceae--LLY-WYZ-15_3---k141_102864101Houyibacteriaceae--LLY-WYZ-15_3---k141_102864---1534236Houyibacteriaceae--LLY-WYZ-15_3---k141_102864_101 # 122099 # 123859 # -1 # ID=85_101;partial=00;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.792ATGAGCGGCTGGCCCGACGTGGCGCGCGTCGCCGAGCTCCTGAGCGTCGACCCGGACGGCCTCGGAGGCGTGCGCCTGCGGGGTCGCCCGGGGCCGCACCGGCGCCGGGTGCTCGAGTGGGTGCGCGAGAGGCTGGCCCCGGAGGCGCCCTTCCGGCGCCTGCCCGCGCACGTGACCGAGGATCGGCTCCTCGGGGGCCTCGCGCTCGCGGAGACCTTGCGTTCGGGGCGGGCCGTCATGGAGCAGGGCGTGCTCGCGCGGAGCGACGGCGGCCTGCTCGTCGTGGCCATGGCCGAGCGGGCCGAGCGGGAGGTCGTGGCGCACCTCTGCGCGGCCCTCGACCGCGGCGCGATCACCGTCGAACGCGACGGCATGAGCGCCGAGGCGTCCTGCCGCGTGGGCCTCATCGCGCTCGACGAGGGCATCGACGAGGAGCACGTCGACCCGGCGCTCGCCGACCGGCTCGCCTTCGCGCTGGACCTCGACGCGCTCGATCCGCGGGGAGGGGCGGCGCCGGAACACGGACCCGAGGAGGTCGCGCGAGCCCGCGCCCGCCTCCCGCACGTGAGCCTCGGCGACGACATCATCGCGGCCCTCTCGGAGGCGGCCCAGGCCCTCGGCGTGGAGGCGCTCCGGCCGCTCCTGCTCGCGGCGAAGGCGGCCCGCGCGCACGCGGCGCTCCTCGGCCGGACCCGCGTCGAGGAGGAAGACGCCGGGATGGCGGCGCGCCTCGTCCTCGGCCCGAGGGCGACGCGAGCGCCGAGCGCCGAGCCCGAAGAGGCGGCCGAGCGCGAGGCCGAAGAGGGCGACCCCGACCCGGGAGGCGCCGGCGCGGCTGCAGCCGGCGAACGGGCGGACGGCGCCGACGAGGCCCCGCCGGGCGAGGTCCCGCTCGGCGATCTCGTCTTGGCGGCGGCCGAGAGCGGCATCCCGGCGGGGCTGCTCGACGCCCTCGACGTCGGGACCACCCGGCGGGCCGGCGCGACCGGTCGGAGCGGGGCGACGCGCATCGGCCCGAGCGGCGGCCGCCCGGCGGGGACGCGCGCCGCGCCGCCCACCCGAGGCCAGCGCCTGAACGTCGTCGAGACCCTCCGCGCCGCCGCGCCCTGGCAGCGGCTCCGCGGGGGCGGCTTCGGCGCGGGCGTGCGCGTCCGGCCGGAGGACTTCCGTGTCACCCGTCACCGGCAGCCGATCGAGAGCTGCGTGATCTTCGCCGTCGACGCGTCCGGCTCCGCCGCGCTTCGACGCCTGGCCGAGGCGAAGGGCGCCGTCGAGCGCGTGCTCGGCGACTGCTACGTGCGGCGCGACCACGTCGCCCTCGTCGCGTTCCGCCAGGACGGCGCCGAGCTGCTCCTGCCCCCGACGCGCTCCCTCGCCCGCGTGCGTCGCAGCCTGGCTGCCCTCGCCGGCGGCGGCGCGACCCCCCTCGCCGCGGGGATCGACGCCGCCCATCGGCTCGCCCTCGACGCCCGCGGGCGCGGCCGCGAGCCCATCGTGGTCGTCATGACCGACGGGCGGGCGAACGTG
ACCCGGGACGGCCGCCGGGACCCCGCGGTCGCCACCACGGACGCCCTCGAGAGCGCGCGCGGGCTCCAGCGAGCCGCCGTGCCGACCCTCTTCCTCGACACGGCCCCACGCCCCCGGCGCCGTGCCCGCGAGCTCGCCGAGGCCATGGACGCCCGCTACCTGCCGCTGCCCTACCTCGACGCGGCGGGGATCTCACGCCACGTCCAAGCGCTCGCCCGCGAGGGAGCCCGATGA122099123859-1bchDHouyibacteriaceae--LLY-WYZ-15_3---k141_102864---144596219FALSE
bchIHouyibacteriaceae--LLY-WYZ-15_3---k141_102864_102bchI_p_Myxococcota--c_Polyangia--o_Polyangiales--GCA_002699025.1---PABA01000098.1_8210033900133913391.97E-239652Houyibacteriaceae--LLY-WYZ-15_3k141_102864_102k141_102864Houyibacteriaceae--LLY-WYZ-15_3---k141_102864102Houyibacteriaceae--LLY-WYZ-15_3---k141_102864---1634236Houyibacteriaceae--LLY-WYZ-15_3---k141_102864_102 # 123863 # 124879 # -1 # ID=85_102;partial=00;start_type=ATG;rbs_motif=GGA/GAG/AGG;rbs_spacer=5-10bp;gc_cont=0.745ATGACGCCCTATCCCTTCACCGCCATCGTCGCGCAGGACGAGCTCAAGCTCGCCCTGCAGATCGCCACCGTCGACCGCAGCATCGGCGGGGTCCTCGCCTTCGGCGACCGCGGCACCGGCAAGTCGACCACCATCCGCGCGCTCGCCCGGCTCCTGCCGCCGATGCGCGTCGTCGCCAGCTGCCCGTACCACTGTGATCCGGCCGACGCGCGCGCTCGCTGTCCGCACTGTGCCGAAGCCGCAGGGGAGCGGGAGGCGATCGAGACGCCCGTGCCGGTCGTGGACCTGCCCCTCGGCGCCACCGAGGATCGCGTCGTCGGCGCGCTCGATCTCGAGGCGGCCCTCACGCGCGGGGAGCGCCGCTTCTCACCGGGCCTGCTCGCCGCGGCGCATCGAGGCTTCCTCTACATCGACGAGGTCAACCTCCTCCCCGATCACCTCGTGGATCTGCTGCTCGACGTCGCGGCCTCGGGCGAGAACGTGGTCGAGCGCGAGGGCCTGAGCGTGCGCCACCCCGCGCGCTTCGTGCTGATCGGCAGCGGAAACCCGGAGGAGGGCGAGCTGCGCCCCCAGCTGCTCGATCGCTTCGGCCTCTCGCTCGAGGTCCGCACGCCGGACGAGGTCGCGACGCGCGTCGAGGTCGTCAAGCGGCGCATGCGCTACGATCAGGACCCGGAGGCCTTCGCGGCCGCCTGGGCGGAGGACGAGGCGGCCCTCATCGTTCGCCTCCGGGACGCGCGGGCGCGCTTGCCCGAGGTGGCCGTCAGCGACGCCGTGATCGAGCGCGCGAGCCGGCTCTGCCAGGCGCTCGGCACCGACGGGCTCCGGGGGGAGCTGACCTTGATCCGCGCCGCGCGCGCGGCCGCCAGCCTCGACGCGCAGCGGGAGGTCGCCGACGTGCACCTCGCCCAGGTCGCCCCCCTCGCGCTCCGCCACCGGCTGCGACGCGCCCCCCTGGACGACGTCGGCTCGGGCGCGCGCGTGCAGAAGGCCGTCGAGGACGTGCTCGGGGGCTGA123863124879-1bchIHouyibacteriaceae--LLY-WYZ-15_3---k141_102864---162237239FALSE
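
In the rows shown, Pstart and Pend are consistent with the ORF coordinates shifted so that the cluster begins at 0 (for example, 118914 - 117640 = 1274 for pufM). The sketch below reproduces that arithmetic from the start and end values in the preview. It is an inference from these six rows, not a description of how gc_scale works internally, and it does not attempt to derive Pdirection.

```r
# start/end values copied from the GC_meta preview
cluster <- data.frame(
  gene  = c("pufC", "pufM", "pufL", "bchO", "bchD", "bchI"),
  start = c(117640, 118914, 120270, 121191, 122099, 123863),
  end   = c(118917, 120224, 121094, 122102, 123859, 124879)
)

# Shift coordinates so the first ORF of the cluster starts at 0
offset <- min(cluster$start)
cluster$Pstart <- cluster$start - offset
cluster$Pend   <- cluster$end - offset
print(cluster)
```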

2.2 Gene cluster sequence (GC_seq)

>pufC_Houyibacteriaceae--LLY-WYZ-15_3---k141_102864---1
GTGAAGAAGATCGCCATCGCCTTCGTGAGCACCTGGCTCCTCATCGGGGCCGTCTACGCCTACGAGCCGACCGAGACCTCGCAGATCGGCGCCGACGGCGTCGCCATGCAGGTCACGCAGACCGAGGACGAGCTCGCCGCGCGCGTGGAGGCGAACACCGTCCCGCCGGCCATCCCGATGCCCCAGAGCAGCGGCGTGCTGGCGGCCGAGGAGTACGAGAACGTGCAGGTCCTCGGCCACCTCAACACGGCCCAGTTCACCCGGCTGATGACCTCCATCACGCTCTGGGTCGCGCCGGAGCAGGGCTGCGCCTACTGCCACAACACGAACAACCTGGCCTCCGACGAGCTCTACACGAAGCGCGTGGCGCGTCGGATGATCCAGATGACCTGGCACATCAACGAGAACTGGCAGTCGCACGTCCAGGAGACCGGCGTGACCTGCTACACGTGCCACCGCGGCAACAACGTGCCCCAGCACATCTGGTTCGAGACGCCGCCCGACGACCACGGCATGGTGGGCTGGCGTGGCTCGCAGAACGCCCCGAACGACCGGACGGGGATCAGCTCCCTGCCGAACGACGTGTTCGAGGTGTTCCTCGAGGAGGACGCGAGCATCCGGGTCCAGTCGGCCGGGGAGGCCTTCCCGAACGAGAACCGCGCGTCCATCAAGCAGGCCGAGTGGACCTATGGGCTGATGATGCACTTCTCCGAGTCGCTCGGGGTGAACTGCACGGCTTGCCACAACTCGCGCTCCTGGAACGACTGGAGCCAGAGCCCGGCCCGCCGCGGGACGGCCTGGCACGGCATCCGGATGGCGCGAAACCTCAACAACCACTGGCTGACGCCGCTGCGCGATCAGTTCCCGCCGAACCGGCTCGGCGAGCTGGGTGACGCCCCGAAGGCCAACTGCGCGACGTGCCACCAGGGCGCGTACCGCCCCCTGCTCGGGCACCGCATGCTCGAGGACTTCCCGTCCCTCGTACGGGCGATGCCGCAGCCCGAGATCGAGCCGGAGCCGGAGCCGGAGCCCGAGCTGGAAGGCGAGGGCGAGGCCGGCGGGCAGCTCGAGCCGGAGGGGGAGGCGCCCGCCGCCGAAGCCCCCGAGGGCACGAACGCTGCGCCGACGGCGATGGCTGCGCCGGCGGCGATGGCCGCTCCGACGGGGATGGCCGCGCCGGCGGCGATGGCTGCGCCGGCGGCGATGGCTGCTCCGGCGGTGGCCGAGCCGACGCCCATGGCCGCGCCGGCGGCGATGGCGGCCCCGGCACCGAACTGA
>pufM_Houyibacteriaceae--LLY-WYZ-15_3---k141_102864---1
ATGGCCCGCTACCAGAACATCTTCACGCAGATCCAAGTCGTCGGTCCGCCGGACACGCCGCCGCCGATCGACCCGGACTTCCGTACGAAGAAGACGCGCATGTCGCGGCTCCTCGGGTGGTTCGGCAACCCGCAGATCGGCCCCGTCTACCTGGGCTACACCGGCCTGGCGTCCGCGATCAGCTTCTTCATCGCTTTCGAGATCATCGGGCTCAACATGCTGGCCTCGGTGGACTGGGACGTCGTTCAGTTCATCCGCCAGCTCCCCTGGCTCGCGCTCGAACCGCCCCCGCCCTCTGCCGGGCTCTCCATCCCGACGCTTCAGGAGGGCGGCTGGTGGCTCATGGCCGGCTTCTTCCTCACGGCGTCGGTCATTCTCTGGTGGATTCGCACCTATCGGCGCGCACGCGCCCTGAAGATGGGCACGCACGTCGCGTGGGCCTTCGCCTCGGCGATCTGGCTCTACCTCGTCCTCGGCTTCATTCGCCCCTTGCTGATGGGGAGCTGGGGGGAGGCGGTGCCCTTCGGCATCTTCCCGCACCTCGACTGGACCGCCGCCTTCTCCGTTCGCTACGGCAACCTCTTCTACAACCCCTTCCACTGCCTCTCGATCGTCTTCCTCTACGGGTCGACGCTCCTCTTCGCCATGCACGGCGCGACGGTGCTCGCGCTCGGGCACGTGGGCGGTGAGCGTGAGGTGAGCCAGGTGGTCGACCGCGGCACGGCGGCCGAGCGCGGGGCGCTCTTCTGGCGCTGGACGATGGGCTTCAACGCGACCTTCGAGTCCATCCACCGCTGGGCCTGGTGGTTCGCGGTGCTCACGCCGCTCACCGGAGGCATCGGCATCCTCCTGACCGGCACCGCCGTCGACAACTGGTATCAGTGGGCCGTCGAGCACGACTTCGCGCCGGCCTATGAGGAGTCCTACGAGGTCGTCCCCGACCCGGTCGACGACCCGGCGAACGAGGACCTGCCCGGTATGCGCGGTGAGTCCACCGCGCAGTGGGAGCCGACCCCCTACGTGCCCGCCGAGGAGCCGGAGGCGCCCGAGGATGGTGCGGACGGCGCGGCCGCGGTCGAAGGCGTCGACGCCGAGGGCGGCGAGGATGCCGCCGCGGATCCCGCGAGCGAGGGCACGAGCGGCCAGCCGGAGACCGGCGCCGCGGCCCCGGAGAGCGAGCGCCTTCCGGACGAAGCGGCGGCGGCCGAGCCCGAAGGGGCTGCGCCGGAGCCCGAACCCCCCGCGCCGTCCGAGACGGCTGCCCCGAGCGAACCCGAGGCGCCCAGCGCGATGACCCCGGAGCAACCGTGA
>pufL_Houyibacteriaceae--LLY-WYZ-15_3---k141_102864---1
ATGGGCCTACTGAGCTTCGAGCGGCGATATCGAGTCCGAGGAGGCACGCTCCTCGGGGGCGACCTATTCGATTTCTGGGTCGGGCCCTTCTACGTGGGGCTCTTCGGCGTCACGACGATCTTCTTCACGATCGTCGGCACCGCGCTGATCCTCTGGGAGGCCTCCCGGGGTGACACCTGGAACCCCTGGCTGATCAACATCCAGCCGCCTCCAATCGAGTACGGGCTCGCCTTCGCGCCCCTCGATCAGGGGGGCATCTGGCAGCTGGTCACCATCTGCGCCATCGGCGCCTTCGGATCCTGGGCGCTCCGACAGGCGGAGATCAGCCGCAAGCTCGGCATGGGCTACCACGTGCCCATCGCCTACGGCGTCGCGGTCTTCGCCTACGTCACGCTCGTGGTGATTCGCCCGGTGATGCTGGGCGCCTGGGGCCACGGCTTCCCCTACGGCATCTTCAGCCACCTCGATTGGGTGTCGAACGTCGGGTACCAGTACCTGCACTTCCACTACAACCCGGCCCACATGATCGCGGTGAGCTTCTTCTTCACCACGACGCTCGCGCTCTCCCTCCACGGCGGTTTGATCCTCTCCGCCGTGAATCCGCCGAAGGGAGAGAAGGTGAAGACCGCCGAGTACGAGGACGGGTTCTTCCGTGACCACATCGGCTACTCGATCGGCGCCCTGGGCATTCATCGACTCGGCCTCTTCCTGGCGCTGAGCGCCGGGATCTGGAGCGCGATCTGCATTCTCATCAGCGGCCCGATGTGGACCAAGGGGTGGCCCGAGTGGTGGGACTGGTGGCTCAACCTCCCCGTGTGGAGCTGA
>bchO_Houyibacteriaceae--LLY-WYZ-15_3---k141_102864---1
ATGAGCTCGGCCGTCGAAGAGCAGCGCGTCGAGCACCCGCGGGTCGAGCAGCAGCCCATCGAGCAGCAGCGCGTCGAGCACCAGCGCGTCGAGCGTTCGGGCGTGCGGTGGAACGTCGCCCGCCGCGGCGCCGGACCCACGCTCCTGGCGCTCCACGGGACCGGCAGCTCGAGCCGCTCCTTCTGCGCCCTCGCGGCCACGCTCGGTGCTCGCTTCACCGTCGTGGCGCCCGATCTACCCGGCCACGCCGGGAGCCGGATCGATCGCCGCTTCCGCCTCTCGCTCCCCTCGATCGCCGCCGCCCTCGGCGAGCTCATCGAGGCGCTCGCCGTCCAGCCGGCGCTGGTCCTCGCTCACTCCGCGGGCGCGGCGGTGGCGGCGCGCGCCATGCTCGACGGGGCTCTCCGCCCGGCGCTCTTCGTCGGGCTCGGCGCGGCCCTGACGCCCCTCGAGGGGCTCGCCCGGCTCGGCGCGCGCCCGGCGGCCGCGATGCTCGCCCGCTCGCCCATCACGCGGCGGGTGGCGCGCCGGGCTGGAGGCGCCCTCGTCGGACCGATCCTGCGCAGCGTCGGATCCACCGTCGGCCCCGAGGCCACACAGCGCTATCGGGAGCTCGCCCGCGATCCCGCCCACGTCGGGGCGGTCTTCTCGATGCTCGCCCAGTGGGATCTCGACGGGCTCCACGCGGCGCTACCACGCCTGGACGTACCGACCCTGCTCCTCGGCGGCGCCCGCGACGGCGCCACCCCGATCGCCCAGCAGCGCGCCCTCGCACGTCGCCTCCCGGCCGCGCGCGCGCACGTCGTCCTCGGCGCCGGGCACCTGCTCCACGAGGAGCGACCCGCCGAGATCGCGCGCCTCGTCGAGGCCGAGTGGAACAGATTGGACGGCGGTCGTGTCAAAAATGCTTGA
>bchD_Houyibacteriaceae--LLY-WYZ-15_3---k141_102864---1
ATGAGCGGCTGGCCCGACGTGGCGCGCGTCGCCGAGCTCCTGAGCGTCGACCCGGACGGCCTCGGAGGCGTGCGCCTGCGGGGTCGCCCGGGGCCGCACCGGCGCCGGGTGCTCGAGTGGGTGCGCGAGAGGCTGGCCCCGGAGGCGCCCTTCCGGCGCCTGCCCGCGCACGTGACCGAGGATCGGCTCCTCGGGGGCCTCGCGCTCGCGGAGACCTTGCGTTCGGGGCGGGCCGTCATGGAGCAGGGCGTGCTCGCGCGGAGCGACGGCGGCCTGCTCGTCGTGGCCATGGCCGAGCGGGCCGAGCGGGAGGTCGTGGCGCACCTCTGCGCGGCCCTCGACCGCGGCGCGATCACCGTCGAACGCGACGGCATGAGCGCCGAGGCGTCCTGCCGCGTGGGCCTCATCGCGCTCGACGAGGGCATCGACGAGGAGCACGTCGACCCGGCGCTCGCCGACCGGCTCGCCTTCGCGCTGGACCTCGACGCGCTCGATCCGCGGGGAGGGGCGGCGCCGGAACACGGACCCGAGGAGGTCGCGCGAGCCCGCGCCCGCCTCCCGCACGTGAGCCTCGGCGACGACATCATCGCGGCCCTCTCGGAGGCGGCCCAGGCCCTCGGCGTGGAGGCGCTCCGGCCGCTCCTGCTCGCGGCGAAGGCGGCCCGCGCGCACGCGGCGCTCCTCGGCCGGACCCGCGTCGAGGAGGAAGACGCCGGGATGGCGGCGCGCCTCGTCCTCGGCCCGAGGGCGACGCGAGCGCCGAGCGCCGAGCCCGAAGAGGCGGCCGAGCGCGAGGCCGAAGAGGGCGACCCCGACCCGGGAGGCGCCGGCGCGGCTGCAGCCGGCGAACGGGCGGACGGCGCCGACGAGGCCCCGCCGGGCGAGGTCCCGCTCGGCGATCTCGTCTTGGCGGCGGCCGAGAGCGGCATCCCGGCGGGGCTGCTCGACGCCCTCGACGTCGGGACCACCCGGCGGGCCGGCGCGACCGGTCGGAGCGGGGCGACGCGCATCGGCCCGAGCGGCGGCCGCCCGGCGGGGACGCGCGCCGCGCCGCCCACCCGAGGCCAGCGCCTGAACGTCGTCGAGACCCTCCGCGCCGCCGCGCCCTGGCAGCGGCTCCGCGGGGGCGGCTTCGGCGCGGGCGTGCGCGTCCGGCCGGAGGACTTCCGTGTCACCCGTCACCGGCAGCCGATCGAGAGCTGCGTGATCTTCGCCGTCGACGCGTCCGGCTCCGCCGCGCTTCGACGCCTGGCCGAGGCGAAGGGCGCCGTCGAGCGCGTGCTCGGCGACTGCTACGTGCGGCGCGACCACGTCGCCCTCGTCGCGTTCCGCCAGGACGGCGCCGAGCTGCTCCTGCCCCCGACGCGCTCCCTCGCCCGCGTGCGTCGCAGCCTGGCTGCCCTCGCCGGCGGCGGCGCGACCCCCCTCGCCGCGGGGATCGACGCCGCCCATCGGCTCGCCCTCGACGCCCGCGGGCGCGGCCGCGAGCCCATCGTGGTCGTCATGACCGACGGGCGGGCGAACGTGACCCGGGACGGCCGCCGGGACCCCGCGGTCGCCACCACGGACGCCCTCGAGAGCGCGCGCGGGCTCCAGCGAGCCGCCGTGCCGACCCTCTTCCTCGACACGGCCCCACGCCCCCGGCGCCGTGCCCGCGAGCTCGCCGAGGCCATGGACGCCCGCTACCTGCCGCTGCCCTACCTCGACGCGGCGGGGATCTCACGCCACGTCCAAGCGCTCGCCCGCGAGGGAGCCCGATGA
>bchI_Houyibacteriaceae--LLY-WYZ-15_3---k141_102864---1
ATGACGCCCTATCCCTTCACCGCCATCGTCGCGCAGGACGAGCTCAAGCTCGCCCTGCAGATCGCCACCGTCGACCGCAGCATCGGCGGGGTCCTCGCCTTCGGCGACCGCGGCACCGGCAAGTCGACCACCATCCGCGCGCTCGCCCGGCTCCTGCCGCCGATGCGCGTCGTCGCCAGCTGCCCGTACCACTGTGATCCGGCCGACGCGCGCGCTCGCTGTCCGCACTGTGCCGAAGCCGCAGGGGAGCGGGAGGCGATCGAGACGCCCGTGCCGGTCGTGGACCTGCCCCTCGGCGCCACCGAGGATCGCGTCGTCGGCGCGCTCGATCTCGAGGCGGCCCTCACGCGCGGGGAGCGCCGCTTCTCACCGGGCCTGCTCGCCGCGGCGCATCGAGGCTTCCTCTACATCGACGAGGTCAACCTCCTCCCCGATCACCTCGTGGATCTGCTGCTCGACGTCGCGGCCTCGGGCGAGAACGTGGTCGAGCGCGAGGGCCTGAGCGTGCGCCACCCCGCGCGCTTCGTGCTGATCGGCAGCGGAAACCCGGAGGAGGGCGAGCTGCGCCCCCAGCTGCTCGATCGCTTCGGCCTCTCGCTCGAGGTCCGCACGCCGGACGAGGTCGCGACGCGCGTCGAGGTCGTCAAGCGGCGCATGCGCTACGATCAGGACCCGGAGGCCTTCGCGGCCGCCTGGGCGGAGGACGAGGCGGCCCTCATCGTTCGCCTCCGGGACGCGCGGGCGCGCTTGCCCGAGGTGGCCGTCAGCGACGCCGTGATCGAGCGCGCGAGCCGGCTCTGCCAGGCGCTCGGCACCGACGGGCTCCGGGGGGAGCTGACCTTGATCCGCGCCGCGCGCGCGGCCGCCAGCCTCGACGCGCAGCGGGAGGTCGCCGACGTGCACCTCGCCCAGGTCGCCCCCCTCGCGCTCCGCCACCGGCTGCGACGCGCCCCCCTGGACGACGTCGGCTCGGGCGCGCGCGTGCAGAAGGCCGTCGAGGACGTGCTCGGGGGCTGA

2.3 Gene cluster plot (GC_plot)

Case 2: Using eggNOG (evolutionary gene genealogy Nonsupervised Orthologous Groups) format result

# Case 2: Using eggNOG result with Full pipeline (Find Cluster + Extract FASTA + Plot Cluster)
library(gclink)
data(eggnog_df)
data(seq_data)
data(KO_group)
KOs = c("K02291","K09844","K20611","K13789",
        "K09846","K08926","K08927","K08928",
        "K08929","K13991","K04035","K04039",
        "K11337","K03404","K11336","K04040",
        "K03403","K03405","K04037","K03428",
        "K04038","K06049","K10960","K11333",
        "K11334","K11335","K08226","K08226",
        "K09773")
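# eggNOG reports KEGG orthologs with a "ko:" prefix, so add it to the candidate list
rename_KOs = paste0("ko:", KOs)
# Map eggNOG columns onto the BLAST-style column names passed to gclink()
eggnog_df$qaccver = eggnog_df$`#query`
eggnog_df$saccver = eggnog_df$KEGG_ko
eggnog_df$evalue = eggnog_df$evalue    # already named evalue; kept to show the full mapping
eggnog_df$bitscore = eggnog_df$score
eggnog_df$gene = eggnog_df$KEGG_ko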
gc_list_2 = gclink(in_blastp_df = eggnog_df,
                  in_seq_data = seq_data,
                  in_gene_list = rename_KOs,
                  in_GC_group  = KO_group,
                  AllGeneNum = 50,
                  MinConSeq  = 25,
                  apply_evalue_filter = FALSE,
                  min_evalue = 1,
                  apply_score_filter = TRUE,
                  min_score = 10,
                  orf_before_first = 1,
                  orf_after_last = 1,
                  levels_gene_group = c('bch','puh','puf','crt',
                                        'acsF','assembly','hypothetical ORF'),
                  color_theme = c('#3BAA51','#6495ED','#DD2421','#EF9320',
                                  '#F8EB00','#FF0683','grey'))
gc_meta_2 = gc_list_2[["GC_meta"]]
gc_seq_2 = gc_list_2[["GC_seq"]]
gc_plot_2 = gc_list_2[["GC_plot"]]
head(gc_meta_2)   # Cluster metadata
head(gc_seq_2)    # FASTA sequences
print(gc_plot_2)  # Visualization

1 Input Data Preview

1.1 A dataframe of Diamond BLASTp output from eggNOG (e.g., head(eggnog_df))

| #query | seed_ortholog | evalue | score | eggNOG_OGs | max_annot_lvl | COG_category | Description | Preferred_name | GOs | EC | KEGG_ko | KEGG_Pathway | KEGG_Module | KEGG_Reaction | KEGG_rclass | BRITE | KEGG_TC | CAZy | BiGG_Reaction | PFAMs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Kuafuiibacteriaceae--GCA_016703535.1---JADJBV010000001.1_1 | 439375.Oant_2732 | 1.57E-45 | 162 | COG3293@1\|root,COG3293@2\|Bacteria,1PVIT@1224\|Proteobacteria,2TURP@28211\|Alphaproteobacteria,1J3RT@118882\|Brucellaceae | 28211\|Alphaproteobacteria | L | Transposase DDE domain | - | - | - | ko:K07492 | - | - | - | - | ko00000 | - | - | - | DDE_Tnp_1,DDE_Tnp_1_2,DUF4096 |
| Kuafuiibacteriaceae--GCA_016703535.1---JADJBV010000001.1_2 | 1173264.KI913949_gene2450 | 3.58E-17 | 83.6 | COG3335@1\|root,COG3415@1\|root,COG3335@2\|Bacteria,COG3415@2\|Bacteria,1G39S@1117\|Cyanobacteria,1HCKE@1150\|Oscillatoriales | 1117\|Cyanobacteria | L | COGs COG3415 Transposase and inactivated derivatives | - | - | - | ko:K07494 | - | - | - | - | ko00000 | - | - | - | DDE_3,HTH_32,HTH_Tnp_IS630 |
| Kuafuiibacteriaceae--GCA_016703535.1---JADJBV010000001.1_3 | 794903.OPIT5_03400 | 3.03E-30 | 114 | COG3335@1\|root,COG3335@2\|Bacteria | 2\|Bacteria | L | DDE superfamily endonuclease | - | - | - | ko:K07494 | - | - | - | - | ko00000 | - | - | - | DDE_3,HTH_Tnp_IS630 |
| Kuafuiibacteriaceae--GCA_016703535.1---JADJBV010000001.1_5 | 502025.Hoch_2790 | 2.78E-50 | 191 | 2AY84@1\|root,31QA9@2\|Bacteria,1QMYF@1224\|Proteobacteria,4374U@68525\|delta/epsilon subdivisions,2X20E@28221\|Deltaproteobacteria,2YWTZ@29\|Myxococcales | 28221\|Deltaproteobacteria | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| Kuafuiibacteriaceae--GCA_016703535.1---JADJBV010000001.1_11 | 105420.BBPO01000003_gene1121 | 2.00E-11 | 72.8 | COG2887@1\|root,COG2887@2\|Bacteria,2GJC5@201174\|Actinobacteria,2NGJC@228398\|Streptacidiphilus | 201174\|Actinobacteria | L | Protein of unknown function (DUF2800) | recB | - | - | ko:K07465 | - | - | - | - | ko00000 | - | - | - | PDDEXK_1 |
| Kuafuiibacteriaceae--GCA_016703535.1---JADJBV010000001.1_12 | 1122915.AUGY01000071_gene4398 | 2.13E-37 | 152 | COG1201@1\|root,COG1201@2\|Bacteria,1UHYQ@1239\|Firmicutes,4ISB0@91061\|Bacilli,277Q5@186822\|Paenibacillaceae | 91061\|Bacilli | L | helicase superfamily c-terminal domain | - | - | - | - | - | - | - | - | - | - | - | - | DUF1998,Helicase_C |
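
One thing to watch for in eggNOG annotations: the KEGG_ko field uses "-" when no KO is assigned, and it can hold several comma-separated KOs for a single ORF. The sketch below expands such entries to one row per KO before matching against a candidate list. Whether gclink needs or performs this step is not stated in this README, so treat it as an optional pre-processing idea; the ORF names and the data frame ann are made up.

```r
# Made-up annotations: a single KO, two KOs on one ORF, and no KO
ann <- data.frame(
  query   = c("orf_1", "orf_2", "orf_3"),
  KEGG_ko = c("ko:K04035", "ko:K08928,ko:K13991", "-"),
  stringsAsFactors = FALSE
)

# Split on commas, then repeat each query once per KO it carries
ko_split <- strsplit(ann$KEGG_ko, ",", fixed = TRUE)
expanded <- data.frame(
  query   = rep(ann$query, lengths(ko_split)),
  KEGG_ko = unlist(ko_split),
  stringsAsFactors = FALSE
)

# Drop ORFs without a KO assignment
expanded <- expanded[expanded$KEGG_ko != "-", ]
print(expanded)
```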

1.2 (Optional) A dataframe with SeqName (ORF identifier, Prodigal format: ⁠ORF_id # start # end # strand # ...⁠) and Sequence (e.g., head(seq_data))

Same as in Case 1

1.3 (Optional) KO/gene group (e.g., head(KO_group))

gene | gene_group | gene_label
ko:K04035acsFacsF
ko:K08226assemblybch2
ko:K04039bchB
ko:K11337bchC
ko:K03404bchD
ko:K11336bchF

1.4 (Optional) Candidate KO/gene list

ko:K04035 ko:K08226 ko:K04039 ko:K11337 ko:K03404 ko:K11336

2 Output Data Preview

2.1 Gene cluster information (GC_meta)

Similar to Case 1

2.2 Gene cluster sequence (GC_seq)

Similar to Case 1

2.3 Gene cluster plot (GC_plot)

Documentation

Full function reference:

?gclink::gclink

Citation

If you use gclink in your research, please cite:

Li, L., Huang, D., Hu, Y., Rudling, N. M., Canniffe, D. P., Wang, F., & Wang, Y. "Globally distributed Myxococcota with photosynthesis gene clusters illuminate the origin and evolution of a potentially chimeric lifestyle." Nature Communications (2023), 14, 6450. https://doi.org/10.1038/s41467-023-42193-7

Dependencies

  • R (≥ 3.5)
  • dplyr (≥ 1.1.4)
  • gggenes (≥ 0.5.1)
  • ggplot2 (≥ 3.5.2)

License

GPL-3 © Liuyang Li

Package Information

  • Version: 1.1
  • License: GPL-3
  • Maintainer: Liuyang Li
  • Last published: September 2nd, 2025
  • Monthly downloads: 130
Functions in gclink (1.1)

  • gc_cal: Identify and Extract Gene Clusters from Scaled BLAST Data
  • gc_plot: Plot Scaled Gene Clusters with Arrows
  • KO_group: KEGG Orthology (KO) Group Classification
  • gc_add: Complete Gene Clusters by Adding Missing ORFs
  • gc_position: Extract ORF Positions for One Specific Gene Cluster
  • gc_range: Determine ORF Range for a Candidate Gene Cluster
  • length_filter: Remove Length Outliers from BLAST Results
  • orf_extract: Extract ORF and Genome Information from BLAST or BLASTP Results
  • gc_cluster: Identify Breakpoints of Gene Clusters within a Contig
  • gc_scale: Scale Gene-Cluster Coordinates for Visualization
  • gclink: Gene-Cluster Discovery, Annotation and Visualization
  • orf_locate: Parse ORF Coordinates from Prodigal FASTA Headers
  • eggnog_df: EggNOG Functional Annotation Results
  • PGC_group: Photosynthesis Gene Classification Groups
  • blastp_df: BLASTP Results for Test Proteins
  • photosynthesis_gene_list: Photosynthesis Gene List
  • seq_data: Genomic Sequence Data with Annotations