Learn R Programming

mclink: Metabolic Pathway Completeness and Abundance Analysis

Overview

mclink provides comprehensive tools for analyzing metabolic pathway completeness and abundance using KEGG Orthology (KO) data from (meta)genomic and (meta)transcriptomic studies. Key features include:

  • Dual analysis modes: Completeness (presence/absence) and abundance-weighted scoring
  • Flexible input: Works with built-in KEGG references or custom datasets
  • Smart KO handling: Specialized methods for plus-separated (subunits) and comma-separated (isoforms) KOs
  • Publication-ready outputs: Pathway coverage metrics and detailed KO detection reports

Description

The distill analysis of metabolic pathway coverage is calculated based on the abundance or presence of KOs in a given module, as per the KEGG Module Definition. In detail, the coverage of a KEGG Module is determined by first dividing a set of KOs into distinct steps. The coverage for each step is then calculated separately, and summarized as the coverage of this KEGG Module. When calculating coverage, spaces or plus signs connecting KO numbers are interpreted as AND operators, while commas are interpreted as OR operators. For instance, to calculate the completeness for the module M00020: “K00058 K00831 (K01079, K02203, K22305)”:

  • 1: Convert the abundance table of KOs into a 0-1 matrix.
  • 2: Consider K00058 as step 1, K00831 as step 2, and (K01079, K02203, K22305) as step 3.
  • 3: Calculate the maximum (or minimum, mean) value of step 3 (K01079, K02203, K22305). If presence of any KO indicates completeness, calculate the maximum value; If all KOs must be resent for completeness, calculate the minimum value. For a moderate approach, calculate the mean value.
  • 4: Use this value along with the values of step 1 and step 2 to calculate the mean value, representing the completeness of the module M00020.

Key Features

Analysis Modes

  • Completeness analysis: Binary presence/absence scoring
  • Abundance analysis: Weighted by KO abundance levels

Specialized KO Handling: Plus-separated KOs (K1+K2+...)

Scaling MethodDescription
mean (default)Moderate approach - calculates average value of all components
min (conservative)Strict requirement - uses lowest value (all components must be present)
max (liberal)Lenient approach - uses highest value (any component indicates completeness)

Typical Use Cases: Protein complexes, Enzyme subunits

Specialized KO Handling: Comma-separated KOs (K1,K2,...)

Scaling MethodDescription
max (default)Completeness assessment - any component indicates functional pathway
sum (abundance analysis)Abundance quantification - sums all functionally equivalent variants

Typical Use Cases: Gene isoforms, Alternative pathways

Output Options

  • R list: Coverage and KO detection results as dataframes, along with a log string recording the analysis process.
  • File exports: Coverage and KO detection results in TSV format, along with a log file recording the analysis process.
  • Pathway-split exports: Optional per-pathway TSV files (if enabled).

Installation

# Install from CRAN
install.packages("mclink")

# Install from GitHub
if (!require("devtools")) install.packages("devtools")
devtools::install_github("LiuyangLee/mclink")

Quick Start

library(mclink)
# Using built-in datasets
data(KO_pathway_ref)
data(KO_Sample_wide)
# Analyze selected pathways
selected_modules <- c("M00176", "M00165", "M00173", "M00374")
results <- mclink(
  ref = KO_pathway_ref[KO_pathway_ref$Module_Entry %in% selected_modules, ],
  data = KO_Sample_wide,
  table_feature = "completeness"
)
# Access results
head(results$coverage)      # Pathway coverage metrics
head(results$detected_KOs)  # Detected KOs per pathway
head(results$log)           # log

Example for Input and Output Data

1 Input Data Preview (R dataframes KO_Sample_wide and KO_pathway_ref)

1.1 Genome/Sample KO abundance/transcript/prevalence dataframe (e.g., head(KO_Sample_wide))

KOMarinobacter_salariusPseudooceanicola_nanhaiensisAlteromonas_australicaHenriciella_pelagia
K0000161.4495416.923296.8546439.592472
K000020.000000.000005.9836550.000000
K0001449.8441020.2313117.34331224.083303
K000150.0000027.098200.0000000.000000
K0001843.101130.0000019.1150013.344125
K0001949.8441041.8897513.8795289.634129
#### 1.2 (Optional) Pathway information from built-in or user-defined dataframe (e.g., head(KO_pathway_ref))
Orthology_EntryModule_TypeLevel_2Level_3Module_EntryModule_NameDefinitionOrthology_SymbolOrthology_NameKO_Symbol
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
K00855Pathway modulesEnergy metabolismCarbon fixationM00165Reductive pentose phosphate cycle (Calvin cycle)K00855 (K01601-K01602) K00927 (K05298,K00150,K00134) K01803 (K01623,K01624) (K03841,K02446,K11532,K01086) K00615 (K01100,K11532,K01086) (K01807,K01808) K01783PRK, prkBphosphoribulokinase [EC:2.7.1.19]K00855; PRK, prkB
K01601Pathway modulesEnergy metabolismCarbon fixationM00165Reductive pentose phosphate cycle (Calvin cycle)K00855 (K01601-K01602) K00927 (K05298,K00150,K00134) K01803 (K01623,K01624) (K03841,K02446,K11532,K01086) K00615 (K01100,K11532,K01086) (K01807,K01808) K01783rbcL, cbbLribulose-bisphosphate carboxylase large chain [EC:4.1.1.39]K01601; rbcL, cbbL
K01602Pathway modulesEnergy metabolismCarbon fixationM00165Reductive pentose phosphate cycle (Calvin cycle)K00855 (K01601-K01602) K00927 (K05298,K00150,K00134) K01803 (K01623,K01624) (K03841,K02446,K11532,K01086) K00615 (K01100,K11532,K01086) (K01807,K01808) K01783rbcS, cbbSribulose-bisphosphate carboxylase small chain [EC:4.1.1.39]K01602; rbcS, cbbS

2 Output Data Preview (R list mc_list)

2.1 Pathway Coverage Matrix for Genomes or Samples (mc_list$coverage)

Module_EntryLevel_2Level_3Pathway NameDefinition (Key KOs)Marinobacter salariusPseudooceanicola nanhaiensisAlteromonas australicaHenriciella pelagia
M00165Energy metabolismCarbon fixationReductive pentose phosphate cycle (Calvin cycle)K00855 (K01601-K01602) K00927 (K05298,K00150,K00134) K01803 (K01623,K01624) (K03841,K02446,K11532,K01086) K00615 (K01100,K11532,K01086) (K01807,K01808) K017830.45454550.63636360.63636360.6363636
M00173Energy metabolismCarbon fixationReductive citrate cycle (Arnon-Buchanan cycle)(K00169+K00170+K00171+K00172,K03737) ((K01007,K01006) K01595,K01959+K01960,K01958) K00024 (K01676,K01679,K01677+K01678) (K00239+K00240-K00241-K00242,K18556+K18557+K18558+K18559+K18560) (K01902+K01903) (K00174+K00175-K00177-K00176) K00031 (K01681,K27802,K01682) (K15230+K15231,K15232+K15233 K15234)0.50000000.70000000.50000000.5000000
M00376Energy metabolismCarbon fixation3-Hydroxypropionate bi-cycle(K02160+K01961+K01962+K01963) K14468 K14469 K15052 K05606 (K01847,K01848+K01849) (K14471+K14472) (K00239+K00240+K00241) K016790.22222220.55555560.00000000.3333333
M00375Energy metabolismCarbon fixationHydroxypropionate-hydroxybutylate cycle(K01964+K15037+K15036) K15017 K15039 K15018 K15019 K15020 K05606 (K01848+K01849) (K15038,K15017) K14465 (K14466,K18861,K25774) K14534 K15016 K006260.00000000.14285710.00000000.1428571
M00374Energy metabolismCarbon fixationDicarboxylate-hydroxybutyrate cycle(K00169+K00170+K00171+K00172) K01007 K01595 K00024 (K01677+K01678) (K00239+K00240-K00241-K18860) (K01902+K01903) (K15038,K15017) K14465 (K14467,K18861,K25774) K14534 K15016 K006260.23076920.30769230.23076920.3846154
M00377Energy metabolismCarbon fixationReductive acetyl-CoA pathway (Wood-Ljungdahl)K00198 (K05299-K15022,K22015+K25123+K25124) K01938 K01491-K01500 K00297-K25007-K25008 K15023 K14138+K00197+K001940.42857140.42857140.28571430.2857143
Interpretation Guide
  • Autotrophic potential: Genomes with >80% completeness in carbon fixation pathways may represent autotrophs. None of the tested genomes met the 80% completeness threshold for canonical carbon fixation pathways, suggesting they likely utilize heterotrophic metabolism.
  • Verification: Always check presence of pathway-specific marker genes (e.g., RuBisCO for Calvin cycle, aclAB for rTCA cycle)
  • Threshold adjustment: Some pathways have alternative enzyme implementations (e.g., Form II RuBisCO) requiring lower completeness thresholds

2.2 Detected KOs per Genome/Sample (mc_list$detected_KOs)

Module_EntryPathway NameMarinobacter salariusPseudooceanicola nanhaiensisAlteromonas australicaHenriciella pelagia
M00165Calvin cycleK00615 K00855 K01783 K01803 K01807K00134 K00615 K01623 K01783 K01803 K01808...K00134 K00615 K00855 K01623 K01783 K01803K00615 K01623 K01783 K01803...
M00173Arnon-Buchanan cycleK00031 K00239-K00242 K01007 K01595 K01676...K00024 K00239-K00242 K01006 K01679 K01902...K00024 K00239 K00241 K01007 K01595 K01676K00024 K00239-K00242 K01006...
M003763-Hydroxypropionate bi-cycleK00239-K00241 K01847 K01962-K01963 K02160K00239-K00241 K01679 K01847 K01961-K01963...K00239 K00241 K01962-K01963 K02160K00239-K00241 K01847 K01962...
M00375Hydroxypropionate-hydroxybutylate cycleK01848K00626 K05606-K00626 K05606
M00374Dicarboxylate-hydroxybutyrate cycleK00239-K00241 K01007 K01595 K01902K00024 K00239-K00241 K00626 K01902-K01903K00024 K00239 K00241 K01007 K01595 K01902K00024 K00239-K00241 K00626...
M00377Wood-Ljungdahl pathwayK00297 K01491 K01938 K22015K00297 K01491 K01938K00297 K01491K00297 K01491

2.3 Process Log (mc_list$log)

[1] "[2025-08-21 17:35:12] mclink started!"
[2] "[2025-08-21 17:35:12] Input Sample-KO table type: completeness"
[3] "[2025-08-21 17:35:12] Scale method for plus: min"
[4] "[2025-08-21 17:35:12] Scale method for comma: max"
[5] "[2025-08-21 17:35:12] Pathway information dataframe successfully imported."
[6] "[2025-08-21 17:35:12] There are 7 Modules in the Pathway information dataframe: M00165 M00173 M00376 M00375 M00374 M00377 M00176"
[7] "[2025-08-21 17:35:12] Genome-KO File successfully imported."
[8] "[2025-08-21 17:35:12] There are 50 intersect KOs between the KO list and input dataframe: K00024 K00031 K00134 K00239 K00240 K00241 K00242 K00297 K00380 K00381 K00390 K00615 K00626 K00855 K00860 K00955 K00956 K00957 K00958 K01006 K01007 K01491 K01595 K01623 K01676 K01679 K01681 K01682 K01783 K01803 K01807 K01808 K01847 K01848 K01902 K01903 K01938 K01958 K01960 K01961 K01962 K01963 K02160 K02446 K03841 K05606 K08691 K09709 K11532 K22015"
[9] "[2025-08-21 17:35:12] Converting abundance table to completeness table..."
[10] "[2025-08-21 17:35:12] Starting Module: M00165"
[11] "[2025-08-21 17:35:12] Processing module steps: K00855 (K01601-K01602) K00927 (K05298,K00150,K00134) K01803 (K01623,K01624) (K03841,K02446,K11532,K01086) K00615 (K01100,K11532,K01086) (K01807,K01808) K01783"
[12] "[2025-08-21 17:35:12] After omitting minus KOs: K00855 K01601 K00927 (K05298,K00150,K00134) K01803 (K01623,K01624) (K03841,K02446,K11532,K01086) K00615 (K01100,K11532,K01086) (K01807,K01808) K01783"
[13] "[2025-08-21 17:35:12] Nested steps include: "
[14] "[2025-08-21 17:35:12]     (K05298,K00150,K00134)"
[15] "[2025-08-21 17:35:12]     (K01623,K01624)"
[16] "[2025-08-21 17:35:12]     (K03841,K02446,K11532,K01086)"
[17] "[2025-08-21 17:35:12]     (K01100,K11532,K01086)"
[18] "[2025-08-21 17:35:12]     (K01807,K01808)"
[19] "[2025-08-21 17:35:12] Start processing nested steps..."
[20] "[2025-08-21 17:35:12] Analyzing M00165_1: (K05298,K00150,K00134)"
[21] "[2025-08-21 17:35:12] Bracket level: 1"
[22] "[2025-08-21 17:35:12]    Running KOs comma: M00165_1 = K05298"
[23] "[2025-08-21 17:35:12]    Running KOs comma: M00165_1 = K00150"
[24] "[2025-08-21 17:35:12]    Running KOs comma: M00165_1 = K00134"
[25] "[2025-08-21 17:35:12] Analyzing M00165_2: (K01623,K01624)"
[26] "[2025-08-21 17:35:12] Bracket level: 2"
[27] "[2025-08-21 17:35:12]    Running KOs comma: M00165_2 = K01623"
[28] "[2025-08-21 17:35:12]    Running KOs comma: M00165_2 = K01624"
[29] "[2025-08-21 17:35:12] Analyzing M00165_3: (K03841,K02446,K11532,K01086)"
[30] "[2025-08-21 17:35:12] Bracket level: 3"
[31] "[2025-08-21 17:35:12]    Running KOs comma: M00165_3 = K03841"
[32] "[2025-08-21 17:35:12]    Running KOs comma: M00165_3 = K02446"
[33] "[2025-08-21 17:35:12]    Running KOs comma: M00165_3 = K11532"
[34] "[2025-08-21 17:35:12]    Running KOs comma: M00165_3 = K01086"
[35] "[2025-08-21 17:35:12] Analyzing M00165_4: (K01100,K11532,K01086)"
[36] "[2025-08-21 17:35:12] Bracket level: 4"
[37] "[2025-08-21 17:35:12]    Running KOs comma: M00165_4 = K01100"
[38] "[2025-08-21 17:35:12]    Running KOs comma: M00165_4 = K11532"
[39] "[2025-08-21 17:35:12]    Running KOs comma: M00165_4 = K01086"
[40] "[2025-08-21 17:35:12] Analyzing M00165_5: (K01807,K01808)"
[41] "[2025-08-21 17:35:12] Bracket level: 5"
[42] "[2025-08-21 17:35:12]    Running KOs comma: M00165_5 = K01807"
[43] "[2025-08-21 17:35:12]    Running KOs comma: M00165_5 = K01808"
[44] "[2025-08-21 17:35:12] Start processing final step..."
[45] "[2025-08-21 17:35:12] Analyzing M00165: K00855 K01601 K00927 M00165_1 K01803 M00165_2 M00165_3 K00615 M00165_4 M00165_5 K01783"
[46] "[2025-08-21 17:35:12]    Running KOs space: M00165 = K00855"
[47] "[2025-08-21 17:35:12]    Running KOs space: M00165 = K01601"
[48] "[2025-08-21 17:35:12]    Running KOs space: M00165 = K00927"
[49] "[2025-08-21 17:35:12]    Running KOs space: M00165 = M00165_1"
[50] "[2025-08-21 17:35:12]    Running KOs space: M00165 = K01803"
[51] "[2025-08-21 17:35:12]    Running KOs space: M00165 = M00165_2"
[52] "[2025-08-21 17:35:12]    Running KOs space: M00165 = M00165_3"
[53] "[2025-08-21 17:35:12]    Running KOs space: M00165 = K00615"
[54] "[2025-08-21 17:35:12]    Running KOs space: M00165 = M00165_4"
[55] "[2025-08-21 17:35:12]    Running KOs space: M00165 = M00165_5"
[56] "[2025-08-21 17:35:12]    Running KOs space: M00165 = K01783"
[57] "[2025-08-21 17:35:12] Completed Module: M00165"

Documentation

Full function reference:

?mclink::mclink

Citation

If you use mclink in your research, please cite:

Li, L., Huang, D., Hu, Y., Rudling, N. M., Canniffe, D. P., Wang, F., & Wang, Y. "Globally distributed Myxococcota with photosynthesis gene clusters illuminate the origin and evolution of a potentially chimeric lifestyle." Nature Communications (2023), 14, 6450. https://doi.org/10.1038/s41467-023-42193-7

Dependencies

  • R (≥ 3.5)
  • data.table (≥ 1.17.0)
  • dplyr (≥ 1.1.4)
  • stringr (≥ 1.5.1)
  • tibble (≥ 3.2.1)

License

GPL-3 © Liuyang Li

Contact

Copy Link

Version

Install

install.packages('mclink')

Monthly Downloads

179

Version

1.1.1

License

GPL-3

Issues

Pull Requests

Stars

Forks

Maintainer

Liuyang Li

Last Published

October 26th, 2025

Functions in mclink (1.1.1)

remove_outer_brackets

Remove Outer Parentheses from String
merge_module_name

Merge Module Information with Module Table
process_step_direct

Direct KO Processing Without Special Handling
process_step_space

Process Space-Separated KOs with Mean Calculation
process_step_plus

Process Plus-Separated KOs with Specified Scaling Method
read_and_process_KO_table

Read and Process KO Sample Table with Pathway Information
read_and_process_pathway_infor

Read and process Pathway information dataframe
process_all_modules

Process All Modules in Pathway Information
group_ko_by_module

Group KO Abundance Data by Module
add_rows_if_not_exists

Add Missing Rows to a Data Frame
convert_abundance_to_presence

Convert Abundance Values to Presence/Absence Indicators
escape_special_chars

Escape Special Characters in a String
create_sub_module_sample

Create and Export Pathway-Specific Module Sample Files
extract_inner_brackets

Extract Innermost Parentheses Content
mclink

Metabolic Pathway Coverage Analysis
KO_Sample_wide

KEGG Orthology (KO) Abundance/Presence Across Microbial Samples or Genomes
process_module_definition

Process Module Definition String
ata_cal

Calculate Log2 Ratio of Sample Values to Row Means
process_all_pathways

Process All Pathways Analysis
process_module_structure

Process Module Structure Data
process_module_loop_plus

Process Module Components with Plus Sign Handling
process_module_loop_plu_comma

Process Module Components with Plus and Comma Handling
process_module_loop_comma

Process Module Components with Comma Handling
process_module_step

Process Module Steps with Complex KO String Handling
process_module_brackets

Process Module Brackets Recursively
process_step_comma

Process Comma-Separated KOs with Specified Scaling Method
KO_pathway_ref

KEGG Orthology (KO) Pathway Information Dataset