pctax (version 0.1.7)

pre_format_report: Preprocess MPA Species Abundance and Taxonomy Data

Description

This function reads an MPA-format species abundance profile derived from kraken2 format_report output, filters out low-abundance and unwanted taxa, and extracts a clean taxonomy table for downstream analysis.

Usage

pre_format_report(dir = "", exclude = NULL, relative_threshold = 1e-04)

Value

A list with two components:

species

A matrix of filtered species abundance data

species_taxonomy

A data.frame of taxonomy information for each species

Arguments

dir

Character. Path to the directory containing the input file "mpa_profile_species.txt". Default is an empty string (current directory).

exclude

Character. Pattern to exclude specific taxa (e.g., "g__Streptococcus"). Uses grepl() for pattern matching. Default is NULL (no exclusion).
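The exclude step can be sketched as follows. This is a minimal illustration that assumes the pattern is passed to grepl() unchanged and matched against the MPA-style taxon names. The abundance table is made-up data with shortened lineages, not pctax output.

```r
# Made-up abundance table; row names are shortened MPA-style lineages.
abund <- matrix(
  c(10, 5, 3,
    8, 0, 4),
  nrow = 3,
  dimnames = list(
    c("k__Bacteria|g__Streptococcus|s__Streptococcus_mitis",
      "k__Bacteria|g__Lactobacillus|s__Lactobacillus_gasseri",
      "k__Bacteria|g__Bacteroides|s__Bacteroides_fragilis"),
    c("sample1", "sample2")
  )
)

exclude <- "g__Streptococcus"

# Drop every row whose name matches the pattern; do nothing when exclude is NULL.
if (!is.null(exclude)) {
  abund <- abund[!grepl(exclude, rownames(abund)), , drop = FALSE]
}
rownames(abund)
```

By default grepl() treats the pattern as a regular expression. Under the same assumption, "g__Streptococcus|g__Lactobacillus" would therefore drop both genera, because "|" is regex alternation and not the literal MPA rank separator. Likewise, "." matches any character unless it is escaped.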

relative_threshold

Numeric. Relative abundance threshold for filtering low-abundance taxa. Taxa with mean relative abundance below this threshold will be removed. Default is 1e-4.
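The filtering rule stated for relative_threshold can be sketched as shown below. The rule is: convert each sample to relative abundance, average across samples, and drop taxa below the threshold. filter_low_abundance is a hypothetical helper that re-implements the rule as described here. It is not the code of pcutils::rm_low, whose exact behaviour may differ.

```r
# Hypothetical helper, not part of pctax or pcutils.
filter_low_abundance <- function(abund, relative_threshold = 1e-4) {
  abund <- as.matrix(abund)
  totals <- colSums(abund)
  totals[totals == 0] <- 1  # avoid 0/0 for empty samples; their columns stay at 0
  # Divide each column by its sample total so every non-empty column sums to 1.
  rel <- sweep(abund, 2, totals, "/")
  # Keep taxa whose mean relative abundance across samples reaches the threshold.
  abund[rowMeans(rel) >= relative_threshold, , drop = FALSE]
}

counts <- matrix(
  c(9999, 1, 0,
    5000, 0, 5000),
  nrow = 3,
  dimnames = list(c("taxonA", "taxonB", "taxonC"), c("sample1", "sample2"))
)

# taxonB has mean relative abundance (1e-4 + 0) / 2 = 5e-5, below 1e-4, so it is removed.
kept <- filter_low_abundance(counts, relative_threshold = 1e-4)
rownames(kept)
```

Averaging relative rather than raw abundances keeps deeply sequenced samples from dominating the filter. The helper returns the original counts for the taxa that pass.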

Details

The function performs the following steps:

  1. Reads the MetaPhlAn-style (MPA) species profile table "mpa_profile_species.txt" from dir

  2. Removes "unclassified" entries and the "cellular_organisms" category

  3. Filters out taxa matching the exclude pattern (if provided)

  4. Removes low-abundance taxa with pcutils::rm_low, using relative_threshold as the cutoff

  5. Extracts and formats taxonomy information from the MetaPhlAn-style names

  6. Cleans species names by removing the "s__" prefix
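Steps 2, 5 and 6 can be sketched in base R as below. tidy_mpa_profile is a hypothetical helper, not the pctax source. It rests on several assumptions:

- The input is a numeric matrix whose row names are full MPA-style lineages down to species level, with unique species names.
- The rank prefixes are kept in the taxonomy table.
- The rank column names are illustrative, and pctax may not use them.

Reading the file (step 1), the exclude pattern (step 3) and pcutils::rm_low (step 4) are left out.

```r
# Hypothetical re-creation of Details steps 2, 5 and 6 (not the pctax source).
tidy_mpa_profile <- function(abund) {
  # Step 2: drop unclassified and "cellular_organisms" rows.
  is_noise <- grepl("unclassified|cellular_organisms", rownames(abund))
  abund <- abund[!is_noise, , drop = FALSE]

  # Illustrative rank names (assumption).
  ranks <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")
  # Step 5: split each lineage on the literal "|" into one column per rank.
  parts <- strsplit(rownames(abund), "|", fixed = TRUE)
  tax <- t(vapply(parts, function(p) {
    length(p) <- length(ranks)  # pad missing ranks with NA, truncate extra levels
    p
  }, character(length(ranks))))
  tax <- as.data.frame(tax, stringsAsFactors = FALSE)
  colnames(tax) <- ranks

  # Step 6: species names without the "s__" prefix become the row names.
  species <- sub("^s__", "", tax$Species)
  rownames(tax) <- species
  rownames(abund) <- species
  list(species = abund, species_taxonomy = tax)
}

abund <- matrix(
  c(120, 30, 5,
    80, 10, 2),
  nrow = 3,
  dimnames = list(
    c("k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Streptococcaceae|g__Streptococcus|s__Streptococcus_mitis",
      "k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Bacteroidaceae|g__Bacteroides|s__Bacteroides_fragilis",
      "unclassified"),
    c("sample1", "sample2")
  )
)
res <- tidy_mpa_profile(abund)
```

The result mirrors the two components listed under Value: a species abundance matrix and a taxonomy data.frame, both keyed by the cleaned species names.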