gclink (version 1.1)

length_filter: Remove Length Outliers from BLAST Results

Description

Filters BLAST hits by removing ORFs whose protein length is an outlier within their gene group, as defined by the inter-quartile range (IQR). For each gene, Q1 and Q3 are the first and third quartiles of length and IQR = Q3 - Q1. Hits whose length falls outside the interval [Q1 - down_IQR * IQR, Q3 + up_IQR * IQR] are discarded.
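
As a worked illustration of the interval, the base R sketch below computes the bounds for a single gene group. The lengths are made-up values, and the sketch assumes the quartiles come from R's default quantile() method (type 7) and that values exactly on a bound are kept; the documentation does not state which quantile definition gclink uses.

```r
# Toy ORF lengths (amino acids) for one gene group; values are invented.
len <- c(100, 102, 98, 101, 99, 300)

down_IQR <- 1.5
up_IQR <- 1.5

# First and third quartiles (R's default type 7 quantiles, an assumption).
q <- quantile(len, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
iqr <- q[2] - q[1]

lower <- q[1] - down_IQR * iqr  # 99.25 - 1.5 * 2.5 = 95.5
upper <- q[2] + up_IQR * iqr    # 101.75 + 1.5 * 2.5 = 105.5

# Keep lengths inside the closed interval [lower, upper]; 300 is dropped.
kept <- len[len >= lower & len <= upper]
print(kept)
```

Raising up_IQR widens only the upper bound, which can be useful when unusually long ORFs are more acceptable than truncated ones.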

Usage

length_filter(Data = bin_genes, down_IQR = 1.5, up_IQR = 1.5)

Value

The input data frame with outlier rows removed. The returned object is ungrouped regardless of the input grouping.

Arguments

Data

A data frame containing BLAST results. Must include the columns gene (gene symbol) and length (ORF length in amino acids).

down_IQR

Numeric multiplier applied to the IQR for the lower bound (default: 1.5).

up_IQR

Numeric multiplier applied to the IQR for the upper bound (default: 1.5).

Details

  • Filtering is performed within each gene group: quartiles and bounds are computed separately for every gene symbol, so a length that is an outlier for one gene may be typical for another.

  • Progress messages report the number of rows before and after filtering.

  • Missing values in length are ignored when computing quantiles.
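
The behaviour listed above can be sketched in base R as follows. This is not the gclink implementation: iqr_filter_sketch is a hypothetical helper written for illustration, the toy data are invented, and dropping rows whose length is NA is an assumption (the documentation only says that missing values are ignored when computing quantiles).

```r
# Hypothetical helper, NOT the gclink source: per-gene IQR filtering in base R.
iqr_filter_sketch <- function(Data, down_IQR = 1.5, up_IQR = 1.5) {
  message("Rows before filtering: ", nrow(Data))

  # Filter one gene group using bounds computed from that group only.
  keep_one_group <- function(d) {
    q <- quantile(d$length, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
    iqr <- q[2] - q[1]
    ok <- !is.na(d$length) &               # assumption: NA lengths are dropped
      d$length >= q[1] - down_IQR * iqr &
      d$length <= q[2] + up_IQR * iqr
    d[ok, , drop = FALSE]
  }

  # Split by gene symbol, filter each group, then bind into one plain data frame.
  out <- do.call(rbind, lapply(split(Data, Data$gene), keep_one_group))
  rownames(out) <- NULL

  message("Rows after filtering: ", nrow(out))
  out
}

# Invented example: a length of 300 is an outlier for geneA but typical for geneB.
toy <- data.frame(
  gene = c(rep("geneA", 6), rep("geneB", 5)),
  length = c(100, 102, 98, 101, 99, 300, 300, 305, 295, 298, NA),
  stringsAsFactors = FALSE
)

filtered <- iqr_filter_sketch(toy)
print(filtered)
```

Because the data are split by gene, rows in the result are ordered by gene symbol rather than by their original position.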