Filters BLAST hits by removing ORFs whose gene (protein) length
is an outlier within the corresponding gene group, as defined
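by the inter-quartile range (IQR). Hits whose length falls
outside the interval
[Q1 - down_IQR * IQR, Q3 + up_IQR * IQR] are discarded, where Q1 and
Q3 are the first and third quartiles of length within the gene group
and IQR = Q3 - Q1.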
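As a worked illustration of the interval, the sketch below computes the bounds for one gene group in base R. The length values are made up, and the use of stats::quantile() with its default quantile type is an assumption; the function itself may compute quartiles differently.

```r
# Hypothetical ORF lengths (amino acids) for a single gene group.
lengths <- c(100, 102, 98, 101, 99, 250)

# Quartiles via stats::quantile() (default type 7); na.rm = TRUE skips NAs.
q <- quantile(lengths, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
iqr <- q[2] - q[1]            # Q3 - Q1

down_IQR <- 1.5
up_IQR <- 1.5
lower <- q[1] - down_IQR * iqr
upper <- q[2] + up_IQR * iqr

# Keep only lengths inside the closed interval [lower, upper].
kept <- lengths[lengths >= lower & lengths <= upper]
```

With these values Q1 = 99.25 and Q3 = 101.75, so the interval is [95.5, 105.5] and only the length of 250 is discarded.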
length_filter(Data = bin_genes, down_IQR = 1.5, up_IQR = 1.5)
Returns the input data frame with outlier rows removed. The returned object is ungrouped regardless of the input grouping.
Data: A data frame containing BLAST results. Must include the columns
gene (gene symbol) and length (ORF length in amino
acids).
down_IQR: Numeric multiplier applied to the IQR for the lower bound (default: 1.5).
up_IQR: Numeric multiplier applied to the IQR for the upper bound (default: 1.5).
Filtering is performed within each gene group; outliers are determined independently for every gene symbol.
Progress messages report the number of rows before and after filtering.
Missing values in length are ignored when computing quantiles.
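The per-gene behaviour described above can be sketched in base R as follows. This is an illustrative re-implementation under stated assumptions, not the package source: the name length_filter_sketch, the example gene symbols, the message wording, and the choice to drop rows whose length is missing are all hypothetical.

```r
# Hypothetical sketch of per-gene IQR filtering; not the package's own code.
length_filter_sketch <- function(Data, down_IQR = 1.5, up_IQR = 1.5) {
  stopifnot(all(c("gene", "length") %in% names(Data)))
  message("Rows before filtering: ", nrow(Data))

  # Per-gene quartiles, broadcast back to every row with ave().
  # na.rm = TRUE ignores missing lengths when computing the quantiles.
  q1 <- ave(Data$length, Data$gene,
            FUN = function(x) quantile(x, 0.25, na.rm = TRUE, names = FALSE))
  q3 <- ave(Data$length, Data$gene,
            FUN = function(x) quantile(x, 0.75, na.rm = TRUE, names = FALSE))
  iqr <- q3 - q1

  keep <- Data$length >= q1 - down_IQR * iqr &
          Data$length <= q3 + up_IQR * iqr
  # Assumption: rows with a missing length are dropped in this sketch.
  keep[is.na(keep)] <- FALSE

  out <- Data[keep, , drop = FALSE]
  message("Rows after filtering: ", nrow(out))
  out
}

# Made-up example data: two gene groups, each with one obvious outlier.
bin_genes <- data.frame(
  gene   = c(rep("rpoB", 6), rep("gyrA", 5)),
  length = c(100, 102, 98, 101, 99, 250, 800, 805, 795, 810, 300)
)
filtered <- length_filter_sketch(bin_genes)
```

Because the bounds are computed separately for each gene symbol, the length of 300 is removed from the second group even though it would look unremarkable next to the lengths in the first group.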