subsetFun: Filter results by function

Description

Create a SQM or SQMbunch object containing only the ORFs with a given function, and the contigs and bins that contain them.

Usage

subsetFun(
  SQM,
  fun,
  columns = NULL,
  ignore_case = TRUE,
  fixed = FALSE,
  trusted_functions_only = FALSE,
  ignore_unclassified_functions = FALSE,
  rescale_tpm = FALSE,
  rescale_copy_number = FALSE,
  recalculate_bin_stats = FALSE,
  allow_empty = FALSE
)

Value

SQM or SQMbunch object containing only the ORFs annotated with the requested function, and the contigs and bins that contain them.

Arguments

SQM: SQM or SQMbunch object to be subsetted.
fun: character. Pattern to search for in the different functional classifications.
columns: character. Restrict the search to the provided column names from SQM$orfs$table. If not provided, the search will be performed in all the columns containing functional information (default NULL).
ignore_case: logical. Make pattern matching case-insensitive (default TRUE).
fixed: logical. If TRUE, the pattern is a string to be matched as is. If FALSE, the pattern is treated as a regular expression (default FALSE).
trusted_functions_only: logical. If TRUE, only highly trusted functional annotations (best hit + best average) will be considered when generating aggregated function tables. If FALSE, best hit annotations will be used (default FALSE).
ignore_unclassified_functions: logical. If FALSE, ORFs with no functional classification will be aggregated together into an "Unclassified" category. If TRUE, they will be ignored (default FALSE).
rescale_tpm: logical. If TRUE, TPMs for KEGGs, COGs, and PFAMs will be recalculated (so that the TPMs in the subset actually add up to 1 million). Otherwise, per-function TPMs will be calculated by aggregating the TPMs of the ORFs annotated with that function, and will thus keep the scaling present in the parent object (default FALSE).
rescale_copy_number: logical. If TRUE, copy numbers will be recalculated using the median single-copy gene coverages in the subset. Otherwise, single-copy gene coverages will be taken from the parent object, so the returned copy number of each function represents the average copy number of that function per genome in the parent object (default FALSE).
recalculate_bin_stats: logical. If TRUE, bin abundance, quality and taxonomy are recalculated based on the contigs present in the subsetted object (default FALSE).
allow_empty: (internal use only).
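
The difference between the two rescale_tpm settings can be illustrated with a toy calculation in base R. This is only a sketch of the arithmetic described above, not the SQMtools implementation: the orfs data frame, its column names and all values are made up for illustration.

```r
# Toy per-ORF table (hypothetical values, not real SQMtools output).
orfs <- data.frame(
  orf  = c("orf1", "orf2", "orf3", "orf4"),
  kegg = c("K00812", "K00812", "K00813", "K02014"),
  tpm  = c(200000, 300000, 100000, 400000)  # adds up to 1e6 in the parent
)

# Keep only the ORFs whose annotation matches the pattern.
keep <- grepl("K00812|K00813", orfs$kegg)
subset_orfs <- orfs[keep, ]

# Analogue of rescale_tpm = FALSE: aggregate the ORF TPMs per function,
# keeping the scaling of the parent object.
tpm_parent_scale <- tapply(subset_orfs$tpm, subset_orfs$kegg, sum)

# Analogue of rescale_tpm = TRUE: renormalise so that the TPMs in the
# subset add up to 1 million.
tpm_rescaled <- tpm_parent_scale / sum(tpm_parent_scale) * 1e6

print(tpm_parent_scale)
print(tpm_rescaled)
```

With the parent scaling, the values remain comparable to those of the full dataset; after rescaling, they describe each function's share within the subset only.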

Examples

data(Hadza)
Hadza.iron = subsetFun(Hadza, "iron")
Hadza.carb = subsetFun(Hadza, "Carbohydrate metabolism")
# Search for multiple patterns using regular expressions
Hadza.twoKOs = subsetFun(Hadza, "K00812|K00813", fixed=FALSE)
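
The effect of fixed and ignore_case on pattern matching can be previewed with base R's grepl on a few toy strings. Whether subsetFun uses grepl internally is an assumption made here for illustration only, and the annotation strings below are invented.

```r
# Toy annotation strings (made up for illustration).
annotations <- c(
  "K00812; aspartate aminotransferase",
  "K00813; aspartate aminotransferase",
  "Iron complex transport system",
  "K00812|K00813 (literal text)"
)

# fixed = FALSE: the pattern is a regular expression, so "|" means "or".
regex_hits <- grepl("K00812|K00813", annotations, fixed = FALSE)

# fixed = TRUE: the pattern is matched as is, so only a string containing
# the exact text "K00812|K00813" matches.
literal_hits <- grepl("K00812|K00813", annotations, fixed = TRUE)

# ignore.case = TRUE lets the lowercase pattern "iron" match "Iron".
iron_hits <- grepl("iron", annotations, ignore.case = TRUE)

print(regex_hits)
print(literal_hits)
print(iron_hits)
```

Under this sketch, passing fixed = TRUE together with a pattern such as "K00812|K00813" would look for that literal text rather than for either KO identifier.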
