metacoder (version 0.3.0)

filter_ambiguous_taxa: Filter ambiguous taxon names

Description

Filter out taxa with ambiguous names, such as "unknown" or "uncultured". NOTE: some parameters of this function are passed to filter_taxa with the "invert" option set to TRUE, so for the most part it works the same way as filter_taxa.

Usage

filter_ambiguous_taxa(obj, unknown = TRUE, uncultured = TRUE,
  name_regex = ".", ignore_case = TRUE, subtaxa = FALSE,
  drop_obs = TRUE, reassign_obs = TRUE, reassign_taxa = TRUE)

Arguments

obj

A taxmap object

unknown

If TRUE, remove taxa with names that suggest they are placeholders for unknown taxa (e.g. "unknown ...").

uncultured

If TRUE, remove taxa with names that suggest they are assigned to uncultured organisms (e.g. "uncultured ...").

name_regex

The regular expression that matches a valid character in a taxon name. For example, "[a-z]" would mean taxon names can contain only lower case letters.

ignore_case

If TRUE, do not consider the case of the text when determining a match.
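
As a rough illustration of what ignoring case means for name matching, the base R sketch below compares case-insensitive and case-sensitive matching on the taxon names used in the Examples section. The pattern is a hypothetical stand-in chosen for this sketch; it is not the list of ambiguous words that metacoder uses internally, and the package's own matching logic may differ.

```r
# Taxon names taken from the example data on this page
taxon_names <- c("lycopersicum", "tuberosum", "unknown",
                 "uncultured", "UNIDENTIFIED")

# Hypothetical pattern for illustration only (an assumption, not the
# package's internal word list)
ambiguous_pattern <- "unknown|uncultured|unidentified"

# Case-insensitive matching, analogous to ignore_case = TRUE
case_insensitive <- grepl(ambiguous_pattern, taxon_names, ignore.case = TRUE)

# Case-sensitive matching, analogous to ignore_case = FALSE
case_sensitive <- grepl(ambiguous_pattern, taxon_names, ignore.case = FALSE)

taxon_names[case_insensitive]
taxon_names[case_sensitive]
```

In this sketch, "UNIDENTIFIED" is matched only when case is ignored.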

subtaxa

(logical or numeric of length 1) If TRUE, include subtaxa of taxa passing the filter. Positive numbers indicate the number of ranks below the target taxa to return. 0 is equivalent to FALSE. Negative numbers are equivalent to TRUE.

drop_obs

(logical) This option only applies to taxmap() objects. If FALSE, include observations (i.e. user-defined data in obj$data) even if the taxon they are assigned to is filtered out. Observations assigned to removed taxa will be assigned to NA. This option can be either simply TRUE/FALSE, meaning that all data sets will be treated the same, or a logical vector can be supplied with names corresponding to one or more data sets in obj$data. For example, c(abundance = FALSE, stats = TRUE) would include observations whose taxon was filtered out in obj$data$abundance, but not in obj$data$stats. See the reassign_obs option below for further complications.

reassign_obs

(logical of length 1) This option only applies to taxmap() objects. If TRUE, observations (i.e. user-defined data in obj$data) assigned to removed taxa will be reassigned to the closest supertaxon that passed the filter. If no supertaxon of such an observation passed the filter, the observation will be filtered out if drop_obs is TRUE. This option can be either simply TRUE/FALSE, meaning that all data sets will be treated the same, or a logical vector can be supplied with names corresponding to one or more data sets in obj$data. For example, c(abundance = TRUE, stats = FALSE) would reassign observations in obj$data$abundance, but not in obj$data$stats.

reassign_taxa

(logical of length 1) If TRUE, subtaxa of removed taxa will be reassigned to the closest supertaxon that passed the filter. This is useful for removing intermediate levels of a taxonomy.

Value

A taxmap object

Details

If you encounter a taxon name that represents an ambiguous taxon that is not filtered out by this function, let us know and we will add it.

Examples

# NOT RUN {
obj <- parse_tax_data(c("Plantae;Solanaceae;Solanum;lycopersicum",
                        "Plantae;Solanaceae;Solanum;tuberosum",
                        "Plantae;Solanaceae;Solanum;unknown",
                        "Plantae;Solanaceae;Solanum;uncultured",
                        "Plantae;UNIDENTIFIED"))
filter_ambiguous_taxa(obj)

# }
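
The sketch below shows how some of the documented arguments might be combined. It assumes metacoder 0.3.0 is installed and only runs the calls when the package is available. The variable names (tax_strings, only_unknown, keep_obs) are made up for this illustration, and no particular printed output is implied.

```r
# Same classification strings as in the example above
tax_strings <- c("Plantae;Solanaceae;Solanum;lycopersicum",
                 "Plantae;Solanaceae;Solanum;tuberosum",
                 "Plantae;Solanaceae;Solanum;unknown",
                 "Plantae;Solanaceae;Solanum;uncultured",
                 "Plantae;UNIDENTIFIED")

# Only run the package calls if metacoder is installed
if (requireNamespace("metacoder", quietly = TRUE)) {
  library(metacoder)
  obj <- parse_tax_data(tax_strings)

  # Turn off the "uncultured" filter and leave the "unknown" filter on
  only_unknown <- filter_ambiguous_taxa(obj, uncultured = FALSE)

  # Neither drop nor reassign observations of removed taxa; per the
  # drop_obs description, such observations are assigned to NA
  keep_obs <- filter_ambiguous_taxa(obj, drop_obs = FALSE,
                                    reassign_obs = FALSE)
}
```

Comparing the printed results of these calls with the default filter_ambiguous_taxa(obj) is one way to check how each option affects a given data set.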
