
Extracting taxonomic information from ConTax data sets.
getDomain(header)
getPhylum(header)
getClass(header)
getOrder(header)
getFamily(header)
getGenus(header)
getTag(header)
getTaxonomy(header)
header: A vector of texts, typically the Header column from a table, containing taxonomy information in the proper format.
A vector containing the sub-texts extracted from each header text. The exception is getTaxonomy, which returns a table with the full taxonomy, one row for each input header.
The ConTax data sets are tables in the FASTA format (see readFasta), where the Header column contains texts according to a strict format.
The header always starts with a short text, a Tag, which is a unique identifier for every sequence. The function getTag will extract this from the header.
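As a rough illustration of what extracting the Tag involves, here is a minimal base R sketch. It assumes the Tag is separated from the following tokens by whitespace, which this page does not state; the function name first_token and the tag "ConTax_1" are hypothetical and not part of the package.

```r
# Hypothetical helper, not the package's getTag: keep everything before
# the first whitespace character of each header.
# Assumption: tokens in the header are separated by whitespace.
first_token <- function(header) {
  sub("[[:space:]].*$", "", header)
}

# Made-up header for illustration; "ConTax_1" is not a real ConTax tag
header <- "ConTax_1 k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Staphylococcaceae;g__Staphylococcus;"
first_token(header)
```

For real data, getTag is the function to use; the sketch only shows the idea on a single string.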
The Tag is followed by one or more tokens. One of these tokens must be a string with the following format:
"k__<...>;p__<...>;c__<...>;o__<...>;f__<...>;g__<...>;"
where <...> is some proper text. Here is an example of a proper string:
"k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Staphylococcaceae;g__Staphylococcus;"
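The format above can be parsed with a regular expression. The following base R sketch shows one way to pull a single rank out of such a string; extract_rank is a hypothetical helper written for illustration, not the implementation used by getDomain, ..., getGenus, and the tag "ConTax_1" is made up.

```r
# Hypothetical helper: return the text between "<prefix>__" and the next ";"
# for each header, or NA when the pattern is not found.
extract_rank <- function(header, prefix) {
  pattern <- paste0(prefix, "__([^;]*);")
  hits <- regmatches(header, regexec(pattern, header))
  # Each hit is c(full match, captured group); no match gives character(0)
  vapply(hits, function(h) if (length(h) == 2) h[2] else NA_character_,
         character(1))
}

# Made-up header built around the example string above
header <- "ConTax_1 k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Staphylococcaceae;g__Staphylococcus;"
extract_rank(header, "g")  # "Staphylococcus"
extract_rank(header, "p")  # "Firmicutes"
```

The prefixes k, p, c, o, f and g correspond to domain, phylum, class, order, family and genus, in the order they appear in the string.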
The functions getDomain, ..., getGenus extract the corresponding information from the header. getTaxonomy runs all taxonomy extractors, collects the results in a table, and imputes missing taxa with parent taxa.
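To make the idea of imputing missing taxa with parent taxa concrete, here is a small base R sketch. It assumes that a missing taxon appears as an empty field (for example "g__;") and that it is filled with the nearest higher rank that has a value; the exact rule getTaxonomy applies is not described here, so treat taxonomy_table, its column names, and the example header as hypothetical.

```r
# Rank prefixes in hierarchical order; the column names are illustrative
ranks <- c(Domain = "k", Phylum = "p", Class = "c",
           Order = "o", Family = "f", Genus = "g")

# Hypothetical stand-in for getTaxonomy: one row per header, one column per rank
taxonomy_table <- function(header) {
  cols <- lapply(ranks, function(prefix) {
    hits <- regmatches(header, regexec(paste0(prefix, "__([^;]*);"), header))
    # Treat both "no match" and an empty field as missing
    vapply(hits, function(h) {
      if (length(h) == 2 && nzchar(h[2])) h[2] else NA_character_
    }, character(1))
  })
  tab <- as.data.frame(cols, stringsAsFactors = FALSE)
  # Walk from higher to lower ranks, copying the parent into missing cells
  for (j in seq_along(tab)[-1]) {
    missing <- is.na(tab[[j]])
    tab[[j]][missing] <- tab[[j - 1]][missing]
  }
  tab
}

# Made-up header with an empty genus field
header <- "ConTax_1 k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Staphylococcaceae;g__;"
taxonomy_table(header)
```

In this sketch the empty genus is replaced by the family name, so the Genus column reads "Staphylococcaceae". Because the loop runs from left to right, a run of several missing ranks is filled from the last rank that had a value.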
# NOT RUN {
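# Assumption: the functions and the contax.trim data set come from the microcontax package
library(microcontax)

# Load the example ConTax data set
data(contax.trim)

# Extract the unique Tag, the genus and the phylum from every header
getTag(contax.trim$Header)
getGenus(contax.trim$Header)
getPhylum(contax.trim$Header)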
# }