
Selecting the top taxa is a common step in data analysis. This function extracts the top n taxa in a single call, which simplifies your R script.
Top_taxa(input, n, inputformat, outformat)
Value: data frame containing the top n taxa.
input: data frame of OTU/taxa/gene reads or relative abundance (relative abundance recommended); see inputformat for the accepted layouts.
n: number of top taxa to keep, ranked by relative abundance.
inputformat:
1: data frame whose first column is OTUID and whose last column is taxonomy
2: data frame whose first column is OTUID/taxonomy (recommended)
3: all-numeric data frame with OTUID/taxonomy as row names
outformat:
1: return the result in the same format as inputformat
2: return an all-numeric data frame with OTU/gene/taxa IDs as row names (not available for inputformat 1)
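The three input layouts described above can be illustrated with a small toy table. This is only a sketch: the object names (`fmt1`, `fmt2`, `fmt3`), sample columns, and taxonomy strings are hypothetical and are not part of the package or its `testotu` data.

```r
# Hypothetical toy data: three OTUs across two samples (relative abundance).

# Layout 1: first column is the OTU ID, last column is the taxonomy string.
fmt1 <- data.frame(
  OTU.ID   = c("OTU1", "OTU2", "OTU3"),
  S1       = c(0.5, 0.3, 0.2),
  S2       = c(0.6, 0.1, 0.3),
  taxonomy = c("d__Bacteria;p__A", "d__Bacteria;p__B", "d__Bacteria;p__C")
)

# Layout 2: first column is the ID (or taxonomy), remaining columns are numeric.
fmt2 <- fmt1[, c("OTU.ID", "S1", "S2")]

# Layout 3: all columns numeric, IDs moved into the row names.
fmt3 <- data.frame(row.names = fmt2$OTU.ID, fmt2[, -1])

str(fmt3)
```

Layout 3 is built the same way as `phylum1` in the examples, by moving the ID column into the row names.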
Wang Ningqi <2434066068@qq.com>
### Data preparation ####
data(testotu)
require(tidyr); require(magrittr) ## Or use pipe command in "dplyr"
testotu.pct <- data.frame(
OTU.ID = testotu[, 1],
sweep(testotu[, -c(1, 22)], 2, colSums(testotu[, -c(1, 22)]), "/"),
taxonomy = testotu[, 22]
)
sep_testotu <- Filter_function(
input = testotu,
threshold = 0.0001,
format = 1
) %>%
separate(
., col = taxonomy,
into = c("Domain", "Phylum", "Class", "Order", "Family", "Genus", "Species"),
sep = ";"
)
phylum <- aggregate(
sep_testotu[, 2:21], by = list(sep_testotu$Phylum), FUN = sum
)
phylum1 <- data.frame(row.names = phylum[, 1], phylum[, -1])
##### Input format 1, top 100 OTU #####
top100otu <- Top_taxa(
input = testotu.pct,
n = 100,
inputformat = 1,
outformat = 1
)
##### Input format 2, top 15 phylum #####
head(phylum)
top15phylum <- Top_taxa(
input = phylum,
n = 15,
inputformat = 2,
outformat = 1
)
##### Input format 3, top 15 phylum #####
head(phylum1)
top15phylum <- Top_taxa(
input = phylum1,
n = 15,
inputformat = 3,
outformat = 1
)
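To get a feel for what "top n by relative abundance" means, the selection can be sketched in base R. This is not the package's implementation. It assumes that taxa are ranked by their mean abundance across samples, and the helper `top_n_rows` and the `toy` table are hypothetical names used only for illustration.

```r
# Hypothetical helper (not part of the package): keep the n rows with the
# highest mean abundance across samples. Expects an all-numeric data frame
# with taxa as row names (input layout 3).
top_n_rows <- function(mat, n) {
  n <- min(n, nrow(mat))                          # never ask for more rows than exist
  ord <- order(rowMeans(mat), decreasing = TRUE)  # rank taxa by mean abundance
  mat[ord[seq_len(n)], , drop = FALSE]
}

# Toy relative-abundance table: four taxa, two samples.
toy <- data.frame(
  row.names = c("A", "B", "C", "D"),
  S1 = c(0.1, 0.4, 0.3, 0.2),
  S2 = c(0.2, 0.5, 0.2, 0.1)
)

top2 <- top_n_rows(toy, 2)
print(top2)
```

How `Top_taxa` itself ranks taxa and handles the remaining ones may differ; check the function's output on your own data.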