rsahmi (version 0.0.2)

remove_contaminants: Identifying contaminants and false positive taxa (cell line quantile test)

Description

Identifying contaminants and false positive taxa (cell line quantile test)

Usage

remove_contaminants(
  kraken_reports,
  study = "current study",
  taxon = c("d__Bacteria", "d__Fungi", "d__Viruses"),
  quantile = 0.95,
  alpha = 0.05,
  alternative = "greater",
  exclusive = FALSE
)

Value

A polars DataFrame with the following attributes:

  1. pvalues: P-values from the quantile test.

  2. exclusive: Taxids present in the current study but not found in the cell line data.

  3. significant: Taxids whose p-values are below alpha.

  4. truly: Taxids regarded as true (non-contaminant) taxa, based on alpha and exclusive. If exclusive is TRUE, this is the union of exclusive and significant; otherwise, it is the same as significant.
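
The relationship between the significant, exclusive and truly attributes can be sketched in base R. This is only an illustration of the rule described above, not the package's internal code: the p-values, the taxids and the helper pick_truly() are hypothetical and made up for the example.

```r
# Hypothetical p-values named by taxid (illustrative numbers only)
pvalues <- c("562" = 0.001, "1280" = 0.20, "5476" = 0.03)
# Hypothetical taxids seen in the study but absent from the cell line data
exclusive_taxids <- c("10376")
alpha <- 0.05

# Significant taxids: p-value below the significance level
significant <- names(pvalues)[pvalues < alpha]

# Hypothetical helper: combine the two sets according to the `exclusive` flag
pick_truly <- function(significant, exclusive_taxids, exclusive = FALSE) {
    if (exclusive) union(exclusive_taxids, significant) else significant
}

truly_default <- pick_truly(significant, exclusive_taxids, exclusive = FALSE)
truly_with_exclusive <- pick_truly(significant, exclusive_taxids, exclusive = TRUE)
```

With exclusive = FALSE the result is just the significant taxids; with exclusive = TRUE the study-only taxids are added to them.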

Arguments

kraken_reports

A character vector of paths to all kraken report files.

study

A string giving the study name, used to distinguish the study from the cell line data.

taxon

An atomic character vector specifying the taxa wanted. Each name should follow the kraken style: the rank code, two underscores, and the scientific name of the taxon (e.g., "d__Viruses").
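
As a small illustration of the naming convention, the default taxon value can be assembled from a rank code and scientific names. The variable names here are hypothetical and used only for the example.

```r
# Rank code for the domain level, as used in the default `taxon` value
rank_code <- "d"
# Scientific names of the taxa wanted
taxa_names <- c("Bacteria", "Fungi", "Viruses")

# Kraken-style name: rank code, two underscores, scientific name
taxon <- paste0(rank_code, "__", taxa_names)
taxon
```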

quantile

Probabilities with values in [0, 1] specifying the quantile to calculate.

alpha

Level of significance.
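
To make the roles of quantile, alpha and alternative concrete, here is a minimal sketch of one common form of quantile test, written with base R only. It is an assumption that remove_contaminants() works along these lines; the package's actual test may differ in detail. The abundance values are made up for illustration.

```r
# Made-up abundances for one taxon (illustrative only)
cellline_rpmm <- seq_len(100) / 10          # cell line (reference) samples
study_rpmm <- c(50, 60, 0.5, 80, 70, 90, 55, 65)  # current study samples

quantile_level <- 0.95
alpha <- 0.05

# Reference threshold: the chosen quantile of the cell line data
threshold <- stats::quantile(cellline_rpmm, probs = quantile_level, names = FALSE)

# Under the null hypothesis, a study sample exceeds the threshold with
# probability 1 - quantile_level, so the count of exceedances is binomial
n_exceed <- sum(study_rpmm > threshold)
test <- stats::binom.test(
    n_exceed, length(study_rpmm),
    p = 1 - quantile_level,
    alternative = "greater"
)

pvalue <- test$p.value
is_significant <- pvalue < alpha
```

With alternative = "greater", a small p-value indicates that the taxon is more abundant in the study than the cell line data would suggest.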

alternative

A string specifying the alternative hypothesis, must be one of "two.sided", "greater" (default) or "less". You can specify just the initial letter.

exclusive

A boolean value indicating whether taxa not found in the cell line data should be regarded as true taxa. Default: FALSE.

Examples

if (FALSE) {
library(ggplot2) # ggplot(), aes(), geom_density(), theme(), ...
library(polars) # provides `pl`
# `paths` should be the output directory for each sample from
# `blit::kraken2()`
truly_microbe <- remove_contaminants(
    kraken_reports = file.path(paths, "kraken_report.txt"),
    quantile = 0.99, exclusive = FALSE
)
# order the true taxids by their quantile test p-values
microbe_for_plot <- attr(truly_microbe, "truly")[
    order(attr(truly_microbe, "pvalues")[attr(truly_microbe, "truly")])
]
microbe_for_plot <- microbe_for_plot[
    !microbe_for_plot %in% attr(truly_microbe, "exclusive")
]
ggplot(
    truly_microbe$filter(pl$col("taxid")$is_in(microbe_for_plot))$
        to_data_frame(),
    aes(rpmm)
) +
    geom_density(aes(fill = study), alpha = 0.5) +
    scale_x_log10() +
    facet_wrap(facets = vars(taxa), scales = "free") +
    theme(
        strip.clip = "off",
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        legend.position = "inside",
        legend.position.inside = c(1, 0),
        legend.justification.inside = c(1, 0)
    )
}
