metacoder (version 0.3.7)

compare_groups: Compare groups of samples

Description

Apply a function to compare data, usually abundance, from pairs of treatments/groups. By default, every pairwise combination of treatments are compared. A custom function can be supplied to perform the comparison. The plotting function heat_tree_matrix is useful for visualizing these results.

Usage

compare_groups(
  obj,
  data,
  cols,
  groups,
  func = NULL,
  combinations = NULL,
  other_cols = FALSE,
  dataset = NULL
)

Value

A tibble

Arguments

obj

A taxmap object

data

The name of a table in obj that contains data for each sample in columns.

cols

The names/indexes of columns in data to use. By default, all numeric columns are used. Takes one of the following inputs:

TRUE/FALSE:

All/No columns will used.

Character vector:

The names of columns to use

Numeric vector:

The indexes of columns to use

Vector of TRUE/FALSE of length equal to the number of columns:

Use the columns corresponding to TRUE values.

groups

A vector defining how samples are grouped into "treatments". Must be the same order and length as cols.

func

The function to apply for each comparison. For each row in data, for each combination of groups, this function will receive the data for each treatment, passed as two vectors. Therefore the function must take at least 2 arguments corresponding to the two groups compared. The function should return a vector or list of results of a fixed length. If named, the names will be used in the output. The names should be consistent as well. A simple example is function(x, y) mean(x) - mean(y). By default, the following function is used:


function(abund_1, abund_2) {
  log_ratio <- log2(median(abund_1) / median(abund_2))
  if (is.nan(log_ratio)) {
    log_ratio <- 0
  }
  list(log2_median_ratio = log_ratio,
       median_diff = median(abund_1) - median(abund_2),
       mean_diff = mean(abund_1) - mean(abund_2),
       wilcox_p_value = wilcox.test(abund_1, abund_2)$p.value)
}

combinations

Which combinations of groups to use. Must be a list of vectors, each containing the names of 2 groups to compare. By default, all pairwise combinations of groups are compared.

other_cols

If TRUE, preserve all columns not in cols in the output. If FALSE, dont keep other columns. If a column names or indexes are supplied, only preserve those columns.

dataset

DEPRECIATED. use "data" instead.

See Also

Other calculations: calc_diff_abund_deseq2(), calc_group_mean(), calc_group_median(), calc_group_rsd(), calc_group_stat(), calc_n_samples(), calc_obs_props(), calc_prop_samples(), calc_taxon_abund(), counts_to_presence(), rarefy_obs(), zero_low_counts()

Examples

Run this code
if (FALSE) {
# Parse data for plotting
x = parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                   class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                   class_regex = "^(.+)__(.+)$")

# Convert counts to proportions
x$data$otu_table <- calc_obs_props(x, data = "tax_data", cols = hmp_samples$sample_id)

# Get per-taxon counts
x$data$tax_table <- calc_taxon_abund(x, data = "otu_table", cols = hmp_samples$sample_id)

# Calculate difference between groups
x$data$diff_table <- compare_groups(x, data = "tax_table",
                                    cols = hmp_samples$sample_id,
                                    groups = hmp_samples$body_site)

# Plot results (might take a few minutes)
heat_tree_matrix(x,
                 data = "diff_table",
                 node_size = n_obs,
                 node_label = taxon_names,
                 node_color = log2_median_ratio,
                 node_color_range = diverging_palette(),
                 node_color_trans = "linear",
                 node_color_interval = c(-3, 3),
                 edge_color_interval = c(-3, 3),
                 node_size_axis_label = "Number of OTUs",
                 node_color_axis_label = "Log2 ratio median proportions")
                 
# How to get results for only some pairs of groups
compare_groups(x, data = "tax_table",
               cols = hmp_samples$sample_id,
               groups = hmp_samples$body_site,
               combinations = list(c('Nose', 'Saliva'),
                                   c('Skin', 'Throat')))

}

Run the code above in your browser using DataCamp Workspace