Compare groups of samples

Apply a function to compare data, usually abundance, from pairs of treatments/groups. By default, every pairwise combination of treatments is compared. A custom function can be supplied to perform the comparison. The plotting function heat_tree_matrix is useful for visualizing these results.

Usage

compare_groups(obj, dataset, cols, groups, func = NULL, combinations = NULL,
  other_cols = FALSE)

Arguments

obj: A taxmap object.


dataset: The name of a table in obj that contains data for each sample in columns.


cols: The names/indexes of columns in dataset to use. By default, all numeric columns are used. Takes one of the following inputs:

TRUE/FALSE:

All/No columns will be used.

Character vector:

The names of columns to use

Numeric vector:

The indexes of columns to use

Vector of TRUE/FALSE of length equal to the number of columns:

Use the columns corresponding to TRUE values.
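
The input forms listed above can be illustrated with base R indexing on a small made-up table. This is only a sketch of the selection semantics, not metacoder's internal code; the table `tab` and its column names are hypothetical.

```r
# Hypothetical table: one ID column plus three numeric sample columns
tab <- data.frame(otu_id = c("a", "b"),
                  s1 = c(1, 4), s2 = c(2, 5), s3 = c(3, 6))

# Character vector: select columns by name
by_name <- tab[, c("s1", "s3")]

# Numeric vector: select columns by index
by_index <- tab[, c(2, 4)]

# Logical vector with one value per column
by_flag <- tab[, c(FALSE, TRUE, FALSE, TRUE)]

# The documented default uses all numeric columns; one way to find them
numeric_cols <- names(tab)[sapply(tab, is.numeric)]

# All three explicit forms pick the same columns
identical(by_name, by_index) && identical(by_name, by_flag)
```

Names are usually the safest choice, since they keep working if columns are reordered.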


groups: A vector defining how samples are grouped into "treatments". Must be the same order and length as cols.


func: The function to apply for each comparison. For each row in dataset, for each combination of groups, this function will receive the data for each treatment, passed as two vectors (numeric for abundance data). The function must therefore take at least 2 arguments, corresponding to the two groups compared. It should return a vector or list of results of a fixed length. If the results are named, the names are used in the output, so they must be the same for every comparison. A simple example is function(x, y) mean(x) - mean(y). By default, the following function is used:

function(abund_1, abund_2) {
  log_ratio <- log2(median(abund_1) / median(abund_2))
  if (is.nan(log_ratio)) {
    log_ratio <- 0
  }
  list(log2_median_ratio = log_ratio,
       median_diff = median(abund_1) - median(abund_2),
       mean_diff = mean(abund_1) - mean(abund_2),
       wilcox_p_value = wilcox.test(abund_1, abund_2)$p.value)
}
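
As a hedged illustration of the contract described for func, the sketch below defines a custom comparison function and applies it by hand to one row of made-up data. The names `mean_and_t`, `abund_row`, and `grp` are hypothetical, and the manual split by group only mimics what this page says happens for each row and each pair of groups; it is not metacoder's implementation.

```r
# Custom comparison: takes two vectors, returns a named list of fixed length
mean_and_t <- function(abund_1, abund_2) {
  list(mean_diff = mean(abund_1) - mean(abund_2),
       t_p_value = t.test(abund_1, abund_2)$p.value)
}

# Made-up abundances for one taxon across six samples
abund_row <- c(s1 = 10, s2 = 12, s3 = 11, s4 = 3, s5 = 5, s6 = 4)

# Group of each sample, in the same order as the samples
grp <- c("Nose", "Nose", "Nose", "Stool", "Stool", "Stool")

# Split the row by group and compare the two groups
result <- mean_and_t(abund_row[grp == "Nose"], abund_row[grp == "Stool"])
result$mean_diff  # 7
```

A function like this would be supplied through the func argument, for example func = mean_and_t. Returning a named list keeps the output column names stable across comparisons.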

combinations: Which combinations of groups to use. Must be a list of vectors, each containing the names of 2 groups to compare. By default, all pairwise combinations of groups are compared.
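
The expected shape of this argument can be sketched with base R alone. The group labels below are made up, and building the full set of pairs with combn is an assumption about what "all pairwise combinations" amounts to, not a statement about metacoder's internals.

```r
# Made-up group labels, one per sample
groups <- c("Nose", "Nose", "Saliva", "Saliva", "Stool", "Stool")

# Every pairwise combination of the unique groups,
# as a list of length-2 character vectors
all_pairs <- combn(unique(groups), 2, simplify = FALSE)

# A custom subset in the list-of-pairs form described above
custom_pairs <- list(c("Nose", "Saliva"), c("Nose", "Stool"))

length(all_pairs)  # 3 pairs for 3 groups
```

Restricting the comparisons this way can save time when only a few contrasts are of interest, since the comparison function runs once per row per pair.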


other_cols: If TRUE, preserve all columns not in cols in the output. If FALSE, do not keep other columns. If column names or indexes are supplied, only preserve those columns.


Value

A tibble

See Also

Other calculations: calc_group_mean, calc_group_median, calc_group_rsd, calc_group_stat, calc_n_samples, calc_obs_props, calc_taxon_abund, rarefy_obs, zero_low_counts

Examples

# Parse dataset for plotting
x = parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                   class_key = c(tax_rank = "info", tax_name = "taxon_name"),
                   class_regex = "^(.+)__(.+)$")

# Convert counts to proportions
x$data$otu_table <- calc_obs_props(x, dataset = "tax_data", cols = hmp_samples$sample_id)

# Get per-taxon counts
x$data$tax_table <- calc_taxon_abund(x, dataset = "otu_table", cols = hmp_samples$sample_id)

# Calculate difference between groups
x$data$diff_table <- compare_groups(x, dataset = "tax_table",
                                    cols = hmp_samples$sample_id,
                                    groups = hmp_samples$body_site)

# Plot results (might take a few minutes)
heat_tree_matrix(x,
                 dataset = "diff_table",
                 node_size = n_obs,
                 node_label = taxon_names,
                 node_color = log2_median_ratio,
                 node_color_range = diverging_palette(),
                 node_color_trans = "linear",
                 node_color_interval = c(-3, 3),
                 edge_color_interval = c(-3, 3),
                 node_size_axis_label = "Number of OTUs",
                 node_color_axis_label = "Log2 ratio median proportions")

Documentation reproduced from package metacoder, version 0.2.1, License: GPL-2 | GPL-3
