diff_mean_test_conserved: Find differentially expressed genes that are conserved across samples

Description

Find differentially expressed genes that are conserved across samples

Usage

diff_mean_test_conserved(
  y,
  group_labels,
  sample_labels,
  balanced = TRUE,
  compare = "each_vs_rest",
  pval_th = 1e-04,
  ...
)

Value

A data frame with one row per group and gene; the columns are described under Details.

Arguments

y: A matrix of counts; must be (or inherit from) class dgCMatrix; genes are rows, cells are columns
group_labels: The group labels (e.g. clusters or time points); will be converted to factor
sample_labels: The sample labels; will be converted to factor
balanced: Boolean, see details for explanation; default is TRUE
compare: Specifies which groups to compare, see details; currently only 'each_vs_rest' (the default) is supported
pval_th: P-value threshold used to call a gene differentially expressed when summarizing the tests per gene
...: Parameters passed to diff_mean_test

Details

This function calls diff_mean_test repeatedly and aggregates the results per group and gene.

If balanced is TRUE (the default), it is assumed that each sample spans multiple groups, as would be the case when merging or integrating samples from the same tissue followed by clustering. In this case the group labels are the clusters, and a conserved cluster marker should be supported in each sample.
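To make the balanced design concrete, here is a minimal base-R sketch that only lists comparisons, assuming that each test compares one group against all other cells within a single sample. The helper `balanced_tests()` is hypothetical; it neither calls diff_mean_test nor reproduces the package's internal code.

```r
# Hypothetical helper (not part of sctransform): list the within-sample
# "group vs rest" comparisons, assuming a balanced design in which every
# sample contains cells from several groups.
balanced_tests <- function(group_labels, sample_labels) {
  group_labels <- factor(group_labels)
  sample_labels <- factor(sample_labels)
  tests <- list()
  for (s in levels(sample_labels)) {
    in_sample <- sample_labels == s
    # only groups with cells in this sample can be tested within it
    for (g in unique(as.character(group_labels[in_sample]))) {
      tests[[length(tests) + 1]] <- data.frame(
        sample = s,
        group1 = g,
        group2 = "rest",
        n1 = sum(in_sample & group_labels == g),  # cells of group g in sample s
        n2 = sum(in_sample & group_labels != g),  # all other cells in sample s
        stringsAsFactors = FALSE
      )
    }
  }
  do.call(rbind, tests)
}

# Toy labels for 12 cells, built the same way as in the Examples section
clustering <- 1:12 %% 2
sample_id <- 1:12 %% 3
tests <- balanced_tests(clustering, sample_id)
```

With three samples that each contain both groups, this yields six comparisons (two per sample), so every candidate marker can be checked for support in every sample.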

If balanced is FALSE, an unbalanced design is assumed where each sample contributes to one group. An example is a time series experiment where some samples are taken from time point 1 while other samples are taken from time point 2. The time point would be the group label and the goal would be to identify differentially expressed genes between time points that are supported by many between-sample comparisons.
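One plausible way to enumerate "between-sample comparisons" for such a design is to pair every sample of a group with every sample outside it. The following base-R sketch does exactly that under this assumption; the helper `unbalanced_tests()` is hypothetical and may differ from how the package actually forms its comparisons.

```r
# Hypothetical helper (not part of sctransform): list between-sample comparisons
# for an unbalanced design, assuming each sample belongs to exactly one group
# and every sample of a group is compared with every sample outside that group.
unbalanced_tests <- function(group_labels, sample_labels) {
  # one row per distinct (sample, group) combination present in the data
  map <- unique(data.frame(sample = as.character(sample_labels),
                           group = as.character(group_labels),
                           stringsAsFactors = FALSE))
  if (anyDuplicated(map$sample)) {
    stop("each sample must contribute to exactly one group")
  }
  tests <- list()
  for (g in levels(factor(group_labels))) {
    own <- map$sample[map$group == g]
    other <- map$sample[map$group != g]
    pairs <- expand.grid(sample1 = own, sample2 = other,
                         stringsAsFactors = FALSE)
    if (nrow(pairs) > 0) {
      tests[[length(tests) + 1]] <- data.frame(group1 = g, group2 = "rest",
                                               pairs, stringsAsFactors = FALSE)
    }
  }
  do.call(rbind, tests)
}

# Toy time series: samples A and B at t1, samples C and D at t2 (two cells each)
time_point <- rep(c("t1", "t2"), each = 4)
sample_id <- rep(c("A", "B", "C", "D"), each = 2)
tests <- unbalanced_tests(time_point, sample_id)
```

Here each time point gets four sample-pair comparisons, so a gene that differs consistently between time points can be supported by up to four tests per group.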

Output columns:

group1: Group label of the first group of cells
group2: Group label of the second group of cells; currently fixed to 'rest'
gene: Gene name (from rownames of input matrix)
n_tests: The number of tests this gene participated in for this group
log2FC_min, log2FC_median, log2FC_max: Minimum, median, and maximum of log2FC across the tests
mean1_median, mean2_median: Median of the group 1 and group 2 means across the tests
pval_max: Maximum of p-values across tests
de_tests: Number of tests that showed this gene having a log2FC going in the same direction as log2FC_median and having a p-value <= pval_th
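The summary columns listed above can be sketched in base R as follows. This is a hypothetical helper (`summarise_tests()`), not the package's internal code; it assumes the per-test results are stacked in a data frame with columns group1, group2, gene, log2FC, mean1, mean2 and pval, one row per test and gene, and it interprets "same direction" as "same sign". The toy values are made up for illustration.

```r
# Hypothetical helper (not part of sctransform): collapse stacked per-test
# results into one row per (group1, group2, gene). Assumes `per_test` has the
# columns group1, group2, gene, log2FC, mean1, mean2 and pval.
summarise_tests <- function(per_test, pval_th = 1e-4) {
  by_key <- split(per_test,
                  list(per_test$group1, per_test$group2, per_test$gene),
                  drop = TRUE)
  rows <- lapply(by_key, function(d) {
    med <- median(d$log2FC)
    data.frame(
      group1 = d$group1[1],
      group2 = d$group2[1],
      gene = d$gene[1],
      n_tests = nrow(d),
      log2FC_min = min(d$log2FC),
      log2FC_median = med,
      log2FC_max = max(d$log2FC),
      mean1_median = median(d$mean1),
      mean2_median = median(d$mean2),
      pval_max = max(d$pval),
      # same sign as the median log2FC and p-value at or below the threshold
      de_tests = sum(sign(d$log2FC) == sign(med) & d$pval <= pval_th),
      stringsAsFactors = FALSE
    )
  })
  res <- do.call(rbind, rows)
  rownames(res) <- NULL
  res
}

# Toy per-test results for one group, two genes, three tests each
per_test <- data.frame(
  group1 = "1", group2 = "rest",
  gene = rep(c("GENE_A", "GENE_B"), each = 3),
  log2FC = c(1.2, 0.8, 1.0, -0.3, 0.2, -0.1),
  mean1 = c(2.0, 1.5, 1.8, 0.5, 0.6, 0.4),
  mean2 = c(0.9, 0.8, 0.9, 0.6, 0.5, 0.5),
  pval = c(1e-6, 1e-5, 1e-3, 0.2, 0.5, 0.01),
  stringsAsFactors = FALSE
)
res <- summarise_tests(per_test)
```

In this toy input, GENE_A has positive log2FC in all three tests but only two p-values at or below 1e-4, so its de_tests is 2, while GENE_B has mixed signs and no small p-values, so its de_tests is 0.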

The output is ordered by group1 (ascending), then de_tests (descending), then abs(log2FC_median) (descending), and finally pval_max (ascending).
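This ordering corresponds to a single order() call in base R. The helper `order_results()` and the toy table below are hypothetical; note that if group1 is a factor, order() follows its level order rather than alphabetical order.

```r
# Hypothetical helper (not part of sctransform): sort a summary table by
# group1 (ascending), de_tests (descending), abs(log2FC_median) (descending)
# and finally pval_max (ascending).
order_results <- function(res) {
  res[order(res$group1, -res$de_tests, -abs(res$log2FC_median), res$pval_max), ,
      drop = FALSE]
}

# Toy summary rows; G3 and G5 tie on de_tests and |log2FC_median|
res <- data.frame(
  group1 = c("1", "0", "0", "0", "0"),
  gene = c("G4", "G1", "G2", "G3", "G5"),
  de_tests = c(3, 2, 3, 3, 3),
  log2FC_median = c(0.5, 2.0, -1.5, 1.0, -1.0),
  pval_max = c(1e-3, 1e-6, 1e-5, 1e-7, 1e-4),
  stringsAsFactors = FALSE
)
sorted <- order_results(res)
print(sorted$gene)  # "G2" "G3" "G5" "G1" "G4"
```

Because the sign of log2FC_median is ignored in the third key, strong down-regulated genes (such as G2) rank alongside strong up-regulated ones; the tie between G3 and G5 is broken by the smaller pval_max.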

Examples


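library(sctransform)

# Toy labels for illustration: alternate cells between two "clusters"
# and cycle them through three "samples", so every sample spans both groups
clustering <- 1:ncol(pbmc) %% 2
sample_id <- 1:ncol(pbmc) %% 3

# Corrected UMI counts from vst() serve as the input matrix
vst_out <- vst(pbmc, return_corrected_umi = TRUE)
de_res <- diff_mean_test_conserved(y = vst_out$umi_corrected,
                                   group_labels = clustering,
                                   sample_labels = sample_id)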
