Importantly, the scope/utility of this score is limited to help users post-hoc filter for genesets that contain mostly up/down-regulated genes.
However, this might not coincide with the geneset pvalues / significance. For example,
genesets may exclusively contain genes with a positive effectsize but at the same time these can all be minor effects/values and thus the geneset
is not significant or less significant than other genesets with the exact same "directionality score".
For example, genesets may contain both up- and down-regulated genes but still be significant when testing with GOAT and using score_type='effectsize'
The scores computed with this function can help in post-hoc interpretation of GOAT results to further narrow down all significant genesets to a
subset with strong directionality. For example, after test_genesets()
we can filter the results for
A) significant genesets and B) that contain at most N genes and C) that are near-exclusively up/down-regulated.
Bringing this all together (also useful for other types of geneset testing, like ORA, score_type="pvalue", etc);
result = test_genesets(genesets_filtered, genelist, method = "goat", score_type = "effectsize", padj_method = "bonferroni", padj_cutoff = 0.05)
result = score_geneset_directionality(result, genelist)
result |> filter(signif & ngenes <= 100 & abs(score_directionality_rank) >= 0.95)
score_geneset_directionality(genesets, genelist)
input genesets
with results in 3 columns;
score_directionality_rank
is the weighted gene score where gene values are the sign of their effectsize and weights are linearly proportional to their inverse ranks in the input genelist.
score_directionality_rank2
is similar, but now using rank^2 gene weights to boost the influence of most-important genes in the input genelist.
score_directionality_value
uses the absolute gene effectsizes as gene weights
Note that the latter is least robust as it depends on the scaling of input data!
tibble with genesets, must contain columns 'source', 'id', 'genes'
tibble with genes, must contain columns 'gene', 'effectsize'
geneset directionality score = weighted mean of respective genes, where gene weights are 1 minus their relative rank in up/down-regulation (depending on negative/positive effectsize) and the value for each gene is -1 or 1 depending on up/down-regulation (sign of effectsize).
gene values and weights A) gene weight between 0 and 1 for the subset of upregulated genes / positive effectsizes;
r = for the subset of genes with effectsize > 0, compute gene rank (1 = highest effectsize, N = smallest effectsize that is greater than zero)
weight = 1 - r/max(r) B) analogous to (A) for the subset of genes with negative effectsize C) result per gene: value = sign of its effectsize, weight = 0 if effectsize 0, otherwise respective weights from (A) or (B)
geneset score_directionality = weighted mean over values/weights of respective genes