gosummaries: Constructor for gosummaries object

Description

Constructor for gosummaries object that contains all the necessary information to draw the figure, like gene lists and their annotations, expression data and all the relevant texts.

Usage

gosummaries(x = NULL, ...)
## Default S3 method:
gosummaries(x = NULL, wc_data = NULL, organism = "hsapiens", go_branches = c("BP", "ke", "re"), max_p_value = 0.01, min_set_size = 50, max_set_size = 1000, max_signif = 40, ordered_query = TRUE, hier_filtering = "moderate", score_type = "p-value", wc_algorithm = "middle", wordcloud_legend_title = NULL, ...)

Arguments

x

a list of character vectors of gene names (or a list of lists of such vectors)

wc_data

precalculated GO enrichment results (see Details)

organism

the organism that the gene lists correspond to. The format should be as follows: "hsapiens", "mmusculus", "scerevisiae", etc.

go_branches

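GO tree branches and pathway databases as denoted in g:Profiler. Possible values: BP, CC, MF (the GO biological process, cellular component and molecular function branches), ke (KEGG) and re (Reactome).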

max_p_value

threshold for p-values that have been corrected for multiple testing

min_set_size

minimum size of a functional category to be considered

max_set_size

maximum size of a functional category to be considered

max_signif

maximum number of categories returned per query

ordered_query

logical indicating whether the lists are ordered (it determines whether the ordered query algorithm of g:Profiler is used)

hier_filtering

the type of hierarchical filtering used to reduce the number of g:Profiler results (see gprofiler for further information)

score_type

indicates the type of scores in wc_data. Possible values: "p-value" and "count"

wc_algorithm

the type of wordcloud algorithm used. Possible values are "top", which places the first word in the top corner, and "middle", which places the first word in the middle.

wordcloud_legend_title

title of the word cloud legend, should reflect the nature of the score

...

additional parameters passed to the gprofiler function
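Roughly, max_p_value, min_set_size, max_set_size and max_signif restrict which enrichment results end up in a word cloud. As an illustration only, the base-R sketch below applies the same four cut-offs to a toy enrichment table. The helper filter_enrichment is hypothetical (not part of GOsummaries), and the column names p.value, term.size and term.name are an assumption modelled on gProfileR output.

```r
# Hypothetical helper: mimics the effect of the filtering arguments on a toy
# enrichment table. Column names are assumed (modelled on gProfileR output).
filter_enrichment <- function(res, max_p_value = 0.01, min_set_size = 50,
                              max_set_size = 1000, max_signif = 40) {
  keep <- res$p.value <= max_p_value &
    res$term.size >= min_set_size &
    res$term.size <= max_set_size
  res <- res[keep, , drop = FALSE]
  # Sort by significance and keep at most max_signif categories
  res <- res[order(res$p.value), , drop = FALSE]
  head(res, max_signif)
}

toy <- data.frame(
  term.name = c("A", "B", "C", "D", "E"),
  p.value   = c(0.001, 0.02, 0.0001, 0.005, 0.009),
  term.size = c(120, 300, 30, 800, 1500),
  stringsAsFactors = FALSE
)

filtered <- filter_enrichment(toy, max_signif = 2)
filtered$term.name  # "A" "D": B fails the p-value cut-off, C and E the size limits
```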

Value

An object of class gosummaries.

Details

The object is a list of "components", each defined by a gene list or a pair of gene lists. Each component has the following slots:

Title: title string of the component. (Default: the names of the gene lists)
Gene_lists: list of one or two gene lists
WCD: g:Profiler results based on the Gene_lists slot, or a user-supplied table.
Data: the related data (expression values, PCA rotation, ...) used to draw the "panel", i.e. the plot above the word clouds. In principle there is no restriction on what kind of data is stored here, as long as the panel plotting function given to plot.gosummaries can use it.
Percentage: a text that is drawn in the top right corner of every component. In case of PCA this is the percentage of variation the component explains; by default it just shows the number of genes in the Gene_lists slot.
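To make the slot layout concrete, here is a hand-built mock of one component as a plain R list. This is only a sketch: the real object is created by gosummaries(), and the contents given here for Data and Percentage are placeholders, not the package's actual formats.

```r
# Hand-built mock of ONE component, only to illustrate the slot layout listed
# above. The real object is created by gosummaries(); the contents of Data and
# Percentage here are placeholders.
genes1 <- c("203485_at", "209469_at", "209470_s_at")
genes2 <- c("201890_at", "202503_s_at")

component <- list(
  Title      = "List",                       # default: name of the gene list(s)
  Gene_lists = list(genes1, genes2),         # one or two gene lists
  WCD        = list(                         # word cloud tables (Term, Score)
    data.frame(Term = c("KLF1", "KLF2"), Score = c(0.05, 0.001)),
    data.frame(Term = c("CD8", "CCL5"), Score = c(0.02, 0.00001))
  ),
  Data       = data.frame(List = c("G1", "G2"),   # placeholder panel data
                          Genes = lengths(list(genes1, genes2))),
  Percentage = "G1: 3, G2: 2"                # placeholder corner text
)

str(component, max.level = 1)
```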

Some visual parameters are stored in the attributes of the gosummaries object: score_type tells how to handle the scores associated with the word clouds, wc_algorithm specifies the wordcloud layout algorithm and wordcloud_legend_title specifies the title of the wordcloud legend. They can be changed using the attr function.
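A minimal sketch of changing these attributes with attr(). Building a real gosummaries object needs g:Profiler access, so a mock object of class gosummaries stands in for it here (assumption: the real object stores these parameters as plain attributes with the names given above).

```r
# Mock stand-in for a real gosummaries object (assumption: the real object
# carries these attributes under the same names).
gs <- structure(list(), class = "gosummaries",
                score_type = "p-value", wc_algorithm = "middle")

# Switch to the "top" layout and set a legend title
attr(gs, "wc_algorithm") <- "top"
attr(gs, "wordcloud_legend_title") <- "Significance score"

attributes(gs)
```

With an object returned by gosummaries(), the same attr() calls would be made before calling plot(gs).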

The word clouds are specified as data.frames with two columns: "Term" and "Score". To use custom data for the word clouds instead of the default GO enrichment results, specify the parameter wc_data. Its input structure mirrors the gene list input, except that each gene list is replaced by a two-column data.frame.
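The shape of wc_data can be checked with a small helper. check_wc_data below is hypothetical (not part of GOsummaries) and encodes one reading of the rule above: a list whose elements are either single data.frames (one word cloud per component) or lists of one or two data.frames (two word clouds per component), each with "Term" and "Score" columns.

```r
# Hypothetical helpers (not part of GOsummaries) that check the wc_data shape.
is_wc_table <- function(d) {
  is.data.frame(d) && all(c("Term", "Score") %in% colnames(d))
}

check_wc_data <- function(wc_data) {
  if (!is.list(wc_data) || is.data.frame(wc_data) || length(wc_data) == 0) {
    return(FALSE)
  }
  ok <- vapply(wc_data, function(comp) {
    if (is.data.frame(comp)) return(is_wc_table(comp))  # one word cloud
    is.list(comp) && length(comp) %in% 1:2 &&          # one or two word clouds
      all(vapply(comp, is_wc_table, logical(1)))
  }, logical(1))
  all(ok)
}

wcd1 <- data.frame(Term = c("KLF1", "KLF2", "POU5F1"), Score = c(0.05, 0.001, 0.0001))
wcd2 <- data.frame(Term = c("CD8", "CD248", "CCL5"), Score = c(0.02, 0.005, 0.00001))

check_wc_data(list(Results1 = wcd1, Results2 = wcd2))        # TRUE
check_wc_data(list(Results = list(wcd1, wcd2)))              # TRUE
check_wc_data(list(Bad = data.frame(Word = "x", Value = 1))) # FALSE
```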

The GO enrichment analysis is performed using the g:Profiler web toolkit and its associated R package gProfileR. This means the computer needs internet access to annotate the gene lists. Since g:Profiler accepts a wide range of gene IDs, the user usually does not have to worry about converting the gene IDs into the right format. To be absolutely sure that the tool recognizes the gene IDs, one can check whether they give any results in http://biit.cs.ut.ee/gprofiler/gconvert.cgi.

A typical GO enrichment analysis can return many results, but these tend to be quite redundant. Since only a small number of categories fit into a word cloud, both the number of categories shown and their redundancy have to be reduced. For this, the hierarchical filtering option "moderate" of g:Profiler is used by default (see hier_filtering). In g:Profiler, categories are grouped together when they share one or more enriched parents, and the "moderate" option selects the most significant category from each such group. (See more at http://biit.cs.ut.ee/gprofiler/)
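A toy base-R illustration of the selection rule behind "moderate". It is not g:Profiler's implementation: g:Profiler derives the groups from shared enriched parents in the GO hierarchy, whereas here the group column is simply given, and the terms and p-values are made up.

```r
# Toy illustration only: groups of related categories are given up front
# (g:Profiler derives them from the GO hierarchy); keep the most significant
# term of each group.
res <- data.frame(
  term  = c("immune response", "defense response", "cell cycle",
            "mitotic cell cycle", "axon guidance"),
  group = c(1, 1, 2, 2, 3),
  p     = c(1e-8, 1e-5, 1e-3, 1e-6, 1e-4),
  stringsAsFactors = FALSE
)

best_per_group <- do.call(rbind, lapply(split(res, res$group), function(d) {
  d[which.min(d$p), , drop = FALSE]
}))

best_per_group$term  # "immune response" "mitotic cell cycle" "axon guidance"
```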

The slots of the object can be filled with custom information using the function add_to_slot.gosummaries.

By default the Data slot is filled with a dataset that contains the number of genes in the Gene_lists slot. Expression data can be added to the object, for example, with the function add_expression.gosummaries. It is also possible to define your own format for the Data slot, as long as a panel plotting function for this data is provided as well (see panel_boxplot for further information).

There are several constructors of gosummaries object that work on common analysis result objects, such as gosummaries.kmeans, gosummaries.MArrayLM and gosummaries.prcomp corresponding to k-means, limma and PCA results.
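For k-means, the idea of one gene list per component can be sketched in base R: the genes assigned to each cluster form one gene list. This is a hypothetical illustration on random data, not the code of gosummaries.kmeans, which performs its own processing.

```r
# Hypothetical sketch: turn k-means cluster assignments into one gene list
# per cluster. Random data; gene and sample names are made up.
set.seed(1)
expr <- matrix(rnorm(60 * 4), nrow = 60,
               dimnames = list(paste0("gene", 1:60), paste0("sample", 1:4)))

cl <- kmeans(expr, centers = 3)

# cl$cluster follows the row order of expr, so split the row names by cluster
gene_lists <- split(rownames(expr), cl$cluster)
names(gene_lists) <- paste("Cluster", names(gene_lists))

lengths(gene_lists)
```

The resulting gene_lists has the same one-list-per-component shape as gl1 in the Examples; with real gene IDs it could be passed to gosummaries() directly, although gosummaries.kmeans is the dedicated constructor for kmeans results.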

Examples


## Not run: 
# # Define gene lists 
# genes1 = c("203485_at", "209469_at", "209470_s_at", "203999_at", 
# "205358_at", "203130_s_at", "210222_s_at", "202508_s_at", "203001_s_at", 
# "207957_s_at", "203540_at", "203000_at", "219619_at", "221805_at", 
# "214046_at", "213135_at", "203889_at", "209990_s_at", "210016_at", 
# "202507_s_at", "209839_at", "204953_at", "209167_at", "209685_s_at",  
# "211276_at", "202391_at", "205591_at", 
# "201313_at")
# genes2 = c("201890_at", "202503_s_at", "204170_s_at", "201291_s_at", 
# "202589_at", "218499_at", "209773_s_at", "204026_s_at", "216237_s_at", 
# "202546_at", "218883_s_at", "204285_s_at", "208659_at", "201292_at", 
# "201664_at")
# 
# 
# gl1 = list(List1 = genes1,  List2 = genes2) # One list per component
# gl2 = list(List = list(genes1, genes2)) # Two lists per component
# 
# # Construct gosummaries objects
# gs1 = gosummaries(gl1)
# gs2 = gosummaries(gl2)
# 
# plot(gs1, fontsize = 8)
# plot(gs2, fontsize = 8)
# 
# # Changing slot contents using add_to_slot.gosummaries
# gs1 = add_to_slot.gosummaries(gs1, "Title", list("Neurons", "Cell lines"))
# 
# # Adding expression data
# data(tissue_example)
# 
# gs1 = add_expression.gosummaries(gs1, exp = tissue_example$exp, annotation = 
# tissue_example$annot)
# gs2 = add_expression.gosummaries(gs2, exp = tissue_example$exp, annotation = 
# tissue_example$annot)
# 
# plot(gs1, panel_par = list(classes = "Tissue"), fontsize = 8)
# plot(gs2, panel_par = list(classes = "Tissue"), fontsize = 8)
# ## End(Not run)

# Using custom annotations for word clouds
wcd1 = data.frame(Term = c("KLF1", "KLF2", "POU5F1"), Score = c(0.05, 0.001, 
0.0001))
wcd2 = data.frame(Term = c("CD8", "CD248", "CCL5"), Score = c(0.02, 0.005, 
0.00001))

gs = gosummaries(wc_data = list(Results1 = wcd1, Results2 = wcd2))
plot(gs)

gs = gosummaries(wc_data = list(Results = list(wcd1, wcd2)))
plot(gs)

# Adjust wordcloud legend title
gs = gosummaries(wc_data = list(Results = list(wcd1, wcd2)), 
wordcloud_legend_title = "Significance score")
plot(gs)
