This function first calculates the pairwise distances between the
pathways in the result_df data frame. Next, using this distance
matrix, the pathways are clustered via hierarchical clustering. By default,
the average silhouette width for each possible number of clusters is
calculated. The optimal number of clusters is selected as the one with the
highest average silhouette width. The dendrogram is cut into this optimal
number of clusters, and the pathways with the lowest p value within each
cluster are chosen as representative pathways. If 'auto' is FALSE, the user
can manually select the height at which to cut the dendrogram via a shiny application.
See "Chen, Y. A. et al. Integrated pathway clusters with coherent biological
themes for target prioritisation. PLoS One 9, e99030,
doi:10.1371/journal.pone.0099030 (2014)." for details on the method of
pathway clustering.
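The automatic selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not pathfindR's actual implementation: toy 2-D points stand in for pathways, Euclidean distance stands in for the pairwise pathway distance, and `silhouette()` is taken from the cluster package.

```r
library(cluster)  # provides silhouette(); a recommended package shipped with R

set.seed(1)
# Toy stand-in for pathways: three well-separated groups of five points each
toy <- rbind(
  matrix(rnorm(10, mean = 0,  sd = 0.1), ncol = 2),
  matrix(rnorm(10, mean = 5,  sd = 0.1), ncol = 2),
  matrix(rnorm(10, mean = 10, sd = 0.1), ncol = 2)
)
rownames(toy) <- paste0("pw", seq_len(nrow(toy)))

# Pairwise distances (assumption: Euclidean, not the pathway distance metric)
d <- dist(toy)

# Hierarchical clustering with the default agglomeration method of the function
hc <- hclust(d, method = "average")

# Average silhouette width for each candidate number of clusters;
# the silhouette is only defined for 2 <= k <= n - 1
ks <- 2:(nrow(toy) - 1)
avg_sil <- sapply(ks, function(k) {
  cl <- cutree(hc, k = k)
  mean(silhouette(cl, d)[, "sil_width"])
})

# The optimal k is the one with the highest average silhouette width
best_k <- ks[which.max(avg_sil)]
clusters <- cutree(hc, k = best_k)
```

Here `best_k` and `clusters` are illustrative names; the dendrogram is cut into `best_k` groups, mirroring the description above.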
choose_clusters(result_df, p_val_threshold = 0.05, auto = TRUE,
agg_method = "average", plot_heatmap = FALSE, plot_dend = FALSE,
use_names = FALSE, custom_genes = NULL)
result_df: data frame of enriched pathways. Must-have columns are:
  ID: KEGG ID of the enriched pathway
  lowest_p: the lowest adjusted-p value of the given pathway over all iterations
  highest_p: the highest adjusted-p value of the given pathway over all iterations
p_val_threshold: p value threshold for filtering the pathways prior to clustering (default: 0.05)
auto: boolean value indicating whether to select the optimal number of clusters automatically. If FALSE, a shiny application is displayed, where the user can manually partition the clustering dendrogram (default: TRUE).
agg_method: the agglomeration method to be used if plotting heatmap. Must be one of "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median" or "centroid" (default: "average").
plot_heatmap: boolean value indicating whether or not to plot the heat map of pathway clustering (default: FALSE).
plot_dend: boolean value indicating whether or not to plot the dendrogram partitioned into the optimal number of clusters, shown by red rectangles (default: FALSE)
use_names: boolean value indicating whether to use gene set names instead of gene set ids (default: FALSE)
custom_genes: a list containing the genes involved in each custom pathway. Each element is a vector of gene symbols located in the given pathway. Names correspond to the ID of the pathway. Must be provided if `result_df` was generated using custom gene sets.
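To make the expected shapes of the inputs concrete, the snippet below builds a minimal data frame with the three must-have columns and a matching custom gene list. All IDs, p values, and gene symbols are hypothetical placeholders, and a real result data frame would normally contain further columns.

```r
# Hypothetical enrichment results with the three must-have columns
result_df <- data.frame(
  ID        = c("SET_A", "SET_B", "SET_C"),  # made-up custom gene set IDs
  lowest_p  = c(0.001, 0.020, 0.004),
  highest_p = c(0.010, 0.045, 0.030),
  stringsAsFactors = FALSE
)

# Named list: one character vector of gene symbols per custom gene set,
# with names matching the IDs used in result_df (placeholder symbols)
custom_genes <- list(
  SET_A = c("GENE1", "GENE2", "GENE3"),
  SET_B = c("GENE2", "GENE4"),
  SET_C = c("GENE5", "GENE6", "GENE7", "GENE8")
)

# Sanity check: every pathway in result_df has a gene vector in custom_genes
missing_ids <- setdiff(result_df$ID, names(custom_genes))
```

An empty `missing_ids` indicates that every pathway ID has a corresponding entry in the list.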
If 'auto' is FALSE, manual partitioning can be performed. The hierarchical clustering dendrogram is visualized in a shiny HTML document, where the user can select the agglomeration method and the distance value at which to cut the tree. The resulting cluster assignments of the pathways, along with the annotation of representative pathways (chosen by smallest lowest p value), are presented as a table, which can be saved as a CSV file.
If 'auto' is TRUE, automatic partitioning of clusters is performed. The function adds 2 additional columns to the input data frame and returns it:
the cluster to which the pathway is assigned
whether the pathway is the "Representative" pathway in its cluster or only a "Member"
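Cutting the tree at a chosen distance and marking representatives can be sketched in base R as below. This is an illustration only: the toy distances, the p values, the cut height, and the column names `Cluster` and `Status` are assumptions made for the example, not taken from the function's output.

```r
# Toy one-dimensional positions for five hypothetical pathways
toy <- matrix(c(0, 0.1, 0.2, 5, 5.1), ncol = 1,
              dimnames = list(paste0("pw", 1:5), NULL))
hc <- hclust(dist(toy), method = "average")

# Cut the dendrogram at a user-chosen distance (hypothetical value)
cut_height <- 1
cluster_id <- cutree(hc, h = cut_height)

# Hypothetical lowest p values, one per pathway
lowest_p <- c(0.01, 0.001, 0.03, 0.02, 0.005)

# The representative of a cluster is the pathway with the smallest lowest p value
is_rep <- lowest_p == ave(lowest_p, cluster_id, FUN = min)

res <- data.frame(
  ID       = rownames(toy),
  lowest_p = lowest_p,
  Cluster  = unname(cluster_id),                         # assumed column name
  Status   = ifelse(is_rep, "Representative", "Member"), # assumed column name
  stringsAsFactors = FALSE
)

# Keep only the representative pathway of each cluster
representatives <- res[res$Status == "Representative", ]
```

Such a table could then be written out with `write.csv()` if a file is needed.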
See calculate_pwd for calculation of pairwise distances between enriched
pathways. See hclust for more information on hierarchical clustering.
See run_pathfindR for the wrapper function of the pathfindR enrichment workflow.
# NOT RUN {
## Cluster pathways with p <= 0.01
choose_clusters(RA_output, p_val_threshold = 0.01)
# }