choose_clusters: Cluster Pathways and Partition the Dendrogram

Description

This function first calculates the pairwise distances between the pathways in the result_df data frame. Using this distance matrix, the pathways are then clustered via hierarchical clustering. By default, the average silhouette width is calculated for each possible number of clusters, and the number of clusters with the highest average silhouette width is selected as optimal. The dendrogram is cut into this optimal number of clusters, and the pathway with the lowest p value within each cluster is chosen as the representative pathway. If 'auto' is FALSE, the user can instead manually select the height at which to cut the dendrogram via a shiny application.

For details on the method of pathway clustering, see: Chen, Y. A. et al. Integrated pathway clusters with coherent biological themes for target prioritisation. PLoS One 9, e99030, doi:10.1371/journal.pone.0099030 (2014).
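
The silhouette-based selection described above can be sketched in a few lines of base R plus the cluster package. This is an illustrative sketch only, not the function's actual implementation: the toy points and the Euclidean distance are placeholders for pathways and for the pathway distance measure, and the names x, d, hc, ks, avg_sil and best_k are made up for this example.

```r
library(cluster)  # provides silhouette()

# Toy data (assumption): three well-separated groups of points stand in
# for pathways; Euclidean distance stands in for the pathway distance
set.seed(1)
x <- rbind(matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(20, mean = 5, sd = 0.3), ncol = 2),
           matrix(rnorm(20, mean = 10, sd = 0.3), ncol = 2))
d <- dist(x)

# Hierarchical clustering with average linkage
hc <- hclust(d, method = "average")

# Average silhouette width for each candidate number of clusters
ks <- 2:(nrow(x) - 1)
avg_sil <- sapply(ks, function(k) {
  sil <- silhouette(cutree(hc, k = k), d)
  mean(sil[, "sil_width"])
})

# Pick the number of clusters with the highest average silhouette width
best_k <- ks[which.max(avg_sil)]
clusters <- cutree(hc, k = best_k)
```

Choosing the maximum average silhouette width favours partitions in which items sit close to their own cluster and far from the nearest other cluster, which is why it is a common default when no cut height is supplied.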

Usage

choose_clusters(result_df, p_val_threshold = 0.05, auto = TRUE,
  agg_method = "average", plot_heatmap = FALSE, plot_dend = FALSE,
  use_names = FALSE, custom_genes = NULL)

Arguments

result_df

data frame of enriched pathways. Must-have columns are:

ID: KEGG ID of the enriched pathway
lowest_p: the lowest adjusted-p value of the given pathway over all iterations
highest_p: the highest adjusted-p value of the given pathway over all iterations
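
As a rough illustration of the expected shape, a minimal data frame with the three required columns could look like the following. The p values are invented for this example and are not real enrichment results; the variable name required is also just for illustration.

```r
# Hypothetical values for illustration only; not real enrichment results
result_df <- data.frame(
  ID = c("hsa04110", "hsa04115", "hsa04210"),
  lowest_p = c(1e-04, 2e-03, 4e-02),
  highest_p = c(5e-03, 1e-02, 5e-02),
  stringsAsFactors = FALSE
)

# Check that the required columns are present before clustering
required <- c("ID", "lowest_p", "highest_p")
stopifnot(all(required %in% colnames(result_df)))
```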

p_val_threshold

p value threshold for filtering the pathways prior to clustering (default: 0.05)

auto

boolean value indicating whether to select the optimal number of clusters automatically. If FALSE, a shiny application is displayed, where the user can manually partition the clustering dendrogram (default: TRUE).

agg_method

the agglomeration method to be used if plotting the heat map. Must be one of "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median" or "centroid" (default: "average").

plot_heatmap

boolean value indicating whether or not to plot the heat map of pathway clustering (default: FALSE).

plot_dend

boolean value indicating whether or not to plot the dendrogram partitioned into the optimal number of clusters, shown by red rectangles (default: FALSE).

use_names

boolean value indicating whether to use gene set names instead of gene set IDs (default: FALSE).

custom_genes

a list containing the genes involved in each custom pathway. Each element is a vector of gene symbols located in the given pathway. Names correspond to the ID of the pathway. Must be provided if `result_df` was generated using custom gene sets.
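
A custom_genes list following this description might be built as shown here. The pathway IDs PATHWAY_A and PATHWAY_B are hypothetical placeholders; in practice the names should match the IDs used in result_df.

```r
# Named list: each name is a pathway ID (hypothetical here),
# each element is a character vector of gene symbols in that pathway
custom_genes <- list(
  PATHWAY_A = c("TP53", "MDM2", "CDKN1A"),
  PATHWAY_B = c("EGFR", "KRAS", "BRAF")
)

# Every element should be named and contain gene symbols as characters
stopifnot(!is.null(names(custom_genes)),
          all(vapply(custom_genes, is.character, logical(1))))
```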

Value

If 'auto' is FALSE, manual partitioning can be performed. The hierarchical clustering dendrogram is visualized in a shiny HTML document, where the user can select the agglomeration method and the distance value at which to cut the tree. The resulting cluster assignments of the pathways, along with annotation of the representative pathways (chosen by smallest lowest p value), are presented as a table, which can be saved as a csv file.

If 'auto' is TRUE, automatic partitioning of clusters is performed. The function adds 2 additional columns to the input data frame and returns it:

Cluster: the cluster to which the pathway is assigned
Status: whether the pathway is the "Representative" pathway in its cluster or only a "Member"
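
Once the Cluster and Status columns are available, the representative pathways can be pulled out with ordinary data frame subsetting. The data frame below is a hand-made stand-in for the returned value, with invented p values, so that the snippet runs on its own; clustered and representatives are illustrative names.

```r
# Mock of the returned data frame (values invented for illustration)
clustered <- data.frame(
  ID = c("hsa04110", "hsa04115", "hsa04210", "hsa04010"),
  lowest_p = c(1e-04, 2e-03, 4e-02, 1e-03),
  Cluster = c(1, 1, 2, 2),
  Status = c("Representative", "Member", "Member", "Representative"),
  stringsAsFactors = FALSE
)

# Keep only the representative pathway of each cluster
representatives <- clustered[clustered$Status == "Representative", ]

# Number of pathways assigned to each cluster
cluster_sizes <- table(clustered$Cluster)
```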

Examples

# NOT RUN {
## Cluster pathways with p <= 0.01
choose_clusters(RA_output, p_val_threshold = 0.01)
# }
