multiClust (version 1.0.2)

A collection of gene feature selection and clustering analysis algorithms

Description

Whole transcriptomic profiles are useful for studying the expression levels of thousands of genes across samples. Clustering algorithms are used to identify patterns in these profiles and to determine clinically relevant subgroups. Feature selection is an integral part of the process. Many feature selection and clustering methods exist for identifying the relevant genes and clustering the samples. However, choosing appropriate methods is difficult, as recent work demonstrates that no method is the clear winner. Hence, we present an R package called `multiClust` that lets researchers experiment easily with combinations of gene selection and clustering methods. In addition, using multiClust, we present the merit of gene selection and clustering methods in the context of the clinical relevance of clustering, specifically clinical outcome.

Our integrative R package contains:

1. A function to read in gene expression data and format it appropriately for analysis in R.
2. Four different ways to select the number of genes:
   a. Fixed
   b. Percent
   c. Poly
   d. GMM
3. Four gene ranking options that order genes based on different statistical criteria:
   a. CV_Rank
   b. CV_Guided
   c. SD_Rank
   d. Poly
4. Two ways to determine the cluster number:
   a. Fixed
   b. Gap Statistic
5. Two clustering algorithms:
   a. Hierarchical clustering
   b. K-means clustering
6. A function to calculate average gene expression in each sample cluster.
7. A function to correlate sample clusters with clinical outcome.

Order of function use:

1. input_file, a function to read in the gene expression file and assign gene probe names as the rownames.
2. number_probes, a function to determine the number of probes to select in the gene feature selection process.
3. probe_ranking, a function to select gene probes using one of the available gene probe ranking options.
4. number_clusters, a function to determine the number of clusters to be used to cluster genes and samples.
5. cluster_analysis, a function to perform K-means or hierarchical clustering analysis of the selected gene expression data.
6. avg_probe_exp, a function to produce a matrix containing the average expression of each gene probe within each sample cluster.
7. surv_analysis, a function to produce Kaplan-Meier survival plots of selected gene expression data.
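The select-then-cluster idea described above can be illustrated with a minimal sketch in base R. It does not call any multiClust function and is not the package's implementation: the data are simulated, the variable names (`expr`, `n_keep`, `k`) are hypothetical, and ranking by standard deviation with a fixed probe count and a fixed cluster number is only one plausible reading of the "SD_Rank" and "Fixed" options listed here.

```r
# Simulated expression matrix (an assumption for illustration):
# rows are gene probes, columns are samples.
set.seed(1)
expr <- matrix(rnorm(200 * 12), nrow = 200, ncol = 12,
               dimnames = list(paste0("probe_", 1:200),
                               paste0("sample_", 1:12)))
# Raise the first 20 probes in the last 6 samples so that two
# sample groups exist in the data.
expr[1:20, 7:12] <- expr[1:20, 7:12] + 5

# Rank probes by standard deviation across samples and keep a fixed
# number of the most variable ones.
probe_sd <- apply(expr, 1, sd)
n_keep <- 20
selected <- expr[order(probe_sd, decreasing = TRUE)[1:n_keep], ]

# Use a fixed cluster number.
k <- 2

# Hierarchical clustering of samples (samples must be rows for dist()).
hc <- hclust(dist(t(selected)), method = "ward.D2")
hc_clusters <- cutree(hc, k = k)

# K-means clustering of the same samples.
km <- kmeans(t(selected), centers = k, nstart = 10)
km_clusters <- km$cluster

# Compare the two sample partitions.
print(table(hierarchical = hc_clusters, kmeans = km_clusters))
```

Swapping the ranking statistic or the clustering call in a sketch like this is the kind of method comparison the package is meant to make convenient.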

Version

1.0.2

License

GPL (>= 2)

Maintainer

Nathan Lawlor

Last Published

February 15th, 2017

Functions in multiClust (1.0.2)

probe_ranking

Function to select genes using one of the available gene probe ranking options.

input_file

Function to read in the gene expression file and assign gene probe names as the rownames.

nor.min.max

Function to normalize data to bring values into alignment. This function uses feature scaling to normalize values in a dataset between 0 and 1.

number_clusters

Function to determine the number of clusters to be used to cluster gene probes and samples.

number_probes

Function to determine the number of gene probes to select in the gene feature selection process.

WriteMatrixToFile

Function to write a data matrix to a text file.

avg_probe_exp

Function to produce a matrix containing the average expression of each gene probe within each sample cluster.

cluster_analysis

Function to perform K-means or hierarchical clustering analysis of the selected gene probe expression data.

surv_analysis

Function to produce Kaplan-Meier survival plots of selected gene expression data.
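
Two of the descriptions above, feature scaling to the range 0 to 1 and the average expression of each probe within each sample cluster, can be sketched in base R. This is an illustration under stated assumptions rather than the code behind nor.min.max or avg_probe_exp: the helper `min_max_scale`, the toy matrix, and the cluster labels are hypothetical, and scaling each probe separately is a choice made here, not something the listing confirms.

```r
# Min-max feature scaling: (x - min) / (max - min) maps values to [0, 1].
# Hypothetical helper, not the package's nor.min.max.
min_max_scale <- function(x) {
  rng <- range(x, na.rm = TRUE)
  # A constant vector would give 0/0, so return zeros instead.
  if (rng[1] == rng[2]) return(rep(0, length(x)))
  (x - rng[1]) / (rng[2] - rng[1])
}

# Toy expression matrix: rows are probes, columns are samples.
expr <- matrix(c(2, 4, 6, 8,
                 1, 1, 5, 5),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("probe_a", "probe_b"),
                               paste0("s", 1:4)))

# Assumed cluster assignment for the four samples.
clusters <- c(s1 = 1, s2 = 1, s3 = 2, s4 = 2)

# Scale each probe (row) separately; apply() returns probes as columns,
# so transpose back to probes-by-samples.
scaled <- t(apply(expr, 1, min_max_scale))

# Average expression of each probe within each sample cluster:
# one column per cluster, one row per probe.
avg_by_cluster <- sapply(split(colnames(expr), clusters),
                         function(cols) rowMeans(expr[, cols, drop = FALSE]))

print(scaled)
print(avg_by_cluster)
```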