The NCutYX Package

Description

The NCutYX package includes functions for clustering genomic data using graph theory. Each function in this package is a variation on the NCut measure used to cluster vertices in a graph. The running theme is to use data sets from different sources and types to improve the clustering results.

  • The ncut function clusters the columns of a data set using the classical normalized cut measure from graph theory.
  • The ancut function clusters one type of data, say gene expressions, with the help of a second type of data, like copy number aberrations.
  • The muncut function clusters a three-layered graph into K different clusters of 3 different data types, say gene expression, copy number aberrations and proteins.
  • The pwncut function clusters the columns of X into K clusters by giving each column a weighted cluster membership while penalizing the weights to be similar to each other.
  • The mlbncut function works similarly to muncut but it also clusters samples into R clusters.
  • The awncut function builds similarity matrices for the rows of X and an assisted dataset Z, then clusters them into K groups while conducting feature selection based on the AWNCut method.

To install:

  • Latest development version:
    1. Install and load the devtools package.
    2. Run install_github("Seborinos/NCutYX").
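
The two steps above can be written out as a short R session. This is a minimal sketch that assumes a working internet connection and a CRAN mirror; the requireNamespace guard is an addition for convenience and is not part of the original instructions.

```r
# Step 1: install and load devtools (installation is skipped if it is already present).
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
library(devtools)

# Step 2: install the development version of NCutYX from GitHub.
install_github("Seborinos/NCutYX")

# Load the package to confirm that the installation succeeded.
library(NCutYX)
```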

NCut

The Normalized Cut (NCut) function builds a similarity matrix for the columns of Y and clusters them into K groups based on the NCut graph measure. The similarity matrix can be constructed from correlation, Euclidean or Gaussian distances. The NCut measure is minimized using the cross entropy method, a Monte Carlo optimization technique.
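
To make the two ingredients above concrete, the following base R sketch computes the classical normalized cut of a labeling and minimizes it with a simple cross entropy loop. It does not call the package and may differ from its internals: the helper names (similarity_from_correlation, ncut_value, ce_minimize), the argument names, the smoothing constant and the simulated data are all assumptions made for illustration.

```r
# Similarity between columns of Y from absolute correlations (one possible choice).
similarity_from_correlation <- function(Y) {
  W <- abs(cor(Y))
  diag(W) <- 0
  W
}

# Classical NCut: for each cluster, the weight of edges leaving the cluster
# divided by the total degree (volume) of the cluster, summed over clusters.
ncut_value <- function(W, labels) {
  degree <- rowSums(W)
  total <- 0
  for (k in unique(labels)) {
    inside <- labels == k
    total <- total + sum(W[inside, !inside, drop = FALSE]) / sum(degree[inside])
  }
  total
}

# Cross entropy method (sketch): sample labelings from per-column label
# probabilities, keep the best fraction, and move the probabilities toward them.
ce_minimize <- function(W, K = 2, iterations = 10, n_samples = 200, elite_frac = 0.1) {
  p <- nrow(W)
  P <- matrix(1 / K, nrow = p, ncol = K)
  n_elite <- ceiling(elite_frac * n_samples)
  best_value <- Inf
  best_labels <- NULL
  for (it in seq_len(iterations)) {
    # Row i of draws is one sampled labeling; column j is drawn from row j of P.
    draws <- sapply(seq_len(p), function(j) {
      sample.int(K, n_samples, replace = TRUE, prob = P[j, ])
    })
    # Labelings that leave a cluster empty are rejected with an infinite value.
    values <- apply(draws, 1, function(l) {
      if (length(unique(l)) < K) Inf else ncut_value(W, l)
    })
    elite <- draws[order(values)[seq_len(n_elite)], , drop = FALSE]
    # Smoothed update toward the label frequencies among the elite draws.
    for (k in seq_len(K)) {
      P[, k] <- 0.7 * colMeans(elite == k) + 0.3 * P[, k]
    }
    if (min(values) < best_value) {
      best_value <- min(values)
      best_labels <- draws[which.min(values), ]
    }
  }
  list(labels = best_labels, value = best_value)
}

# Simulated data: columns 1-3 share one latent factor, columns 4-6 another.
set.seed(1)
n <- 100
f1 <- rnorm(n)
f2 <- rnorm(n)
noisy <- function(f) f + 0.1 * rnorm(n)
Y <- cbind(noisy(f1), noisy(f1), noisy(f1), noisy(f2), noisy(f2), noisy(f2))

W <- similarity_from_correlation(Y)
ncut_good <- ncut_value(W, c(1, 1, 1, 2, 2, 2))  # matches the simulated groups
ncut_bad <- ncut_value(W, c(1, 2, 1, 2, 1, 2))   # splits both groups
result <- ce_minimize(W, K = 2)
```

The labeling that matches the simulated groups should give a much smaller NCut value than the one that splits them, and result$labels holds the best labeling found by the sampler.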

ANCut

The Assisted NCut (ANCut) clusters the columns of a data set Y into K groups with the help of an external data set X, which is associated linearly with Y.
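
One way to picture how a linearly associated data set X can assist the clustering of Y is to compare similarities computed on Y itself with similarities computed on the part of Y that X explains. The base R sketch below uses ordinary least squares for this; it is only an illustration under that assumption and does not reproduce the ANCut algorithm. The simulated data and variable names are hypothetical.

```r
set.seed(2)
n <- 80

# External data X with two features (e.g. copy number aberrations).
X <- matrix(rnorm(n * 2), n, 2)

# Columns 1-2 of Y are driven by X[, 1], columns 3-4 by X[, 2], plus noise.
Y <- cbind(X[, 1] + rnorm(n), X[, 1] + rnorm(n),
           X[, 2] + rnorm(n), X[, 2] + rnorm(n))

# Linear association: regress every column of Y on X and keep the fitted values.
fit <- lm(Y ~ X)
Y_hat <- fitted(fit)

# Similarity between columns of Y, with and without the help of X.
W_raw <- abs(cor(Y))
W_assisted <- abs(cor(Y_hat))
```

In this simulation the assisted similarity between columns 1 and 2 is much stronger than between columns 1 and 3, because the noise that X cannot explain has been removed before similarities are computed.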

MuNCut

MuNCut clusters the columns of data from 3 different sources. It clusters the columns of Z, Y and X into K clusters by representing each data type as one network layer. The Z layer is represented as depending on Y, and the Y layer as depending on X. Elastic net can be applied before the clustering procedure, so that the predictions of Z and Y are used instead of the actual values to improve the clustering results. The muncut function outputs K clusters of columns of Z, Y and X.
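
The layered structure described above can be pictured as one block similarity matrix whose nodes are the columns of Z, Y and X. The base R sketch below is an assumption about how such a graph could be assembled, not the package's construction: within-layer blocks come from absolute correlations, the Z-Y and Y-X blocks link adjacent layers, and the Z-X block is left at zero because Z is linked to X only through Y. All names and the simulated data are hypothetical.

```r
set.seed(3)
n <- 60

# Simulated layers: Y depends on X, and Z depends on Y.
X <- matrix(rnorm(n * 3), n, 3)
Y <- X %*% matrix(rnorm(3 * 4), 3, 4) + matrix(rnorm(n * 4), n, 4)
Z <- Y %*% matrix(rnorm(4 * 2), 4, 2) + matrix(rnorm(n * 2), n, 2)

# Similarities inside one layer and between two layers.
within_layer <- function(A) {
  W <- abs(cor(A))
  diag(W) <- 0
  W
}
between_layers <- function(A, B) abs(cor(A, B))

n_z <- ncol(Z)
n_y <- ncol(Y)
n_x <- ncol(X)
W_zy <- between_layers(Z, Y)  # n_z x n_y block
W_yx <- between_layers(Y, X)  # n_y x n_x block

# Three-layer graph: nodes are ordered as columns of Z, then Y, then X.
W <- rbind(
  cbind(within_layer(Z), W_zy,            matrix(0, n_z, n_x)),
  cbind(t(W_zy),         within_layer(Y), W_yx),
  cbind(matrix(0, n_x, n_z), t(W_yx),     within_layer(X))
)
```

A single labeling of the rows of W then assigns columns of Z, Y and X to the same K clusters.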

References:

  • Sebastian J. Teran Hidalgo and Shuangge Ma. “Clustering Multilayer Omics Data using MuNCut.” Revise and resubmit.

PWNCut

The Penalized Weighted NCut (PWNCut) clusters the columns of X into K clusters by giving a weighted cluster membership while shrinking weights towards each other.

References:

  • Sebastian J. Teran Hidalgo, Mengyun Wu and Shuangge Ma. “Penalized and weighted clustering of gene expression data using PWNCut.” Submitted.

MLBNCut

The Multilayer Biclustering NCut (MLBNCut) simultaneously clusters the columns and the rows of data from 3 different sources. It clusters the columns of Z, Y and X into K clusters and the samples into R clusters by representing each data type as one network layer. The Z layer is represented as depending on Y, and the Y layer as depending on X. The function outputs K clusters of columns of Z, Y and X and R clusters of the samples.

References:

  • Sebastian J. Teran Hidalgo and Shuangge Ma. “Multilayer Biclustering of Omics Data using MLBNCut.” Work in progress.

AWNCut

The Assisted Weighted NCut (AWNCut) builds similarity matrices for the rows of X and an assisted dataset Z, then clusters them into K groups while conducting feature selection based on the AWNCut method.

References:

  • Li, Yang; Bie, Ruofan; Teran Hidalgo, Sebastian; Qin, Yinchen; Wu, Mengyun; Ma, Shuangge. “Assisted gene expression-based clustering with AWNCut.” Submitted.

Install

install.packages('NCutYX')

Monthly Downloads

140

Version

0.1.0

License

GPL-3

Maintainer

Sebastian Teran Hidalgo

Last Published

February 9th, 2018

Functions in NCutYX (0.1.0)

  • awncut: Cluster the Rows of X into K Clusters Using the AWNCut Method.
  • cesc.data.ge: Data on gene expression from cervical cancer patients.
  • cesc.data.rppa: Data on protein measurements from cervical cancer patients.
  • mlbncut: The MLBNCut Clusters the Columns and the Rows Simultaneously of Data from 3 Different Sources.
  • muncut: MuNCut Clusters the Columns of Data from 3 Different Sources.
  • ErrorRate: This Function Calculates the True Error Rate of a Clustering Result, Assuming that There are Three Clusters.
  • ancut: Cluster the Columns of Y into K Groups with the Help of External Features X.
  • ncut: Cluster the Columns of Y into K Groups Using the NCut Graph Measure.
  • pwncut: Cluster the Columns of X into K Clusters by Giving a Weighted Cluster Membership while Shrinking Weights Towards Each Other.
  • brca.data.cna: Data on copy number aberrations from breast cancer patients.
  • brca.data.ge: Data on gene expression from breast cancer patients.
  • brca.data.rppa: Data on protein measurements from breast cancer patients.
  • cesc.data.cna: Data on copy number aberrations from cervical cancer patients.
  • awncut.selection: This Function Outputs the Selection of Tuning Parameters for the AWNCut Method.
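
The listing does not say how ErrorRate defines the error of a clustering result. As an illustration only, the base R sketch below uses one common definition for three clusters: the misclassification rate after choosing the relabeling of the estimated clusters that best matches the truth. The function name error_rate3 is hypothetical and the package's own ErrorRate may compute something different.

```r
# Misclassification rate for three clusters, minimized over all relabelings
# of the estimated clusters (cluster labels are arbitrary, so 1/2/3 can be swapped).
error_rate3 <- function(truth, estimate) {
  perms <- rbind(c(1, 2, 3), c(1, 3, 2), c(2, 1, 3),
                 c(2, 3, 1), c(3, 1, 2), c(3, 2, 1))
  rates <- apply(perms, 1, function(p) mean(p[estimate] != truth))
  min(rates)
}

truth <- c(1, 1, 2, 2, 3, 3)
estimate <- c(2, 2, 3, 3, 1, 3)  # same grouping up to relabeling, except the last item
err <- error_rate3(truth, estimate)
```

Here the best relabeling maps estimated clusters 2, 3 and 1 to true clusters 1, 2 and 3, leaving one of the six items misassigned.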